{"doi":"10.1093/nar/gku989","title":"UniProt: a hub for protein information","abstract":"UniProt is an important collection of protein sequences and their annotations, which has doubled in size to 80 million sequences during the past year. This growth in sequences has prompted an extension of UniProt accession number space from 6 to 10 characters. An increasing fraction of new sequences are identical to a sequence that already exists in the database with the majority of sequences coming from genome sequencing projects. We have created a new proteome identifier that uniquely identifies a particular assembly of a species and strain or subspecies to help users track the provenance of sequences. We present a new website that has been designed using a user-experience design process. We have introduced an annotation score for all entries in UniProt to represent the relative amount of knowledge known about each protein. These scores will be helpful in identifying which proteins are the best characterized and most informative for comparative analysis. All UniProt data is provided freely and is available on the web at http://www.uniprot.org/.","journal":"Nucleic Acids Research","year":2014,"id":6381,"datarank":23.186372980190963,"base_score":8.557374981049069,"endowment":8.557374981049069,"self_citation_contribution":1.2836062471573606,"citation_network_contribution":21.9027667330336,"self_endowment_contribution":1.2836062471573606,"citer_contribution":21.9027667330336,"corpus_percentile":97.07078925956061,"corpus_rank":37,"citation_count":5274,"citer_count":200,"citers_with_citation_signal":200,"citers_with_endowment":200,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9506,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2014-10-27","fair_score":59.1667,"fair_percentile":92.10642040457344,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":58928,"name":"The UniProt Consortium","orcid":null,"position":0,"is_corresponding":true}],"reference_count":18,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"25348405","pmcid":"PMC4384041","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":77.5,"fair_a":67.5,"fair_i":50.0,"fair_r":41.6667,"fair_zscore":1.2623,"fair_rationale":{"fair_score":59.17,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":77.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper describes rich metadata (annotation scores, controlled vocabularies, cross-references) but does not explicitly address machine-readability (e.g., standard serialization formats like RDF/XML, schema.org markup)."}]},"A":{"name":"Accessible","score":67.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The data is freely accessible via a website and API (implicitly via help pages), but no explicit protocol or license for automated access (e.g., API terms, download format) is stated."}]},"I":{"name":"Interoperable","score":50.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper extensively uses standard vocabularies (GO, EC, ChEBI) and identifiers (UniProt accession, proteome ID), and introduces new accession format for interoperability."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"A Creative Commons Attribution license is provided, and data is freely available, but no explicit data-availability statement or reproducibility details (e.g., versioning, persistent identifiers for cited data) are included."}]}},"suggestions":["Include explicit machine-readable metadata serialization (e.g., JSON-LD, RDF) and a schema.org markup for the dataset.","Provide a clear statement of the API usage license and terms of automated access in the paper.","Add a formal data-availability statement that specifies data version, persistent identifiers (DOI), and exact download URLs for cited datasets.","Describe how to reproduce the annotation scores with a step-by-step protocol or reference a standalone script.","List the license of the software (if any) used to generate the data, separate from the data license."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:28:32.961577Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}