{"doi":"10.1093/nar/gkab1038","title":"The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences","abstract":"The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2019. The number of submitted datasets to PRIDE Archive (the archival component of PRIDE) has reached on average around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to the submitted mass spectra using Universal Spectrum Identifiers. As a key point, the file format MAGE-TAB for proteomics has been developed to enable the improvement of sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidences across PRIDE Archive. Furthermore, we will describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl and Expression Atlas.","journal":"Nucleic Acids Research","year":2021,"id":9571,"datarank":12.079696039687194,"base_score":8.781248333236862,"endowment":8.781248333236862,"self_citation_contribution":1.3171872499855295,"citation_network_contribution":10.762508789701664,"self_endowment_contribution":1.3171872499855295,"citer_contribution":10.762508789701664,"corpus_percentile":82.75020341741254,"corpus_rank":213,"citation_count":6690,"citer_count":196,"citers_with_citation_signal":196,"citers_with_endowment":196,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9555,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2021-11-01","fair_score":69.5833,"fair_percentile":99.0325417766051,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":79924,"name":"Jingwen Bai","orcid":null,"position":1,"is_corresponding":false},{"id":80041,"name":"Chakradhar Bandla","orcid":"0000-0001-6392-3759","position":2,"is_corresponding":false},{"id":80042,"name":"David García-Seisdedos","orcid":null,"position":3,"is_corresponding":false},{"id":79926,"name":"Suresh Hewapathirana","orcid":"0000-0002-7862-5022","position":4,"is_corresponding":false},{"id":80043,"name":"Selvakumar Kamatchinathan","orcid":"0009-0001-3644-2586","position":5,"is_corresponding":false},{"id":79927,"name":"Deepti J Kundu","orcid":"0000-0003-2989-5971","position":6,"is_corresponding":false},{"id":80044,"name":"Ananth Prakash","orcid":"0000-0001-5799-9618","position":7,"is_corresponding":false},{"id":80045,"name":"Anika Frericks-Zipper","orcid":null,"position":8,"is_corresponding":false},{"id":79931,"name":"Martin Eisenacher","orcid":"0000-0003-2687-7444","position":9,"is_corresponding":false},{"id":79940,"name":"Mathias Walzer","orcid":"0000-0003-4538-2754","position":10,"is_corresponding":false},{"id":80046,"name":"Shengbo Wang","orcid":"0000-0001-5034-6374","position":11,"is_corresponding":false},{"id":32867,"name":"Ann-Christine Syvanen","orcid":null,"position":12,"is_corresponding":false},{"id":17887,"name":"Juan Antonio Vizcaino","orcid":"0000-0002-3905-4335","position":13,"is_corresponding":false},{"id":80047,"name":"David García‐Seisdedos","orcid":"0000-0002-8364-165X","position":14,"is_corresponding":false},{"id":33701,"name":"Joëlle Pineau","orcid":"0000-0003-0747-7250","position":15,"is_corresponding":false},{"id":19125,"name":"Yasset Perez-Riverol","orcid":"0000-0001-6579-6941","position":0,"is_corresponding":true}],"reference_count":58,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"34723319","pmcid":"PMC8728295","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"public-domain","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":90.0,"fair_a":80.0,"fair_i":50.0,"fair_r":58.3333,"fair_zscore":2.2046,"fair_rationale":{"fair_score":69.58,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":90.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper describes rich metadata practices including MAGE-TAB for proteomics with SDRF-Proteomics files, BioSample integration, and ontology-based annotations via OLS."}]},"A":{"name":"Accessible","score":80.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"Data access is clearly described via PRIDE Archive web interface, REST API, FTP, Aspera, and programmatic access via pridepy, with a CC0 license for new datasets."}]},"I":{"name":"Interoperable","score":50.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The resource extensively uses PSI standard formats (mzIdentML, mzTab, mzML), Universal Spectrum Identifiers, and controlled vocabularies from ontologies."}]},"R":{"name":"Reusable","score":58.33,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":1.0,"signal":null,"rationale":"The paper provides a strong data-availability statement (CC0 license), tracks dataset reuses in publications, and enables reproducibility through complete submissions, reanalysis pipelines, and integration into other resources."}]}},"suggestions":["Explicitly include a formal data-availability statement with a DOI for the paper itself.","Provide an explicit license for the paper's content beyond the CC0 for datasets.","Enhance machine-readability by providing structured metadata in a schema.org or similar format for the paper.","Improve findability by registering the dataset of the paper (if any) in a repository with a persistent identifier.","Ensure that all software versions and analysis parameters are clearly documented in a reproducible workflow."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:28:12.045586Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}