{"doi":"10.1093/nar/gkaf1057","title":"PRIME: a database for 16S rRNA microbiome data with phenotypic reference and comprehensive metadata","abstract":"PRIME (Phenotypic Reference for Integrated Microbiome Enrichment) is a curated and standardized database of human microbiome 16S rRNA amplicon sequencing data, designed to facilitate cross-study analysis, reproducibility, and phenotype-driven discovery. PRIME aggregates 53 449 samples from 111 public studies, covering 93 body sites and 101 phenotypic categories, with detailed harmonization of sample-level metadata such as disease status, demographics, body sites, sequencing protocols, and experimental design. Each sample includes taxonomic abundance profiles generated via a consistent pipeline using both SILVA (138.2) and Greengenes2 (2024.09) reference databases, with results reported at multiple taxonomic levels as observed abundances (read counts) and relative abundances (proportions). A major strength of PRIME is its extensive manual curation, which standardizes phenotypic and contextual metadata across studies, enabling precise querying and robust phenotype-based comparisons. Users can interactively explore the database through a modern web interface, filter and visualize data by metadata fields, and download customized subsets. Programmatic access is supported via RESTful APIs and an R package. PRIME aims to advance microbiome data integration and is continuously updated to incorporate new studies and features. 
The database is freely available at https://primedb.sjtu.edu.cn.","journal":"Nucleic Acids Research","year":2025,"id":279,"datarank":0.0,"base_score":0.0,"endowment":0.0,"self_citation_contribution":0.0,"citation_network_contribution":0.0,"self_endowment_contribution":0.0,"citer_contribution":0.0,"corpus_percentile":0.0,"corpus_rank":765,"citation_count":0,"citer_count":0,"citers_with_citation_signal":0,"citers_with_endowment":0,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9507,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2025-10-31","fair_score":61.25,"fair_percentile":92.70008795074759,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":2737,"name":"Luca Pinello","orcid":"0000-0003-1195-9607","position":1,"is_corresponding":false},{"id":2738,"name":"Tao Wang","orcid":"0000-0002-1218-4017","position":2,"is_corresponding":false},{"id":2736,"name":"Zhizhuo Zhang","orcid":"0000-0003-1202-4037","position":0,"is_corresponding":true}],"reference_count":36,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"41171140","pmcid":"PMC12807763","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":77.5,"fair_a":80.0,"fair_i":37.5,"fair_r":50.0,"fair_zscore":1.4508,"fair_rationale":{"fair_score":61.25,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":77.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers 
(OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper describes extensive manual curation and standardization of metadata with a controlled vocabulary and high-level system categories, but does not provide evidence of machine-readable standardized formats like JSON-LD or schema.org markup."}]},"A":{"name":"Accessible","score":80.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":1.0,"signal":null,"rationale":"The paper specifies a clear data access protocol with a public web interface, RESTful API, and R package, all freely accessible without login or restrictions, and includes a permanent DOI link."}]},"I":{"name":"Interoperable","score":37.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"Standard bioinformatics formats (JSON, QIIME 2) and controlled vocabularies are used, and taxonomic identifiers are mapped to NCBI Taxonomy, but there is no explicit mention of community-standard metadata schemas like 
MIxS."}]},"R":{"name":"Reusable","score":50.0,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.833,"signal":null,"rationale":"A data-availability statement with MIT license, a permanent DOI, and programmatic access are provided, but the paper lacks explicit reproducibility details such as version numbers for all software and exact parameter settings for the full pipeline."}]}},"suggestions":["Publish metadata in a machine-readable format such as JSON-LD or schema.org to enhance findability.","Adopt and state use of community metadata standards like MIxS (Minimum Information about any (x) Sequence).","Include a full reproducible workflow with exact software versions and all parameters in the supplementary materials.","Add a clear citation policy and formal data license for the API and R package outputs.","Document and version all automated curation scripts and bioinformatic pipeline code in a public repository."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T06:51:38.215455Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}