{"doi":"10.1093/nar/gkab1019","title":"GMrepo v2: a curated human gut microbiome database with special focus on disease markers and cross-dataset comparison","abstract":"GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purposes are to increase the reusability and accessibility of human gut metagenomic data, and enable cross-project and phenotype comparisons. To achieve these goals, we performed manual curation on the meta-data and organized the datasets in a phenotype-centric manner. GMrepo v2 contains 353 projects and 71,642 runs/samples, which are significantly increased from the previous version. Among these runs/samples, 45,111 and 26,531 were obtained by 16S rRNA amplicon and whole-genome metagenomics sequencing, respectively. We also increased the number of phenotypes from 92 to 133. In addition, we introduced disease-marker identification and cross-project/phenotype comparison. We first identified disease markers between two phenotypes (e.g. health versus diseases) on a per-project basis for selected projects. We then compared the identified markers for each phenotype pair across datasets to facilitate the identification of consistent microbial markers across datasets. Finally, we provided a marker-centric view to allow users to check if a marker has different trends in different diseases. So far, GMrepo includes 592 marker taxa (350 species and 242 genera) for 47 phenotype pairs, identified from 83 selected projects. GMrepo v2 is freely available at: https://gmrepo.humangut.info.","journal":"Nucleic Acids Research","year":2021,"id":6195,"datarank":4.664345012567562,"base_score":5.111987788356544,"endowment":5.111987788356544,"self_citation_contribution":0.7667981682534817,"citation_network_contribution":3.8975468443140806,"self_endowment_contribution":0.7667981682534817,"citer_contribution":3.8975468443140806,"corpus_percentile":70.95199349064279,"corpus_rank":358,"citation_count":181,"citer_count":174,"citers_with_citation_signal":128,"citers_with_endowment":128,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.9467,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2021-11-12","fair_score":56.0417,"fair_percentile":91.38082673702726,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":58052,"name":"Jiaying Zhu","orcid":"0000-0001-5247-7829","position":1,"is_corresponding":false},{"id":39634,"name":"Chuqing Sun","orcid":"0000-0001-5025-2650","position":2,"is_corresponding":false},{"id":58053,"name":"Min Li","orcid":"0000-0002-0047-2804","position":3,"is_corresponding":false},{"id":58054,"name":"Jinxin Liu","orcid":"0000-0003-0753-5342","position":4,"is_corresponding":false},{"id":39633,"name":"Sicheng Wu","orcid":"0000-0002-5121-029X","position":5,"is_corresponding":false},{"id":39644,"name":"Kang Ning","orcid":"0000-0003-3325-5387","position":6,"is_corresponding":false},{"id":39645,"name":"Li-jie He","orcid":null,"position":7,"is_corresponding":false},{"id":23552,"name":"Xing‐Ming Zhao","orcid":"0000-0002-4531-3970","position":8,"is_corresponding":false},{"id":23546,"name":"Wei-Hua Chen","orcid":"0000-0001-5160-4398","position":9,"is_corresponding":false},{"id":39647,"name":"Lijie He","orcid":null,"position":10,"is_corresponding":false},{"id":39641,"name":"Die Dai","orcid":null,"position":0,"is_corresponding":true}],"reference_count":60,"raw_metadata":null,"created_at":"2026-03-01T18:20:47.508186Z","pmid":"34788838","pmcid":"PMC8728112","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":77.5,"fair_a":67.5,"fair_i":37.5,"fair_r":41.6667,"fair_zscore":0.9797,"fair_rationale":{"fair_score":56.04,"has_llm":true,"dimensions":{"F":{"name":"Findable","score":77.5,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no OpenAlex id","rationale":null},{"key":"f_metadata_richness","label":"Rich, machine-readable metadata","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper describes manual curation of metadata including technical and host-related metadata, but does not explicitly state that metadata is provided in a machine-readable format such as structured JSON or XML."}]},"A":{"name":"Accessible","score":67.5,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"Open Access","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"0 OA location(s)","rationale":null},{"key":"a_access_protocol","label":"Clear data/code access protocol","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper states the database is freely available at a URL and provides REST APIs and a GitHub page for programmable access, but does not specify a formal access protocol like OAuth or a data use agreement."}]},"I":{"name":"Interoperable","score":37.5,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null},{"key":"i_standards","label":"Standard formats, vocabularies & identifiers","kind":"llm","weight":1.0,"fraction":0.75,"signal":null,"rationale":"The paper uses standard formats (FASTQ, Greengenes, NCBI taxonomy) and identifiers (NCBI BioProject, SRA, ENA), but does not mention use of standard vocabularies like MIxS or ontologies for metadata."}]},"R":{"name":"Reusable","score":41.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.0,"signal":"no license","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null},{"key":"r_reusability","label":"Data-availability statement, license & reproducibility","kind":"llm","weight":2.0,"fraction":0.667,"signal":null,"rationale":"The paper provides a data availability statement with a CC BY-NC license, links to download data and code, and describes reproducible methods, but does not include a formal data citation or guarantee long-term preservation."}]}},"suggestions":["Provide metadata in a machine-readable format such as JSON-LD or RDF with explicit links to standard ontologies.","Document the access protocol (e.g., OAuth2) and include a formal data use agreement or license for the API.","Use standard community vocabularies (e.g., MIxS, NCBI Taxon) and persistent identifiers (e.g., DOIs) for all datasets and markers.","Include a formal data availability statement with a persistent identifier (e.g., DOI) and a clear license for the code and data, and ensure all scripts are versioned and archived in a repository like Zenodo."],"model":"deepseek/deepseek-v4-flash","agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":"deepseek/deepseek-v4-flash","fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":true,"fair_computed_at":"2026-06-18T00:37:23.160166Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}