{"doi":"10.1128/msystems.01045-20","title":"Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity","abstract":"Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5,596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome data sets. The MAGs comprised 1,522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family level, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1,986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene data sets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut. IMPORTANCE: Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. 
To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome data sets that had not been used to obtain microbial genomes. Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.","journal":"mSystems","year":2020,"id":3911,"datarank":2.278145453892017,"base_score":4.430816798843313,"endowment":4.430816798843313,"self_citation_contribution":0.6646225198264971,"citation_network_contribution":1.61352293406552,"self_endowment_contribution":0.6646225198264971,"citer_contribution":1.61352293406552,"corpus_percentile":65.58177379983726,"corpus_rank":424,"citation_count":95,"citer_count":68,"citers_with_citation_signal":55,"citers_with_endowment":55,"datacite_reuse_total":0,"is_dataset":true,"is_dataset_confidence":0.8445,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2020-12-22","fair_score":84.1667,"fair_percentile":99.9,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":39731,"name":"Jacobo de la Cuesta-Zuluaga","orcid":"0000-0002-7369-992X","position":1,"is_corresponding":false},{"id":39732,"name":"Georg H. Reischer","orcid":"0000-0002-3962-8685","position":2,"is_corresponding":false},{"id":39733,"name":"Silke Dauser","orcid":null,"position":3,"is_corresponding":false},{"id":39734,"name":"Nathalie Schuster","orcid":null,"position":4,"is_corresponding":false},{"id":39735,"name":"Chris Walzer","orcid":"0000-0002-0437-5147","position":5,"is_corresponding":false},{"id":39736,"name":"Gabrielle Stalder","orcid":"0000-0002-8901-1181","position":6,"is_corresponding":false},{"id":39737,"name":"Andreas H. Farnleitner","orcid":"0000-0002-0542-5425","position":7,"is_corresponding":false},{"id":19807,"name":"Ruth E. 
Ley","orcid":"0000-0002-9087-1672","position":8,"is_corresponding":false},{"id":39730,"name":"Nicholas D. Youngblut","orcid":"0000-0002-7424-5276","position":0,"is_corresponding":true}],"reference_count":49,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":"33144315","pmcid":"PMC7646530","fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":"gold","license":"cc-by","views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":100.0,"fair_a":70.0,"fair_i":100.0,"fair_r":66.6667,"fair_zscore":0.4143,"fair_rationale":{"fair_score":84.17,"has_llm":false,"dimensions":{"F":{"name":"Findable","score":100.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=22, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"OpenAlex id present","rationale":null}]},"A":{"name":"Accessible","score":70.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":0.5,"signal":"files/OA location present but not flagged OA","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"15 OA location(s)","rationale":null}]},"I":{"name":"Interoperable","score":100.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets / DataCite relations","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"linked_datasets=22, datacite=22","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"accessions=1, 
trials=0","rationale":null}]},"R":{"name":"Reusable","score":66.67,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":1.0,"signal":"open license (cc-by)","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"is_dataset","rationale":null}]}},"suggestions":["Maintain explicit versioning for the dataset.","Make the paper/data Open Access or deposit the files in an open repository."],"model":null,"agent_version":"fair_agent_v2","fulltext_source":"epmc_xml"},"fair_model":null,"fair_agent_version":"fair_agent_v2","fair_fulltext_source":"epmc_xml","fair_has_llm":false,"fair_computed_at":"2026-06-22T09:06:54.549670Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}