{"doi":"10.1101/gr.258640.119","title":"Accurate and complete genomes from metagenomes","abstract":"Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.","journal":"Genome Research","year":2020,"id":44346,"datarank":9.11385521833465,"base_score":6.0014148779611505,"endowment":6.0014148779611505,"self_citation_contribution":0.9002122316941727,"citation_network_contribution":8.213642986640478,"self_endowment_contribution":0.9002122316941727,"citer_contribution":8.213642986640478,"corpus_percentile":79.0,"corpus_rank":314,"citation_count":403,"citer_count":200,"citers_with_citation_signal":200,"citers_with_endowment":200,"datacite_reuse_total":25,"is_dataset":true,"is_dataset_confidence":null,"is_oa":false,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":null,"fair_score":null,"fair_percentile":null,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":40438,"name":"Karthik Anantharaman","orcid":"0000-0002-9584-2491","position":1,"is_corresponding":false},{"id":208917,"name":"Alon Shaiber","orcid":"0000-0002-4806-9280","position":2,"is_corresponding":false},{"id":208918,"name":"A. Murat Eren","orcid":"0000-0001-9013-4827","position":3,"is_corresponding":false},{"id":206153,"name":"Jillian F. 
Banfield","orcid":"0000-0001-8203-8771","position":4,"is_corresponding":false},{"id":208916,"name":"Lin-Xing Chen","orcid":"0000-0003-2774-1952","position":0,"is_corresponding":false}],"reference_count":0,"raw_metadata":{"has_enrichment":true,"base_score":6.0014148779611505,"endowment":6.0014148779611505,"datacite_reuse_total":25,"file_count":0,"downloads":0,"views":0,"has_version_chain":false,"is_dataset":false,"is_oa":false,"pmid":"32188701","pmcid":"PMC7111523","openalex_id":"https://openalex.org/W3011678041","authors":[],"funders":[{"funder_name":"Lawrence Berkeley National Laboratory's Watershed Function Scientific Focus Area","grant_id":"DE-AC02-05CH11231","title":null},{"funder_name":"National Institutes of Health","grant_id":"RAI092531A","title":null},{"funder_name":"National Institutes of Health","grant_id":"R01-GM109454","title":null},{"funder_name":"NIAID NIH HHS","grant_id":"R01 AI092531","title":null},{"funder_name":"NIGMS NIH HHS","grant_id":"R01 GM109454","title":null},{"funder_name":"National Institutes of Health","grant_id":"5R01GM109454-03","title":"Methods for inference of complex demography and selection from genomic data"},{"funder_name":"Office of Science and Office of Biological and Environmental Research","grant_id":"","title":null},{"funder_name":"Genome Canada","grant_id":"","title":null},{"funder_name":"Innovative Genomics Institute","grant_id":"","title":null},{"funder_name":"Chan Zuckerberg Biohub","grant_id":"","title":null}],"total_grants":10,"fwci":null,"citation_percentile":null,"influential_citations":12,"citation_trend":[{"year":2019,"count":1},{"year":2020,"count":32},{"year":2021,"count":86},{"year":2022,"count":81},{"year":2023,"count":76},{"year":2024,"count":76},{"year":2025,"count":33},{"year":2026,"count":18}],"oa_status":"bronze","license":"CC 
BY","oa_locations":[{"url":"https://genome.cshlp.org/content/30/3/315.full.pdf","host_type":"journal"},{"url":"https://genome.cshlp.org/content/30/3/315.full.pdf","host_type":"HYBRID"},{"url":"https://genome.cshlp.org/content/30/3/315.full.pdf","host_type":"publisher"},{"url":"https://syndication.highwire.org/content/doi/10.1101/gr.258640.119","host_type":"publisher"},{"url":"https://doi.org/10.1101/gr.258640.119","host_type":"journal"},{"url":"https://pubmed.ncbi.nlm.nih.gov/32188701","host_type":"repository"},{"url":"https://www.osti.gov/biblio/1625640","host_type":"repository"},{"url":"https://escholarship.org/uc/item/0zd6w9cv","host_type":"repository"},{"url":"http://genome.cshlp.org/cgi/content/short/30/3/315","host_type":"repository"},{"url":"https://www.osti.gov/biblio/1756327","host_type":"repository"},{"url":"https://www.ncbi.nlm.nih.gov/pmc/articles/7111523","host_type":"repository"},{"url":"https://www.osti.gov/servlets/purl/1625640","host_type":"repository"},{"url":"https://europepmc.org/articles/PMC7111523","host_type":"Europe_PMC"},{"url":"https://europepmc.org/articles/PMC7111523?pdf=render","host_type":"Europe_PMC"},{"url":"https://doi.org/10.1101/808410","host_type":""},{"url":"https://escholarship.org/content/qt81z3f98f/qt81z3f98f.pdf?t=qmibxn","host_type":""},{"url":"http://dx.doi.org/10.1101/gr.258640.119","host_type":""},{"url":"https://dx.doi.org/10.1101/gr.258640.119","host_type":""},{"url":"https://dx.doi.org/10.1101/808410","host_type":""},{"url":"https://escholarship.org/content/qt0zd6w9cv/qt0zd6w9cv.pdf","host_type":""},{"url":"https://escholarship.org/uc/item/81z3f98f","host_type":""},{"url":"http://dx.doi.org/10.1101/808410","host_type":""},{"url":"https://doi.org/https://doi.org/10.1101/gr.258640.119","host_type":""}],"fields_of_study":["Genomics and Phylogenetic Studies","Microbial Community Ecology and Physiology","Bacteriophages and microbial interactions","Biology","Medicine","Environmental Science","0301 basic medicine","0303 
health sciences","03 medical and health sciences","Data Curation","Genome, Archaeal","Genome, Bacterial","Metagenome","Metagenomics"],"mesh_terms":["Genome, Bacterial","Genome, Archaeal","Metagenome","Metagenomics","Data Curation"],"keywords":["Genome","Biology","Metagenomics","Bacterial genome size","Computational biology","Genomics","Shotgun sequencing","Genetics","Sequence assembly","Gene","Transcriptome","570","Bioinformatics","Human Genome","Bioinformatics and Computational Biology","Bacterial","Review","Biological Sciences","Microbiology","Medical and Health Sciences","576","Archaeal","Genome, Archaeal","Metagenome","Generic health relevance","Infection","Data Curation","Genome, Bacterial","Biotechnology"],"sdg_mappings":[{"sdg_number":0,"sdg_label":"Life in Land"}],"linked_datasets":[{"doi":"10.6084/m9.figshare.12940795.v1","title":"Additional file 1 of Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.12940795","title":"Additional file 1 of Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.12940813.v1","title":"Additional file 7 of Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.12940813","title":"Additional file 7 of Estimating the quality of eukaryotic genomes recovered from metagenomic analysis with EukCC","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.14775421.v1","title":"Additional file 1 of Linking genomic and physiological characteristics of psychrophilic Arthrobacter to metagenomic data to explain global environmental 
distribution","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.14775421","title":"Additional file 1 of Linking genomic and physiological characteristics of psychrophilic Arthrobacter to metagenomic data to explain global environmental distribution","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.14776610","title":"Additional file 1 of GUNC: detection of chimerism and contamination in prokaryotic genomes","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.14776616.v1","title":"Additional file 3 of GUNC: detection of chimerism and contamination in prokaryotic genomes","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.14776616","title":"Additional file 3 of GUNC: detection of chimerism and contamination in prokaryotic genomes","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22603592.v1","title":"Additional file 1 of A novel and diverse group of Candidatus Patescibacteria from bathypelagic Lake Baikal revealed through long-read metagenomics","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22603592","title":"Additional file 1 of A novel and diverse group of Candidatus Patescibacteria from bathypelagic Lake Baikal revealed through long-read metagenomics","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22609738.v1","title":"Additional file 1 of MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22609738","title":"Additional file 1 of MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial 
communities","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22609741.v1","title":"Additional file 2 of MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22609741","title":"Additional file 2 of MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22649383.v1","title":"Additional file 2 of Metabolic independence drives gut microbial colonization and resilience in health and disease","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22649383","title":"Additional file 2 of Metabolic independence drives gut microbial colonization and resilience in health and disease","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22649395.v1","title":"Additional file 6 of Metabolic independence drives gut microbial colonization and resilience in health and disease","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22649395","title":"Additional file 6 of Metabolic independence drives gut microbial colonization and resilience in health and disease","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22649422.v1","title":"Additional file 11 of Metabolic independence drives gut microbial colonization and resilience in health and disease","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.22649422","title":"Additional file 11 of Metabolic independence drives gut microbial colonization and resilience in health and disease","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.26620882.v1","title":"Additional file 1 of 
happi: a hierarchical approach to pangenomics inference","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.26620882","title":"Additional file 1 of happi: a hierarchical approach to pangenomics inference","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.26620885.v1","title":"Additional file 2 of happi: a hierarchical approach to pangenomics inference","publisher":"figshare","resource_type":"JournalArticle"},{"doi":"10.6084/m9.figshare.26620885","title":"Additional file 2 of happi: a hierarchical approach to pangenomics inference","publisher":"figshare","resource_type":"JournalArticle"}],"clinical_trials":[],"software_tools":[],"database_accessions":[{"name":"gen"}],"source":"live","citation_network_status":"fetched"},"created_at":"2026-06-22T09:26:45.432183Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":null,"fair_a":null,"fair_i":null,"fair_r":null,"fair_zscore":null,"fair_rationale":null,"fair_model":null,"fair_agent_version":null,"fair_fulltext_source":null,"fair_has_llm":null,"fair_computed_at":null,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}