{"doi":"10.1038/s41467-021-22905-7","title":"SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes","abstract":"Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally-suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel alternate-frame gene, ORF3c, whereas ORFs 2b, 3d/3d-2, 3b, 9c, and 10 lack protein-coding signatures or convincing experimental evidence of protein-coding function. Furthermore, we show no other conserved protein-coding genes remain to be discovered. Mutation analysis suggests ORF8 contributes to within-individual fitness but not person-to-person transmission. Cross-strain and within-strain evolutionary pressures agree, except for fewer-than-expected within-strain mutations in nsp3 and S1, and more-than-expected in nucleocapsid, which shows a cluster of mutations in a predicted B-cell epitope, suggesting immune-avoidance selection. Evolutionary histories of residues disrupted by spike-protein substitutions D614G, N501Y, E484K, and K417N/T provide clues about their biology, and we catalog likely-functional co-inherited mutations. Previously reported RNA-modification sites show no enrichment for conservation. Here we report a high-confidence gene set and evolutionary-history annotations providing valuable resources and insights on SARS-CoV-2 biology, mutations, and evolution.","journal":"Nature Communications","year":2021,"id":10145,"datarank":5.622185199968646,"base_score":5.1647859739235145,"endowment":5.1647859739235145,"self_citation_contribution":0.7747178960885273,"citation_network_contribution":4.847467303880118,"self_endowment_contribution":0.7747178960885273,"citer_contribution":4.847467303880118,"corpus_percentile":null,"corpus_rank":null,"citation_count":174,"citer_count":169,"citers_with_citation_signal":142,"citers_with_endowment":142,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":0.3858,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2021-05-11","algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":5957,"name":"Rachel Sealfon","orcid":"0000-0002-3007-4698","position":1,"is_corresponding":false},{"id":14693,"name":"Sharon L. R. Kardia","orcid":"0000-0002-9853-3379","position":2,"is_corresponding":false},{"id":292,"name":"Irwin Jungreis","orcid":"0000-0002-3197-5367","position":0,"is_corresponding":true}],"reference_count":100,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}