{"doi":"10.1093/hmg/ddab203","title":"False positive findings during genome-wide association studies with imputation: influence of allele frequency and imputation accuracy","abstract":"<jats:p>Genotype imputation is widely used in genetic studies to boost the power of GWAS, to combine multiple studies for meta-analysis and to perform fine mapping. With advances of imputation tools and large reference panels, genotype imputation has become mature and accurate. However, the uncertain nature of imputed genotypes can cause bias in the downstream analysis. Many studies have compared the performance of popular imputation approaches, but few investigated bias characteristics of downstream association analyses. Herein, we showed that the imputation accuracy is diminished if the real genotypes contain minor alleles. Although these genotypes are less common, which is particularly true for loci with low minor allele frequency, a large discordance between imputed and observed genotypes significantly inflated the association results, especially in data with a large portion of uncertain SNPs. The significant discordance of P-values happened as the P-value approached 0 or the imputation quality was poor. Although elimination of poorly imputed SNPs can remove false positive (FP) SNPs, it sacrificed, sometimes, more than 80% true positive (TP) SNPs. For top ranked SNPs, removing variants with moderate imputation quality cannot reduce the proportion of FP SNPs, and increasing sample size in reference panels did not greatly benefit the results as well. Additionally, samples with a balanced ratio between cases and controls can dramatically improve the number of TP SNPs observed in the imputation based GWAS. 
These results raise concerns about results from analysis of association studies when rare variants are studied, particularly when case–control studies are unbalanced.</jats:p>","journal":"Human Molecular Genetics","year":2021,"id":14590,"datarank":0.860081768022205,"base_score":3.258096538021482,"endowment":3.258096538021482,"self_citation_contribution":0.4887144807032224,"citation_network_contribution":0.3713672873189826,"self_endowment_contribution":0.4887144807032224,"citer_contribution":0.3713672873189826,"corpus_percentile":null,"corpus_rank":null,"citation_count":25,"citer_count":25,"citers_with_citation_signal":14,"citers_with_endowment":14,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":null,"is_oa":false,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":null,"fair_score":48.75,"fair_percentile":44.94283201407212,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":15498,"name":"Xiangjun Xiao","orcid":null,"position":1,"is_corresponding":false},{"id":88591,"name":"Wen Zhou","orcid":"0000-0002-7506-3669","position":2,"is_corresponding":false},{"id":97265,"name":"Dakai Zhu","orcid":"0000-0002-1938-9947","position":3,"is_corresponding":false},{"id":112179,"name":"Christopher I Amos","orcid":null,"position":4,"is_corresponding":false},{"id":16174,"name":"Zhihui Zhang","orcid":"0000-0002-3611-7601","position":0,"is_corresponding":false}],"reference_count":0,"raw_metadata":{"has_enrichment":true,"base_score":3.258096538021482,"endowment":3.258096538021482,"datacite_reuse_total":0,"file_count":0,"downloads":0,"views":0,"has_version_chain":false,"is_dataset":false,"is_oa":false,"pmid":"34368847","pmcid":"PMC8682785","openalex_id":"https://openalex.org/W3190133082","authors":[],"funders":[{"funder_name":"Cancer Prevention Research Institute of Texas","grant_id":"RR170048","title":null},{"funder_name":"National Institutes of 
Health","grant_id":"R01CA242218","title":null},{"funder_name":"National Institutes of Health","grant_id":"U19CA203654","title":null},{"funder_name":"NCI NIH HHS","grant_id":"R03 CA256222","title":null},{"funder_name":"National Institutes of Health","grant_id":"5U19CA203654-04","title":"Integrative analysis of lung cancer etiology and risk"},{"funder_name":"National Institutes of Health","grant_id":"1R01CA242218-01","title":"Precision approaches to refining TP53-associated cancer risk"}],"total_grants":6,"fwci":1.3224,"citation_percentile":0.82241804,"influential_citations":1,"citation_trend":[{"year":2021,"count":1},{"year":2022,"count":1},{"year":2023,"count":3},{"year":2024,"count":3},{"year":2025,"count":14},{"year":2026,"count":3}],"oa_status":"green","license":"OUP Standard Publication Reuse","oa_locations":[{"url":"https://www.ncbi.nlm.nih.gov/pmc/articles/8682785","host_type":"repository"},{"url":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8682785","host_type":"GREEN"},{"url":"https://www.ncbi.nlm.nih.gov/pmc/articles/8682785","host_type":"repository"},{"url":"http://academic.oup.com/hmg/advance-article-pdf/doi/10.1093/hmg/ddab203/39843780/ddab203.pdf","host_type":"publisher"},{"url":"https://academic.oup.com/hmg/article-pdf/31/1/146/41796300/ddab203.pdf","host_type":"publisher"},{"url":"https://doi.org/10.1093/hmg/ddab203","host_type":"journal"},{"url":"https://pubmed.ncbi.nlm.nih.gov/34368847","host_type":"repository"},{"url":"https://dx.doi.org/10.1093/hmg/ddab203","host_type":""}],"fields_of_study":["Genetic Associations and Epidemiology","Genetic and phenotypic traits in livestock","Genetic Mapping and Diversity in Plants and Animals","Biology","Medicine","0301 basic medicine","0303 health sciences","03 medical and health sciences","Alleles","Gene Frequency","Genome-Wide Association Study","Genotype","Polymorphism, Single Nucleotide"],"mesh_terms":["Alleles","Gene Frequency","Genotype","Polymorphism, Single Nucleotide","Genome-Wide Association 
Study"],"keywords":["Imputation (statistics)","Genome-wide association study","Minor allele frequency","Single-nucleotide polymorphism","Biology","Genetic association","Genotype","Genetics","Allele","1000 Genomes Project","Allele frequency","Statistical power","Statistics","Missing data","Gene","Mathematics","Gene Frequency","Polymorphism, Single Nucleotide","Alleles"],"sdg_mappings":[{"sdg_number":0,"sdg_label":"No poverty"}],"linked_datasets":[],"clinical_trials":[],"software_tools":[],"database_accessions":[],"source":"live","citation_network_status":"fetched"},"created_at":"2026-06-01T12:41:46.547515Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":100.0,"fair_a":70.0,"fair_i":0.0,"fair_r":25.0,"fair_zscore":0.3201,"fair_rationale":{"fair_score":48.75,"has_llm":false,"dimensions":{"F":{"name":"Findable","score":100.0,"criteria":[{"key":"f_has_doi","label":"Has a persistent DOI","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"DOI present","rationale":null},{"key":"f_repository_presence","label":"Indexed in repositories / literature DBs","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"datacite=0, pmcid=True, pmid=True","rationale":null},{"key":"f_persistent_ids","label":"Resolvable scholarly identifiers (OpenAlex)","kind":"deterministic","weight":0.5,"fraction":1.0,"signal":"OpenAlex id present","rationale":null}]},"A":{"name":"Accessible","score":70.0,"criteria":[{"key":"a_open_access","label":"Open Access / files deposited","kind":"deterministic","weight":1.5,"fraction":0.5,"signal":"files/OA location present but not flagged OA","rationale":null},{"key":"a_retrievable","label":"Free full text retrievable","kind":"deterministic","weight":1.0,"fraction":1.0,"signal":"8 OA location(s)","rationale":null}]},"I":{"name":"Interoperable","score":0.0,"criteria":[{"key":"i_linked_data","label":"Linked datasets 
/ DataCite relations","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"linked_datasets=0, datacite=0","rationale":null},{"key":"i_standard_ids","label":"References data via standard accessions","kind":"deterministic","weight":1.0,"fraction":0.0,"signal":"accessions=0, trials=0","rationale":null}]},"R":{"name":"Reusable","score":25.0,"criteria":[{"key":"r_license","label":"Clear, open reuse license","kind":"deterministic","weight":1.5,"fraction":0.5,"signal":"license present (OUP Standard Publication Reuse)","rationale":null},{"key":"r_downloads","label":"Demonstrated reuse (downloads)","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"downloads=0","rationale":null},{"key":"r_version","label":"Versioned / maintained","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"no version chain","rationale":null},{"key":"r_dataset","label":"Classified as a data resource","kind":"deterministic","weight":0.5,"fraction":0.0,"signal":"not a dataset","rationale":null}]}},"suggestions":["Link the underlying datasets via DOIs / DataCite relations.","Reference data using standard accessions (e.g. GEO, PDB, ClinicalTrials.gov).","Maintain explicit versioning for the dataset.","Make the paper/data Open Access or deposit the files in an open repository.","Attach a clear, open reuse license (e.g. CC-BY or CC0)."],"model":null,"agent_version":"fair_agent_v1","fulltext_source":"abstract_only"},"fair_model":null,"fair_agent_version":"fair_agent_v1","fair_fulltext_source":"abstract_only","fair_has_llm":false,"fair_computed_at":"2026-06-16T23:03:14.932580Z","clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}