{"doi":"10.48550/arxiv.2005.14165","title":"Language Models are Few-Shot Learners","abstract":"Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.","journal":"arXiv (Cornell University)","year":2020,"id":11862,"datarank":13.373847670866079,"base_score":8.016317898503415,"endowment":8.016317898503415,"self_citation_contribution":1.2024476847755123,"citation_network_contribution":12.171399986090567,"self_endowment_contribution":1.2024476847755123,"citer_contribution":12.171399986090567,"corpus_percentile":83.6,"corpus_rank":2435,"citation_count":3029,"citer_count":199,"citers_with_citation_signal":199,"citers_with_endowment":199,"datacite_reuse_total":0,"is_dataset":false,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":true,"published_date":"2020-05-28","authors":[{"id":95281,"name":"Benjamin Mann","orcid":null,"position":1,"is_corresponding":false},{"id":95282,"name":"Nick Ryder","orcid":null,"position":2,"is_corresponding":false},{"id":95283,"name":"Melanie Subbiah","orcid":null,"position":3,"is_corresponding":false},{"id":95284,"name":"Jared Kaplan","orcid":null,"position":4,"is_corresponding":false},{"id":95285,"name":"Prafulla Dhariwal","orcid":null,"position":5,"is_corresponding":false},{"id":95286,"name":"Arvind Neelakantan","orcid":null,"position":6,"is_corresponding":false},{"id":95287,"name":"Pranav Shyam","orcid":null,"position":7,"is_corresponding":false},{"id":95288,"name":"Girish Sastry","orcid":null,"position":8,"is_corresponding":false},{"id":95289,"name":"Amanda Askell","orcid":null,"position":9,"is_corresponding":false},{"id":95290,"name":"Sandhini Agarwal","orcid":null,"position":10,"is_corresponding":false},{"id":95291,"name":"Ariel Herbert-Voss","orcid":null,"position":11,"is_corresponding":false},{"id":95292,"name":"Gretchen Krueger","orcid":null,"position":12,"is_corresponding":false},{"id":95293,"name":"Tom Henighan","orcid":null,"position":13,"is_corresponding":false},{"id":95294,"name":"Rewon Child","orcid":null,"position":14,"is_corresponding":false},{"id":40854,"name":"Aditya Ramesh","orcid":"0000-0001-5984-8282","position":15,"is_corresponding":false},{"id":95295,"name":"Daniel M. Ziegler","orcid":null,"position":16,"is_corresponding":false},{"id":95296,"name":"Jeffrey Wu","orcid":null,"position":17,"is_corresponding":false},{"id":95297,"name":"Clemens Winter","orcid":null,"position":18,"is_corresponding":false},{"id":95298,"name":"Christopher Hesse","orcid":null,"position":19,"is_corresponding":false},{"id":95299,"name":"Mark Chen","orcid":"0000-0001-9369-5830","position":20,"is_corresponding":false},{"id":95300,"name":"Eric J. Sigler","orcid":"0000-0002-1063-4440","position":21,"is_corresponding":false},{"id":95301,"name":"Mateusz Litwin","orcid":null,"position":22,"is_corresponding":false},{"id":95302,"name":"Scott Gray","orcid":null,"position":23,"is_corresponding":false},{"id":95303,"name":"Benjamin Chess","orcid":null,"position":24,"is_corresponding":false},{"id":95304,"name":"Jack Clark","orcid":null,"position":25,"is_corresponding":false},{"id":95305,"name":"Christopher Berner","orcid":null,"position":26,"is_corresponding":false},{"id":95306,"name":"Sam McCandlish","orcid":null,"position":27,"is_corresponding":false},{"id":95307,"name":"Alec Radford","orcid":null,"position":28,"is_corresponding":false},{"id":58454,"name":"Ilya Sutskever","orcid":null,"position":29,"is_corresponding":false},{"id":95308,"name":"Dario Amodei","orcid":null,"position":30,"is_corresponding":false},{"id":95280,"name":"T. B. Brown","orcid":null,"position":0,"is_corresponding":false}],"reference_count":127,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}