{"doi":"10.1093/bioinformatics/btr507","title":"FLASH: fast length adjustment of short reads to improve genome assemblies","abstract":"Motivation: Next-generation sequencing technologies generate very large numbers of short reads. Even with very deep genome coverage, short read lengths cause problems in de novo assemblies. The use of paired-end libraries with a fragment size shorter than twice the read length provides an opportunity to generate much longer reads by overlapping and merging read pairs before assembling a genome. Results: We present FLASH, a fast computational tool to extend the length of short reads by overlapping paired-end reads from fragment libraries that are sufficiently short. We tested the correctness of the tool on one million simulated read pairs, and we then applied it as a pre-processor for genome assemblies of Illumina reads from the bacterium Staphylococcus aureus and human chromosome 14. FLASH correctly extended and merged reads >99% of the time on simulated reads with an error rate of <1%. With adequately set parameters, FLASH correctly merged reads over 90% of the time even when the reads contained up to 5% errors. When FLASH was used to extend reads prior to assembly, the resulting assemblies had substantially greater N50 lengths for both contigs and scaffolds. Availability and implementation: The FLASH system is implemented in C and is freely available as open-source code at http://www.cbcb.umd.edu/software/flash. Contact: t.magoc@gmail.com.","journal":"Bioinformatics","year":2011,"id":9597,"datarank":1.4475503584471758,"base_score":9.65033572298117,"endowment":9.65033572298117,"self_citation_contribution":1.4475503584471758,"citation_network_contribution":0.0,"self_endowment_contribution":1.4475503584471758,"citer_contribution":0.0,"corpus_percentile":null,"corpus_rank":null,"citation_count":15526,"citer_count":0,"citers_with_citation_signal":0,"citers_with_endowment":0,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":0.0385,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2011-09-07","fair_score":null,"fair_percentile":null,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":4334,"name":"Steven L. Salzberg","orcid":"0000-0002-8859-7432","position":1,"is_corresponding":false},{"id":80259,"name":"Tanja Magoč","orcid":null,"position":0,"is_corresponding":true}],"reference_count":9,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":null,"fair_a":null,"fair_i":null,"fair_r":null,"fair_zscore":null,"fair_rationale":null,"fair_model":null,"fair_agent_version":null,"fair_fulltext_source":null,"fair_has_llm":null,"fair_computed_at":null,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}