{"doi":"10.1186/1755-8794-8-s3-s1","title":"Konnector v2.0: pseudo-long reads from paired-end sequencing data","abstract":"<h4>Background</h4>Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool.<h4>Results</h4>Konnector uses a probabilistic and memory-efficient data structure called Bloom filter to represent a k-mer spectrum - all possible sequences of length k in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups to this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences.<h4>Conclusions</h4>Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.","journal":"BMC Medical Genomics","year":2015,"id":1935,"datarank":0.7301910077747965,"base_score":3.1780538303479458,"endowment":3.1780538303479458,"self_citation_contribution":0.47670807455219194,"citation_network_contribution":0.25348293322260457,"self_endowment_contribution":0.47670807455219194,"citer_contribution":0.25348293322260457,"corpus_percentile":null,"corpus_rank":null,"citation_count":23,"citer_count":11,"citers_with_citation_signal":10,"citers_with_endowment":10,"datacite_reuse_total":0,"is_dataset":false,"is_dataset_confidence":0.063,"is_oa":true,"file_count":0,"downloads":0,"has_version_chain":false,"published_date":"2015-09-23","fair_score":null,"fair_percentile":null,"algorithm_id":"datarank_citation_only_1hop_v6","ranking_scope":"data_only","authors":[{"id":4999,"name":"Chen Yang","orcid":"0000-0001-9202-5309","position":1,"is_corresponding":false},{"id":22081,"name":"Zhuyi Xue","orcid":null,"position":2,"is_corresponding":false},{"id":22082,"name":"Karthika Raghavan","orcid":null,"position":3,"is_corresponding":false},{"id":3632,"name":"Justin Chu","orcid":"0000-0003-0549-4997","position":4,"is_corresponding":false},{"id":22083,"name":"Hamid Mohamadi","orcid":null,"position":5,"is_corresponding":false},{"id":22084,"name":"Shaun D. Jackman","orcid":"0000-0002-9275-5966","position":6,"is_corresponding":false},{"id":22085,"name":"Readman Chiu","orcid":"0000-0002-2215-5535","position":7,"is_corresponding":false},{"id":18717,"name":"René L. Warren","orcid":"0000-0002-9890-2293","position":8,"is_corresponding":false},{"id":18731,"name":"Inanc Birol","orcid":"0000-0003-0950-7839","position":10,"is_corresponding":false},{"id":1170,"name":"Benjamin P. Vandervalk","orcid":"0000-0002-3082-3476","position":0,"is_corresponding":true}],"reference_count":28,"raw_metadata":{"citation_network_status":"fetched"},"created_at":"2026-03-01T18:20:47.508186Z","pmid":null,"pmcid":null,"fwci":null,"citation_percentile":null,"influential_citations":0,"oa_status":null,"license":null,"views":0,"total_file_size_bytes":0,"version_count":0,"fair_f":null,"fair_a":null,"fair_i":null,"fair_r":null,"fair_zscore":null,"fair_rationale":null,"fair_model":null,"fair_agent_version":null,"fair_fulltext_source":null,"fair_has_llm":null,"fair_computed_at":null,"clinical_trials":[],"software_tools":[],"db_accessions":[],"linked_datasets":[],"topics":[]}