Journal articles on the topic 'Accurate Alignment of Short Reads'

Author: Grafiati

Published: 6 September 2023

Consult the top 50 journal articles for your research on the topic 'Accurate Alignment of Short Reads.'

1

Asghari, Hossein, Yen-Yi Lin, Yang Xu, Ehsan Haghshenas, Colin C. Collins, and Faraz Hach. "CircMiner: accurate and rapid detection of circular RNA through splice-aware pseudo-alignment scheme." Bioinformatics 36, no. 12 (April 7, 2020): 3703–11. http://dx.doi.org/10.1093/bioinformatics/btaa232.

Abstract:

Motivation: The ubiquitous abundance of circular RNAs (circRNAs) has been revealed by performing high-throughput sequencing in a variety of eukaryotes. circRNAs are related to some diseases, such as cancer in which they act as oncogenes or tumor-suppressors and, therefore, have the potential to be used as biomarkers or therapeutic targets. Accurate and rapid detection of circRNAs from short reads remains computationally challenging. This is due to the fact that identifying chimeric reads, which is essential for finding back-splice junctions, is a complex process. The sensitivity of discovery methods, to a high degree, relies on the underlying mapper that is used for finding chimeric reads. Furthermore, all the available circRNA discovery pipelines are resource intensive. Results: We introduce CircMiner, a novel stand-alone circRNA detection method that rapidly identifies and filters out linear RNA sequencing reads and detects back-splice junctions. CircMiner employs a rapid pseudo-alignment technique to identify linear reads that originate from transcripts, genes or the genome. CircMiner further processes the remaining reads to identify the back-splice junctions and detect circRNAs with single-nucleotide resolution. We evaluated the efficacy of CircMiner using simulated datasets generated from known back-splice junctions and showed that CircMiner has superior accuracy and speed compared to the existing circRNA detection tools. Additionally, on two RNase R treated cell line datasets, CircMiner was able to detect most of consistent, high confidence circRNAs compared to untreated samples of the same cell line. Availability and implementation: CircMiner is implemented in C++ and is available online at https://github.com/vpc-ccg/circminer. Supplementary information: Supplementary data are available at Bioinformatics online.

2

Kumar, Sanjeev, Suneeta Agarwal, and Ranvijay. "Fast and memory efficient approach for mapping NGS reads to a reference genome." Journal of Bioinformatics and Computational Biology 17, no. 02 (April 2019): 1950008. http://dx.doi.org/10.1142/s0219720019500082.

Abstract:

New generation sequencing machines such as Illumina and Solexa can generate millions of short reads from a given genome sequence in a single run. Alignment of these reads to a reference genome is a core step in next-generation sequencing data analysis, such as genetic variation analysis and genome re-sequencing. Therefore, there is a need for a new approach that is efficient with respect to both memory and time when aligning this enormous number of reads to the reference genome. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require huge memory for whole reference genome indexing and read alignment. Gapped alignment versions of these techniques are also 20–40% slower than their respective normal versions. In this paper, an efficient approach, WIT, is proposed for reference genome indexing and read alignment using the Burrows–Wheeler Transform (BWT) and Wavelet Tree (WT). It supports both exact and approximate alignment. Experimental work shows that the proposed approach WIT performs the best in case of protein sequence indexing. For indexing, the reference genome space required by WIT is 0.6N (N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25N and 5N. Experimentally, it is also observed that, even with such a small index, the alignment time of the proposed approach is comparable to that of BWA, Subread, Kart, and Minimap2. Other alignment parameters, accuracy and confidentiality, are also experimentally shown to be better than those of Minimap2. The source code of the proposed approach WIT is available at http://www.algorithm-skg.com/wit/home.html.
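
As background for the BWT-based indexing this abstract refers to, the following is a minimal sketch of exact matching by backward search over a Burrows–Wheeler transform. It is not the WIT implementation: the function names are illustrative, the suffix array is built by naive sorting, and the rank query `occ` is a plain linear scan where a wavelet tree would answer the same query in compressed space.

```python
def bwt_index(text):
    """Build the suffix array and BWT of text + '$' (naive sort, for illustration only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    return sa, bwt


def backward_search(pattern, sa, bwt):
    """Return the sorted start positions of exact occurrences of pattern."""
    # C[c] = number of characters in the text that are strictly smaller than c.
    counts = {}
    for ch in bwt:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Rank query: occurrences of ch in bwt[:i]. Assumption: a linear scan
        # stands in for the wavelet tree used by compressed indexes.
        return bwt[:i].count(ch)

    lo, hi = 0, len(bwt)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt = bwt_index("ACGTACGTTACG")
print(backward_search("ACG", sa, bwt))
```

The search consumes the pattern right to left and narrows a suffix-array interval one character at a time, so its cost depends on the pattern length and the cost of the rank query rather than on scanning the reference.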

3

Flouri, Tomas, Costas S. Iliopoulos, Solon P. Pissis, and German Tischler. "Mapping Short Reads to a Genomic Sequence with Circular Structure." International Journal of Systems Biology and Biomedical Technologies 1, no. 1 (January 2012): 26–34. http://dx.doi.org/10.4018/ijsbbt.2012010103.

Abstract:

Constant advances in DNA sequencing technologies are turning whole-genome sequencing into a routine procedure, resulting in massive amounts of data that need to be processed. Tens of gigabytes of data, in the form of short sequences (reads), need to be mapped back onto reference sequences, a few gigabases long. A first generation of short-read alignment algorithms successfully employed hash tables, and the current second generation uses the Burrows-Wheeler transform, further improving speed and memory footprint. These next-generation sequencing technologies allow researchers to characterise a bacterial genome, during a single experiment, at a moderate cost. In this article, as most of the bacterial chromosomes contain a circular DNA molecule, the authors present a new simple, yet efficient, sensitive and accurate algorithm, specifically designed for mapping millions of short reads to a genomic sequence with circular structure.
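
The abstract does not describe the algorithm itself. One common way to let a linear matcher handle a circular reference, sketched here purely as an illustration and not as the authors' method, is to append the first m - 1 bases of the genome to its end, where m is the read length, so that reads spanning the origin become ordinary substrings. The function name `map_to_circular` is hypothetical and only exact matches are considered.

```python
def map_to_circular(genome, read):
    """Return 0-based start positions where read matches a circular genome exactly."""
    n, m = len(genome), len(read)
    if m == 0 or m > n:
        return []
    # Appending the first m - 1 bases exposes matches that wrap past the origin.
    linear = genome + genome[: m - 1]
    hits = []
    start = linear.find(read)
    while start != -1:
        hits.append(start)  # always < n, because len(linear) == n + m - 1
        start = linear.find(read, start + 1)
    return hits


# "CAAC" starts at position 6 and wraps around to the first two bases.
print(map_to_circular("ACGTTGCA", "CAAC"))
```

Because only m - 1 bases are appended, every reported start position is a valid coordinate on the original circular sequence and no hit is reported twice.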

4

Teixeira, Andreia Sofia, Francisco Fernandes, and Alexandre P. Francisco. "SpliceTAPyR — An Efficient Method for Transcriptome Alignment." International Journal of Foundations of Computer Science 29, no. 08 (December 2018): 1297–310. http://dx.doi.org/10.1142/s0129054118430049.

Abstract:

RNA-Seq is a Next-Generation Sequencing (NGS) protocol for sequencing the messenger RNA in a cell and generates millions of short sequence fragments, reads, in a single run. These reads can be used to measure levels of gene expression and to identify novel splice variants of genes. One of the critical steps in an RNA-Seq experiment is mapping NGS reads to the reference genome. Because RNA-Seq reads can span more than one exon in the genome, this task is challenging. In the last decade, tools for RNA-Seq alignment have emerged, but most of them run in two phases. First, the pipeline only maps reads that have a direct match in the reference, and the remaining are set aside as initially unmapped reads. Then, they use heuristics-based approaches, clustering or even annotations, to decide where to align the latter. This work presents an efficient computational solution for the problem of transcriptome alignment, named SpliceTAPyR. It identifies signals of splice junctions and relies on compressed full-text indexing methods and succinct data structures to efficiently align RNA-Seq reads in a single phase. This way it achieves the same or better accuracy than other tools while using considerably less memory and time than the most competitive tools.

5

Ebler, Jana, Peter Ebert, Wayne E. Clarke, Tobias Rausch, Peter A. Audano, Torsten Houwaart, Yafei Mao, et al. "Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes." Nature Genetics 54, no. 4 (April 2022): 518–25. http://dx.doi.org/10.1038/s41588-022-01043-w.

Abstract:

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.

6

Li, H., and R. Durbin. "Fast and accurate short read alignment with Burrows-Wheeler transform." Bioinformatics 25, no. 14 (May 18, 2009): 1754–60. http://dx.doi.org/10.1093/bioinformatics/btp324.

7

Maurer-Stroh, Sebastian, Vithiagaran Gunalan, Wing-Cheong Wong, and Frank Eisenhaber. "A Simple Shortcut to Unsupervised Alignment-Free Phylogenetic Genome Groupings, Even from Unassembled Sequencing Reads." Journal of Bioinformatics and Computational Biology 11, no. 06 (December 2013): 1343005. http://dx.doi.org/10.1142/s0219720013430051.

Abstract:

We propose an extension to alignment-free approaches that can produce reasonably accurate phylogenetic groupings starting from unaligned genomes, for example, as fast as 1 min on a standard desktop computer for 25 bacterial genomes. A 6-fold speed-up and 11-fold reduction in memory requirements compared to previous alignment-free methods is achieved by reducing the comparison space to a representative sample of k-mers of optimal length and with specific tag motifs. This approach was applied to the test case of fitting the enterohemorrhagic O104:H4 E. coli strain from the 2011 outbreak in Germany into the phylogenetic network of previously known E. coli-related strains, and the method was extended to allow assigning any new strain to the correct phylogenetic group even directly from unassembled short sequence reads from next generation sequencing data. Hence, this approach is also useful to quickly identify the most suitable reference genome for subsequent assembly steps.
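
To make the idea of comparing genomes through a reduced set of tagged k-mers concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the published method: the abstract does not give the k-mer length, the tag motifs, or the distance measure, so `k=4`, the tag `"AC"`, and the Jaccard distance used below are all hypothetical choices.

```python
def tagged_kmers(seq, k, tag):
    """Collect the distinct k-mers of seq that begin with the given tag motif."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if seq.startswith(tag, i)}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy sequences; real inputs would be genomes or unassembled reads.
kmers_1 = tagged_kmers("ACGTACGGACGT", k=4, tag="AC")
kmers_2 = tagged_kmers("TTACGTTACCA", k=4, tag="AC")
print(jaccard_distance(kmers_1, kmers_2))
```

Restricting the comparison to k-mers that start with a tag shrinks the sets that must be stored and intersected, which is the kind of saving in time and memory the abstract attributes to sampling.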

8

Ghoneimy, Samy, and Samir Abou El-Seoud. "A MapReduce Framework for DNA Sequencing Data Processing." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 4, no. 4 (December 30, 2016): 11. http://dx.doi.org/10.3991/ijes.v4i4.6537.

Abstract:

Genomics and Next Generation Sequencers (NGS) like Illumina Hiseq produce data in the order of 200 billion base pairs in a single one-week run for a 60x human genome coverage, which requires modern high-throughput experimental technologies that can only be tackled with high performance computing (HPC) and specialized software algorithms called “short read aligners”. This paper focuses on the implementation of the DNA sequencing as a set of MapReduce programs that will accept a DNA data set as a FASTQ file and finally generate a VCF (variant call format) file, which has variants for a given DNA data set. In this paper MapReduce/Hadoop along with Burrows-Wheeler Aligner (BWA), Sequence Alignment/Map (SAM) tools, are fully utilized to provide various utilities for manipulating alignments, including sorting, merging, indexing, and generating alignments. The Map-Sort-Reduce process is designed to be suited for a Hadoop framework in which each cluster is a traditional N-node Hadoop cluster to utilize all of the Hadoop features like HDFS, program management and fault tolerance. The Map step performs multiple instances of the short read alignment algorithm (BoWTie) that run in parallel in Hadoop. The ordered list of the sequence reads is used as input tuples and the output tuples are the alignments of the short reads. In the Reduce step many parallel instances of the Short Oligonucleotide Analysis Package for SNP (SOAPsnp) algorithm run in the cluster. Input tuples are sorted alignments for a partition and the output tuples are SNP calls. Results are stored via HDFS, and then archived in SOAPsnp format. The proposed framework enables extremely fast discovery of somatic mutations, inference of population genetic parameters, and association tests performed directly on sequencing data without explicit genotyping or linkage-based imputation. It also demonstrates that this method achieves comparable accuracy to alternative methods for sequencing data processing.

9

Prodanov, Timofey, and Vikas Bansal. "Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications." Nucleic Acids Research 48, no. 19 (October 9, 2020): e114-e114. http://dx.doi.org/10.1093/nar/gkaa829.

Abstract:

The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs), sequence differences between paralogous sequences, to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (74.3–90.6%) and BLASR (82.9–90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8–21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14 713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.

10

Teng, Carolina, Renan Weege Achjian, Jiang Chau Wang, and Fernando Josepetti Fonseca. "Adapting the GACT-X Aligner to Accelerate Minimap2 in an FPGA Cloud Instance." Applied Sciences 13, no. 7 (March 30, 2023): 4385. http://dx.doi.org/10.3390/app13074385.

Abstract:

In genomic analysis, long reads are an emerging type of data processed by assembly algorithms to recover the complete genome sample. They are, on average, one or two orders of magnitude longer than short reads from the previous generation, which provides important advantages in information quality. However, longer sequences bring new challenges to computer processing, undermining the performance of assembly algorithms developed for short reads. This issue is amplified by the exponential growth of genetic data generation and by the slowdown of transistor technology progress, illustrated by Moore’s Law. Minimap2 is the current state-of-the-art long-read assembler and takes dozens of CPU hours to assemble a human genome with clinical standard coverage. One of its bottlenecks, the alignment stage, has not been successfully accelerated on FPGAs in the literature. GACT-X is an alignment algorithm developed for FPGA implementation, suitable for any size input sequence. In this work, GACT-X was adapted to work as the aligner of Minimap2, and these are integrated and implemented in an FPGA cloud platform. The measurements for accuracy and speed-up are presented for three different datasets in different combinations of numbers of kernels and threads. The integrated solution’s performance limitations due to data transfer are also analyzed and discussed.

11

Al-Absi, Ahmed Abdulhakim, and Dae-Ki Kang. "Long Read Alignment with Parallel MapReduce Cloud Platform." BioMed Research International 2015 (2015): 1–13. http://dx.doi.org/10.1155/2015/807407.

Abstract:

Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing genes sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner’s Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts a widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 considering BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.
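
For readers unfamiliar with the Smith-Waterman step mentioned above, the following is a minimal sketch of the textbook local-alignment recurrence with a linear gap penalty. It is not the optimized BWASW-PMR kernel; the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which lets an alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGTTGCA", "ACGTAGCA"))
```

Only two rows are kept in memory because the score alone is needed here; recovering the alignment itself would require storing the full matrix or traceback pointers.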

12

Marin, Wesley M., Ravi Dandekar, Danillo G. Augusto, Tasneem Yusufali, Bianca Heyn, Jan Hofmann, Vinzenz Lange, Jürgen Sauter, Paul J. Norman, and Jill A. Hollenbach. "High-throughput Interpretation of Killer-cell Immunoglobulin-like Receptor Short-read Sequencing Data with PING." PLOS Computational Biology 17, no. 8 (August 2, 2021): e1008904. http://dx.doi.org/10.1371/journal.pcbi.1008904.

Abstract:

The killer-cell immunoglobulin-like receptor (KIR) complex on chromosome 19 encodes receptors that modulate the activity of natural killer cells, and variation in these genes has been linked to infectious and autoimmune disease, as well as having bearing on pregnancy and transplant outcomes. The medical relevance and high variability of KIR genes makes short-read sequencing an attractive technology for interrogating the region, providing a high-throughput, high-fidelity sequencing method that is cost-effective. However, because this gene complex is characterized by extensive nucleotide polymorphism, structural variation including gene fusions and deletions, and a high level of homology between genes, its interrogation at high resolution has been thwarted by bioinformatic challenges, with most studies limited to examining presence or absence of specific genes. Here, we present the PING (Pushing Immunogenetics to the Next Generation) pipeline, which incorporates empirical data, novel alignment strategies and a custom alignment processing workflow to enable high-throughput KIR sequence analysis from short-read data. PING provides KIR gene copy number classification functionality for all KIR genes through use of a comprehensive alignment reference. The gene copy number determined per individual enables an innovative genotype determination workflow using genotype-matched references. Together, these methods address the challenges imposed by the structural complexity and overall homology of the KIR complex. To determine copy number and genotype determination accuracy, we applied PING to European and African validation cohorts and a synthetic dataset. PING demonstrated exceptional copy number determination performance across all datasets and robust genotype determination performance. 
Finally, an investigation into discordant genotypes for the synthetic dataset provides insight into misaligned reads, advancing our understanding in interpretation of short-read sequencing data in complex genomic regions. PING promises to support a new era of studies of KIR polymorphism, delivering high-resolution KIR genotypes that are highly accurate, enabling high-quality, high-throughput KIR genotyping for disease and population studies.

13

Chen, Siyuan, Chengzhi Ren, Jingjing Zhai, Jiantao Yu, Xuyang Zhao, Zelong Li, Ting Zhang, Wenlong Ma, Zhaoxue Han, and Chuang Ma. "CAFU: a Galaxy framework for exploring unmapped RNA-Seq data." Briefings in Bioinformatics 21, no. 2 (February 28, 2019): 676–86. http://dx.doi.org/10.1093/bib/bbz018.

Abstract:

A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the deficiencies of specially designed analytical systems, short reads unmapped to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that can facilitate the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. By taking advantage of machine learning techniques, CAFU addresses the issue of accurately identifying the species origin of transcripts assembled using unmapped reads from mixed-species samples. CAFU also represents an innovation in that it provides a comprehensive collection of functions required for transcript confidence evaluation, coding potential calculation, sequence and expression characterization and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU via a user-friendly interface, dramatically simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and Zea mays (maize) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.

14

Mukherjee, Kingshuk, Bahar Alipanahi, Tamer Kahveci, Leena Salmela, and Christina Boucher. "Aligning optical maps to de Bruijn graphs." Bioinformatics 35, no. 18 (January 30, 2019): 3250–56. http://dx.doi.org/10.1093/bioinformatics/btz069.

Abstract:

Motivation: Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have been mainly used in post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single molecule optical maps, called Rmaps, or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data: optical maps and sequence data which will facilitate the usage of optical maps in the sequence assembly step itself. Results: We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. Availability and implementation: The software for aligning optical maps to de Bruijn graph, omGraph is written in C++ and is publicly available under GNU General Public License at https://github.com/kingufl/omGraph. Supplementary information: Supplementary data are available at Bioinformatics online.
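
As orientation for the graph structure named in this abstract, the sketch below builds a simple de Bruijn graph from short reads: nodes are (k-1)-mers and each k-mer contributes one edge. It is not taken from omGraph, it does not touch optical maps, and the function name and the choice `k=3` are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix node -> suffix node
    return dict(graph)


# Two overlapping toy reads; shared k-mers collapse into the same edge.
print(de_bruijn_graph(["ACGT", "CGTA"], k=3))
```

Overlapping reads merge automatically because identical k-mers map to the same edge, which is why the structure suits high-coverage short-read data.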

15

Magdy Mohamed Abdelaziz Barakat, Sherif, Roselina Sallehuddin, Siti Sophiayati Yuhaniz, Raja Farhana R. Khairuddin, and Yasir Mahmood. "Genome assembly composition of the String “ACGT” array: a review of data structure accuracy and performance challenges." PeerJ Computer Science 9 (July 13, 2023): e1180. http://dx.doi.org/10.7717/peerj-cs.1180.

Abstract:

Background: The development of sequencing technology increases the number of genomes being sequenced. However, obtaining a quality genome sequence remains a challenge in genome assembly, which must assemble a massive number of short strings (reads) in the presence of repetitive sequences (repeats). Computer algorithms for genome assembly construct the entire genome from reads using two approaches. The de novo approach concatenates the reads based on the exact match between their suffix-prefix (overlapping). The reference-guided approach orders the reads based on their offsets in a well-known reference genome (reads alignment). The presence of repeats extends the technical ambiguity, making the algorithm unable to distinguish the reads, resulting in misassembly and affecting the accuracy of the assembly approach. On the other hand, the massive number of reads causes a big assembly performance challenge. Method: The repeat identification method was introduced for misassembly by prior identification of repetitive sequences, creating a repeat knowledge base to reduce ambiguity during the assembly process, thus enhancing the accuracy of the assembled genome. Also, hybridization between assembly approaches resulted in a lower misassembly degree with the aid of the reference genome. The assembly performance is optimized through data structure indexing and parallelization. This article’s primary aim and contribution are to support the researchers through an extensive review to ease other researchers’ search for genome assembly studies. The study also highlighted the most recent developments and limitations in genome assembly accuracy and performance optimization. Results: Our findings show the limitations of the repeat identification methods available, which only allow the detection of repeats of specific lengths and may not perform well when various types of repeats are present in a genome. We also found that most of the hybrid assembly approaches, either starting with de novo or reference-guided, have some limitations in handling repetitive sequences as it is more computationally costly and time intensive. Although the hybrid approach was found to outperform individual assembly approaches, optimizing its performance remains a challenge. Also, the usage of parallelization in overlapping and reads alignment for genome assembly is yet to be fully implemented in the hybrid assembly approach. Conclusion: We suggest combining multiple repeat identification methods to enhance the accuracy of identifying the repeats as an initial step to the hybrid assembly approach and combining genome indexing with parallelization for better optimization of its performance.

16

Liu, Yongchao, Bernt Popp, and Bertil Schmidt. "CUSHAW3: Sensitive and Accurate Base-Space and Color-Space Short-Read Alignment with Hybrid Seeding." PLoS ONE 9, no. 1 (January 22, 2014): e86869. http://dx.doi.org/10.1371/journal.pone.0086869.

17

Wong, Thomas K. F., Teng Li, Louis Ranjard, Steven H. Wu, Jeet Sukumaran, and Allen G. Rodrigo. "An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes." PLOS Computational Biology 17, no. 9 (September 13, 2021): e1008949. http://dx.doi.org/10.1371/journal.pcbi.1008949.

Abstract:

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.

18

Tello, Daniel, Juanita Gil, Cristian D. Loaiza, John J. Riascos, Nicolás Cardozo, and Jorge Duitama. "NGSEP3: accurate variant calling across species and sequencing protocols." Bioinformatics 35, no. 22 (April 25, 2019): 4716–23. http://dx.doi.org/10.1093/bioinformatics/btz275.

Abstract:

Motivation: Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features. Results: Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparative accuracy and better efficiency compared to the existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture. Availability and implementation: NGSEP is available as open source software at http://ngsep.sf.net. Supplementary information: Supplementary data are available at Bioinformatics online.

19

Chu, Wai Keung, Peter Edge, Ho Suk Lee, Vikas Bansal, Vineet Bafna, Xiaohua Huang, and Kun Zhang. "Ultraaccurate genome sequencing and haplotyping of single human cells." Proceedings of the National Academy of Sciences 114, no. 47 (October 24, 2017): 12512–17. http://dx.doi.org/10.1073/pnas.1707609114.

Abstract:

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.
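
The abstract above reports haplotype contigs with an N50 greater than 7 Mb. For readers unfamiliar with the statistic, the following is a minimal Python sketch of how N50 is conventionally computed: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the overall total. The function name `n50` and the contig lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total
        # assembly length defines N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in Mb (not taken from the SISSOR study).
contigs_mb = [9, 8, 7, 4, 2]
print(n50(contigs_mb))  # 8
```

Here the total is 30 Mb, so the running sum reaches the 15 Mb halfway point at the second-longest contig.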

20

Zhurbenko, Peter M., and Fedor N. Klimenko. "PhaseAll: a simple tool for read-based allele phasing." Ecological genetics 20, no. 1S (December 8, 2022): 32. http://dx.doi.org/10.17816/ecogen112363.

Abstract:

The currently used genome assembly algorithms do not provide for allele phasing. This can lead to the loss of important information about the genotype of diploid and polyploid individuals. Here we introduce PhaseAll, a simple tool for allele phasing based on short reads obtained by second-generation sequencing. As input data, the tool takes paired reads in SAM format. PhaseAll iterates sequentially through each alignment position. When a polymorphic position (SNP, insertion or deletion) is first encountered, a unique mutation is written to each allele. For each subsequent polymorphic position, a test is made to verify whether it is located on the same pair of reads (one DNA fragment) as the previous one. If two mutations are located on the same fragment, they are considered to belong to the same allele. If no fragments are found that connect at least one pair of neighboring polymorphic positions, an X is written in the allele sequences. This means that the alleles can swap at this position. PhaseAll is written in Python 3. SAM files are processed using the pysam library. PhaseAll is designed to separate only two alleles. To avoid possible sequencing errors, the user can set a read depth threshold below which the polymorphic position will be skipped. Some indels can cause errors in allele phasing, so PhaseAll has an option to skip indels for more accurate SNP reconstruction. The tool was tested on sequences of agrobacterial origin in the Camellia L. genome in more than 100 samples. PhaseAll is available for download on GitHub: https://github.com/pzhurbenko/PhaseAll. The research was supported by RSF (project No. 21-14-00050).
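
The phasing procedure described in this abstract can be illustrated with a small, self-contained Python sketch. It is not PhaseAll's code and does not use pysam: fragments are assumed to be already reduced to dictionaries mapping a position to the observed base, only SNPs with exactly two observed bases are handled, and the majority vote between "same" and "swapped" linkages is an assumption made for this illustration. The function name `phase_two_alleles` and the `min_depth` parameter are hypothetical.

```python
from collections import Counter


def phase_two_alleles(fragments, min_depth=2):
    """Phase biallelic sites into two allele strings.

    fragments: list of dicts mapping position -> observed base, one dict
               per sequenced fragment (read pair).
    Returns (sites, allele_a, allele_b). An "X" in the allele strings marks
    a point where no fragment links neighbouring sites, so the phase may
    swap there.
    """
    counts = {}
    for frag in fragments:
        for pos, base in frag.items():
            counts.setdefault(pos, Counter())[base] += 1

    # Keep positions with exactly two observed bases and enough read depth.
    sites = sorted(
        pos for pos, c in counts.items()
        if len(c) == 2 and sum(c.values()) >= min_depth
    )

    allele_a, allele_b = "", ""
    prev_pos = prev_a = prev_b = None
    for pos in sites:
        first, second = sorted(counts[pos])
        if prev_pos is not None:
            same = swapped = 0
            # Count fragments that cover both the previous and current site.
            for frag in fragments:
                if prev_pos in frag and pos in frag:
                    pair = (frag[prev_pos], frag[pos])
                    if pair in ((prev_a, first), (prev_b, second)):
                        same += 1
                    elif pair in ((prev_a, second), (prev_b, first)):
                        swapped += 1
            if same == 0 and swapped == 0:
                # No linking fragment: record a possible phase switch.
                allele_a += "X"
                allele_b += "X"
            elif swapped > same:
                first, second = second, first
        allele_a += first
        allele_b += second
        prev_pos, prev_a, prev_b = pos, first, second
    return sites, allele_a, allele_b


# Hypothetical fragments: positions 10, 25 and 40 are linked, 90 is not.
fragments = [
    {10: "A", 25: "C"},
    {10: "G", 25: "T"},
    {25: "C", 40: "G"},
    {25: "T", 40: "A"},
    {90: "T"},
    {90: "C"},
]
print(phase_two_alleles(fragments))
# ([10, 25, 40, 90], 'ACGXC', 'GTAXT')
```

In the toy input, fragments connect positions 10, 25 and 40, so those bases are assigned to consistent alleles, while nothing links position 40 to position 90 and an X is written before it.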

21

Marin, Maximillian, Roger Vargas, Michael Harris, Brendan Jeffrey, L. Elaine Epperson, David Durbin, Michael Strong, et al. "Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome." Bioinformatics 38, no. 7 (January 10, 2022): 1781–87. http://dx.doi.org/10.1093/bioinformatics/btac023.

Abstract:

Abstract Motivation Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. Results Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and minimum precision of 98.5% across parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality filtering threshold, i.e. confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative conservative approach to variant calling that increases precision at cost to recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS including 52/168 PE/PPE genes (34.5%). From these results, we present a refined list of low confidence regions across the Mtb genome, which we found to frequently overlap with regions with structural variation, low sequence uniqueness and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems and more generally for WGS applications in other organisms. Availability and implementation All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation.
Supplementary information Supplementary data are available at Bioinformatics online.
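
The trade-off between recall and precision under a mapping quality (MQ) filter, which this abstract quantifies, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' evaluation code (which is in the repository linked above): variant calls are assumed to be a mapping from site to MQ, the truth set is a set of sites, and all names and values below are hypothetical.

```python
def precision_recall(calls, truth, min_mq):
    """Precision and recall of calls kept at or above a mapping quality.

    calls:  dict mapping variant site -> mapping quality of its support
    truth:  set of sites known to carry a real variant
    min_mq: calls with mapping quality below this value are discarded
    """
    kept = {site for site, mq in calls.items() if mq >= min_mq}
    true_pos = len(kept & truth)
    false_pos = len(kept - truth)
    false_neg = len(truth - kept)
    called = true_pos + false_pos
    real = true_pos + false_neg
    precision = true_pos / called if called else float("nan")
    recall = true_pos / real if real else float("nan")
    return precision, recall


# Hypothetical calls: site 830 is a false positive with low mapping quality,
# and the true variant at site 900 is never called.
calls = {100: 60, 250: 60, 400: 45, 575: 20, 830: 5}
truth = {100, 250, 400, 575, 900}

print(precision_recall(calls, truth, min_mq=0))   # (0.8, 0.8)
print(precision_recall(calls, truth, min_mq=40))  # (1.0, 0.6)
```

Raising the threshold removes the low-quality false positive but also discards a true variant supported only by poorly mapped reads, which is the kind of precision gain at a cost to recall that the abstract describes.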

22

Tapinos, Avraam, Bede Constantinides, My V. T. Phan, Samaneh Kouchaki, Matthew Cotten, and David L. Robertson. "The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences." Viruses 11, no. 5 (April 26, 2019): 394. http://dx.doi.org/10.3390/v11050394.

Abstract:

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.

23

Linard, Benjamin, Krister Swenson, and Fabio Pardi. "Rapid alignment-free phylogenetic identification of metagenomic sequences." Bioinformatics 35, no. 18 (January 29, 2019): 3303–12. http://dx.doi.org/10.1093/bioinformatics/btz068.

Abstract:

Abstract Motivation Taxonomic classification is at the core of environmental DNA analysis. When a phylogenetic tree can be built as a prior hypothesis to such classification, phylogenetic placement (PP) provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. This is useful whenever precision is sought (e.g. in diagnostics). However, likelihood-based PP algorithms struggle to scale with the ever-increasing throughput of DNA sequencing. Results We have developed RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) which uses an alignment-free approach, removing the hurdle of query sequence alignment as a preliminary step to PP. Our approach relies on the precomputation of a database of k-mers that may be present with non-negligible probability in relatives of the reference sequences. The placement is performed by inspecting the stored phylogenetic origins of the k-mers in the query, and their probabilities. The database can be reused for the analysis of several different metagenomes. Experiments show that the first implementation of RAPPAS is already faster than competing likelihood-based PP algorithms, while keeping similar accuracy for short reads. RAPPAS scales PP for the era of routine metagenomic diagnostics. Availability and implementation Program and sources freely available for download at https://github.com/blinard-BIOINFO/RAPPAS. Supplementary information Supplementary data are available at Bioinformatics online.
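
The general idea of placing a query by looking up its k-mers in a precomputed table of phylogenetic origins can be sketched as follows. This is a heavily simplified toy and not the RAPPAS implementation: the table layout (k-mer to branch to probability), the log-probability sum used as a score, the fallback probability for k-mers missing from a branch, and every name below are assumptions made purely for illustration.

```python
import math


def place_query(query, kmer_db, k, default_prob=1e-6):
    """Score each branch by the summed log-probability of the query k-mers.

    kmer_db: dict mapping k-mer -> {branch name: probability}
    Returns (best_branch, scores).
    """
    branches = {b for origins in kmer_db.values() for b in origins}
    if not branches:
        raise ValueError("k-mer database contains no branches")
    scores = {b: 0.0 for b in branches}
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        origins = kmer_db.get(kmer, {})
        for b in branches:
            # k-mers not stored for a branch get a small fallback probability.
            scores[b] += math.log(origins.get(b, default_prob))
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical database with k = 3 and two branches of a reference tree.
kmer_db = {
    "ACG": {"branch_1": 0.9, "branch_2": 0.1},
    "CGT": {"branch_1": 0.8},
    "GTT": {"branch_2": 0.7},
}
best, scores = place_query("ACGT", kmer_db, k=3)
print(best)  # branch_1
```

Because the table is built once from the reference tree, scoring a query needs only dictionary lookups and no alignment step, which is the property the abstract emphasizes.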

24

Abde Aliy, Mohammed, Senbeta Bayeta, and Worku Takale. "Pacific bioscience sequence technology: Review." International Journal of Veterinary Science and Research 8, no. 1 (March 29, 2022): 027–33. http://dx.doi.org/10.17352/ijvsr.000108.

Abstract:

Pacific Biosciences has developed a platform that can sequence a single DNA molecule in real time through the polymerization of that strand by a single enzyme. Single-molecule real-time (SMRT) sequencing from Pacific Biosciences is one of the most widely used third-generation sequencing technologies. PacBio SMRT sequencing uses zero-mode waveguides to distinguish the true fluorescence signal from the stable fluorescent background generated by free-floating nucleotides. It does not require PCR amplification, and its read length is a hundred times longer than that of next-generation sequencing. It can also cover high-GC and highly repetitive regions and is more accurate in quantifying low-frequency mutations. PacBio SMRT sequencing has a relatively high error rate of 10%-15% (a common flaw of existing single-molecule sequencing technologies). In contrast to next-generation sequencing, however, the errors are random, so sequencing the same molecule multiple times can effectively correct them. Unlike second-generation sequencing, PacBio sequencing is a real-time technique that needs no pause between read steps. These features distinguish PacBio sequencing from second-generation sequencing, which is why it is classified as third-generation sequencing. PacBio sequencing produces extremely long reads with a high error rate and low yield. Short reads refine alignments, assemblies and detections to single-nucleotide precision, whereas PacBio long reads provide reliable alignments, scaffolds and approximate detections of genomic variations. Through extraordinarily long sequencing reads (average >10,000 bp) and high consensus accuracy, the PacBio sequencing system can provide a very high depth of genetic information. To evaluate and promote the development of modern bioinformatics tools for PacBio sequencing data analysis, a good read simulator is required.

25

Elrick, Hillary, Jose Espejo Valle-Inclan, Katherine E. Trevers, Francesc Muyas, Rita Cascão, Angela Afonso, Cláudia C. Faria, Adrienne M. Flanagan, and Isidro Cortés-Ciriano. "Abstract LB080: SAVANA: a computational method to characterize structural variation in human cancer genomes using nanopore sequencing." Cancer Research 83, no. 8_Supplement (April 14, 2023): LB080. http://dx.doi.org/10.1158/1538-7445.am2023-lb080.

Abstract:

Abstract Whole-genome sequencing (WGS) of human cancers has revealed that structural variation, which refers to the rearrangement of the genome leading to the deletion, amplification or reshuffling of DNA segments ranging from a few hundred bp to entire chromosomes, is a key mutational process in cancer evolution. Notably, pan-cancer analyses have revealed that both simple and complex forms of structural variation are pervasive across diverse human cancers, and often underpin drug resistance and metastasis. To date, the study of cancer genomes has relied on the analysis of short-read WGS on the dominant Illumina platform, which generates short, highly accurate reads of 100-300 bp that allow the study of point mutations at high resolution. However, detection of structural variants (SVs) using short reads is limited, as breakpoints falling in repetitive regions cannot be reliably mapped to the human genome. As a result, our understanding of the patterns and mechanisms underpinning structural variation in cancer genomes remains incomplete. In contrast to short-read sequencing, long-read sequencing technologies, such as Oxford Nanopore and PacBio, permit continuous reading of individual DNA molecules over 10 kilobases, thus providing unparalleled information to resolve SVs in repetitive regions and complex genome rearrangements. However, novel bioinformatics methods that account for the higher error rate of long-read methods are needed to take advantage of their capabilities for cancer genome analysis. Here, we present SAVANA, a novel structural variant caller for long-read sequencing data specifically designed for the analysis of cancer genomes. To identify both somatic and germline SVs, SAVANA takes as input long-read WGS data from a tumor and normal sample pair. SAVANA scans sequencing reads to detect split reads and gapped alignments, which are then clustered to define putative SVs. 
Next, SAVANA applies a machine learning-informed set of heuristics to remove false positives arising from mapping errors and sequencing artifacts. Extensively validated against a multi-platform truthset, we show that SAVANA identifies a range of somatic rearrangements with high recall and precision, outperforming existing tools while maintaining a lower execution time than competing methods. In patient samples, SAVANA identifies clinically relevant alterations, such as oncogenic gene fusions, with high accuracy. Additionally, SAVANA permits the reconstruction of double minutes, multi-chromosomal chromothripsis events, and SVs mapping to highly repetitive regions, including centromeres. In sum, SAVANA permits the characterization of complex structural variants and can uncover clinically relevant mutations across diverse cancer types with high accuracy. Citation Format: Hillary Elrick, Jose Espejo Valle-Inclan, Katherine E. Trevers, Francesc Muyas, Rita Cascão, Angela Afonso, Cláudia C. Faria, Adrienne M. Flanagan, Isidro Cortés-Ciriano. SAVANA: a computational method to characterize structural variation in human cancer genomes using nanopore sequencing [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 2 (Clinical Trials and Late-Breaking Research); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(8_Suppl):Abstract nr LB080.

26

Atshemyan, Sofi, Andranik Chavushyan, Nerses Berberian, Arthur Sahakyan, Roksana Zakharyan, and Arsen Arakelyan. "Characterization of BRCA1/2 mutations in patients with family history of breast cancer in Armenia." F1000Research 6 (January 10, 2017): 29. http://dx.doi.org/10.12688/f1000research.10434.1.

Abstract:

Background. Breast cancer is one of the most common cancers in women worldwide. The germline mutations of the BRCA1 and BRCA2 genes are the most significant and well characterized genetic risk factors for hereditary breast cancer. Intensive research in the last decades has demonstrated that the incidence of mutations varies widely among different populations. In this pilot study we attempted to identify and characterize mutations in the BRCA1 and BRCA2 genes among Armenian patients with a family history of breast cancer and their healthy relatives. Methods. We performed targeted exome sequencing for BRCA1 and BRCA2 genes in 6 patients and their healthy relatives. After alignment of short reads to the reference genome, germline single nucleotide variation and indel discovery was performed using GATK software. Functional implications of identified variants were assessed using the ENSEMBL Variant Effect Predictor tool. Results. In total, 39 single nucleotide variations and 4 indels were identified, of which 15 SNPs and 3 indels were novel. No known pathogenic mutations were identified, but 2 SNPs causing missense amino acid mutations had significantly increased frequencies in the study group compared to the 1000 Genome populations. Conclusions. Our results demonstrate the importance of screening BRCA1 and BRCA2 gene variants in the Armenian population in order to identify the specifics of the mutation spectrum and frequencies and to enable accurate risk assessment of hereditary breast cancers.

27

Olawoye, Idowu B., Simon D. W. Frost, and Christian T. Happi. "The Bacteria Genome Pipeline (BAGEP): an automated, scalable workflow for bacteria genomes with Snakemake." PeerJ 8 (October 27, 2020): e10121. http://dx.doi.org/10.7717/peerj.10121.

Abstract:

Next generation sequencing technologies are becoming more accessible and affordable over the years, with entire genome sequences of several pathogens being deciphered in a few hours. However, there is a need to analyze multiple genomes within a short time, in order to provide critical information about a pathogen of interest such as drug resistance, mutations and the genetic relationship of isolates in an outbreak setting. Many pipelines that currently do this are stand-alone workflows and have huge computational requirements for analyzing multiple genomes. We present an automated and scalable pipeline called BAGEP for monomorphic bacteria that performs quality control on FASTQ paired-end files, scans reads for contaminants using a taxonomic classifier, maps reads to a reference genome of choice for variant detection, detects antimicrobial resistance (AMR) genes, constructs a phylogenetic tree from core genome alignments and provides interactive single nucleotide polymorphism (SNP) visualization across core genomes in the data set. The objective of our research was to create an easy-to-use pipeline from existing bioinformatics tools that can be deployed on a personal computer. The pipeline was built on the Snakemake framework and utilizes existing tools for each processing step: fastp for quality trimming, snippy for variant calling, Centrifuge for taxonomic classification, Abricate for AMR gene detection, snippy-core for generating whole and core genome alignments, IQ-TREE for phylogenetic tree construction and vcfR for an interactive heatmap visualization which shows SNPs at specific locations across the genomes. BAGEP was successfully tested and validated with Mycobacterium tuberculosis (n = 20) and Salmonella enterica serovar Typhi (n = 20) genomes, which are about 4.4 million and 4.8 million base pairs, respectively. Running these test data on an 8 GB RAM, 2.5 GHz quad-core laptop took 122 and 61 minutes, respectively, to complete the analysis. BAGEP is a fast, easy-to-run pipeline that calls accurate SNPs and can be executed on a mid-range laptop; it is freely available at https://github.com/idolawoye/BAGEP.

28

Trost, Brett, Susan Walker, Syed A. Haider, Wilson W. L. Sung, Sergio Pereira, Charly L. Phillips, Edward J. Higginbotham, et al. "Impact of DNA source on genetic variant detection from human whole-genome sequencing data." Journal of Medical Genetics 56, no. 12 (September 12, 2019): 809–17. http://dx.doi.org/10.1136/jmedgenet-2019-106281.

Abstract:

Background Whole blood is currently the most common DNA source for whole-genome sequencing (WGS), but for studies requiring non-invasive collection, self-collection, greater sample stability or additional tissue references, saliva or buccal samples may be preferred. However, the relative quality of sequencing data and accuracy of genetic variant detection from blood-derived, saliva-derived and buccal-derived DNA need to be thoroughly investigated. Methods Matched blood, saliva and buccal samples from four unrelated individuals were used to compare sequencing metrics and variant-detection accuracy among these DNA sources. Results We observed significant differences among DNA sources for sequencing quality metrics such as percentage of reads aligned and mean read depth (p<0.05). Differences were negligible in the accuracy of detecting short insertions and deletions; however, the false positive rate for single nucleotide variation detection was slightly higher in some saliva and buccal samples. The sensitivity of copy number variant (CNV) detection was up to 25% higher in blood samples, depending on CNV size and type, and appeared to be worse in saliva and buccal samples with high bacterial concentration. We also show that methylation-based enrichment for eukaryotic DNA in saliva and buccal samples increased alignment rates but also reduced read-depth uniformity, hampering CNV detection. Conclusion For WGS, we recommend using DNA extracted from blood rather than saliva or buccal swabs; if saliva or buccal samples are used, we recommend against using methylation-based eukaryotic DNA enrichment. All data used in this study are available for further open-science investigation.

29

Gnerre, Sante, Brian Yik Tak Tsui, Tingting Jiang, Yvonne Kim, Dustin Ma, Indira Wu, Rebecca Nagy, and Han-Yu Chuang. "Abstract 1220: Accurately genotyping HLA and KIR alleles using cfDNA assay and k-mer based algorithm for immunotherapy." Cancer Research 82, no. 12_Supplement (June 15, 2022): 1220. http://dx.doi.org/10.1158/1538-7445.am2022-1220.

Abstract:

Abstract Background: HLA and KIR genotypes show great promise as emerging biomarkers for immune checkpoint inhibitors (ICIs) and understanding patient prognosis. Multiple studies have shown that HLA-I heterozygosity and high sequence divergence across alleles positively correlate with response to ICIs. However, the high degree of polymorphism and allele sequence similarities in HLA and KIR present a challenge to accurate allele calling. To address these difficulties we developed kmerizer, a novel allele caller optimized for short fragments, such as reads from a cfDNA assay. Methods: We tested kmerizer on MHC class 1 and class 2 genes, and on KIR genes, on both simulated datasets and real samples (cell lines and plasma samples). To assess the capability of the algorithm to distinguish highly homologous allele pairs, we simulated cfDNA-like fragments with errors on randomly selected allele pairs, and on allele pairs with a high level of homology. Twelve plasma samples and 19 reference cell lines fragmented to cfDNA size were analyzed using an NGS cfDNA assay. For plasma samples, paired buffy coats were sent to an external vendor for HLA typing using multiplex PCR-based amplicons sequenced with 300 bp paired-end reads. Results: Of the 19 cell lines and 12 plasma samples for HLA typing (A, B, C, and DQB1 loci), kmerizer delivered 100% sensitivity with 98% specificity. On the simulated dataset, kmerizer achieved 99% sensitivity and specificity on all the MHC class 1 and class 2 loci, and 90% sensitivity and specificity on all KIR loci, for both homozygous and heterozygous pairs. The novel allele caller kmerizer also demonstrated a lighter footprint on computational resources: one deep-sequencing plasma sample can on average be processed in less than 2 minutes, which is about 15 times faster than the most commonly used HLA typing tool, HISAT2 [1], which does not support KIR typing. 
Conclusions: As utilization of ICIs increases, the use of genetic and genomic information to accurately identify patients more likely to respond to ICIs will be critical. kmerizer is a fast and highly sensitive and specific allele caller, and it can effectively call alleles on both HLA and KIR. References: [1] Kim, D., Paggi, J. M., Park, C., Bennett, C., & Salzberg, S. L. (2019). Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature biotechnology, 37(8), 907-915. Citation Format: Sante Gnerre, Brian Yik Tak Tsui, Tingting Jiang, Yvonne Kim, Dustin Ma, Indira Wu, Rebecca Nagy, Han-Yu Chuang. Accurately genotyping HLA and KIR alleles using cfDNA assay and k-mer based algorithm for immunotherapy [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1220.

30

Rajasagi, Mohini, Sachet A. Shukla, Edward F. Fritsch, David DeLuca, Gad Getz, Nir Hacohen, and Catherine J. Wu. "Tumor Neoantigens Are Abundant Across Cancers." Blood 122, no. 21 (November 15, 2013): 3265. http://dx.doi.org/10.1182/blood.v122.21.3265.3265.

Abstract:

Abstract Tumor neoantigens are a promising class of vaccine immunogens as they arise from gene alterations in tumor cells and are hence exquisitely tumor-specific. We recently reported the development of a pipeline that leverages massively parallel sequencing data with HLA-peptide binding predictions to identify candidate neoantigens. By applying this pipeline to cases of chronic lymphocytic leukemia (CLL) with known HLA typing, we described the prediction of personal tumor neoantigens against which long-lived memory T cell responses developed following remission-inducing therapy. Our pipeline thus provides a method for selecting neoantigens for developing future personalized tumor vaccines. In order to extend this approach beyond CLL, we sought to gain estimates of tumor neoantigen loads across cancers. We hypothesized that the numbers of neoantigens within cancers would be proportional to their mutation frequency. To examine this hypothesis, we turned to the extensive collections of whole-exome sequencing (WES) data that have been generated through recent large-scale cancer sequencing projects. In order to generate accurate estimates of personal tumor neoantigen loads, HLA typing information is required. While in theory this information should be directly extractable from WES, direct inference of HLA type from standard WES reads has not been previously possible due to suboptimal alignments against a standard reference genome arising from the highly polymorphic nature of the HLA region. We therefore developed a strategy to optimize alignment. Based on the IMGT database, we constructed a reference library of all known HLA alleles (6597 unique entries) and aligned WES reads containing one or more short sequence segments corresponding to any HLA allele against this reference using the Novoalign software. 
HLA alleles were then inferred through a model that enabled calculation of allele probabilities by taking into account the number and quality of reads aligned to each allele. Alleles with the highest probabilities were then identified as winners. We trained the algorithm on 8 CLL cases for which WES data and HLA typing (based on conventional molecular typing) were available, and established a performance accuracy of ∼94% (45 of 48 alleles). This was further validated using a set of 133 HapMap samples with known HLA typing, in which 94.61% (755 of 798) of alleles were identified correctly at protein level resolution. We applied the HLA typing algorithm together with the neoantigen discovery pipeline across WES from 2488 cases collected from publicly available datasets of 13 diverse cancers. Mutation rates in solid tumor malignancies were consistently higher, in some cases by more than an order of magnitude, than the blood malignancies. For example, the high mutation rate tumor melanoma displayed a median of 300 (range, 34-4276) missense mutations per case, while renal cell carcinoma (RCC) had 41 (range, 10-101) and CLL had 16 (range, 0-75). The number of frame-shifting events (indels and termination read-throughs) was generally 10-fold or more lower in each tumor type than missense mutations and did not correlate with the number of missense mutations. As expected, the rate of predicted HLA binding peptides mirrored the somatic mutation rate per tumor type. The median number of predicted class I HLA-binding neopeptides (with IC50 < 500 nM) per sample generated from missense and frameshift events for melanoma was 488 (range: 18-5811), for RCC, 80 (range: 6-407), and for CLL 24 (range 2-124). Overall, we found an average of 1.5 HLA-binding peptides (i.e. with IC50 < 500 nM) was generated per missense mutation and 4 binding peptides per frameshift mutation. 
By predicting tumor neoantigens in a variety of low and high mutation rate cancers, we established that dozens to hundreds of potential neoantigens are present in most tumors. In the process, we developed a highly accurate analytic approach that provides a solution for extracting HLA typing information from WES data but which could, in principle, be applied to other highly polymorphic regions of the genome. Ongoing studies focus on integrating estimates of tumor neoantigen load with understanding of HLA expression in order to optimize selection of antigen targets to build future personalized tumor vaccines. Disclosures: No relevant conflicts of interest to declare.

31

Au, Kin Fai, Jason G. Underwood, Lawrence Lee, and Wing Hung Wong. "Improving PacBio Long Read Accuracy by Short Read Alignment." PLoS ONE 7, no. 10 (October 4, 2012): e46679. http://dx.doi.org/10.1371/journal.pone.0046679.

32

Wang, Dan, Hai Xiang, Chao Ning, Hao Liu, Jian-Feng Liu, and Xingbo Zhao. "Mitochondrial DNA enrichment reduced NUMT contamination in porcine NGS analyses." Briefings in Bioinformatics 21, no. 4 (June 14, 2019): 1368–77. http://dx.doi.org/10.1093/bib/bbz060.

Abstract:

Abstract Genetic associations between mitochondrial DNA (mtDNA) and economic traits have been widely reported for pigs, which indicates the importance of mtDNA. However, studies on mtDNA heteroplasmy in pigs are rare. Next generation sequencing (NGS) methodologies have emerged as a promising genomic approach for detection of mitochondrial heteroplasmy. Due to the short reads, flexible bioinformatic analyses and the contamination of nuclear mitochondrial sequences (NUMTs), NGS was expected to increase false-positive detection of heteroplasmy. In this study, Sanger sequencing was performed as a gold standard to detect heteroplasmy with a detection sensitivity of 5% in pigs, and then one whole-genome sequencing method (WGS) and two mtDNA enrichment sequencing methods (Capture and LongPCR) were carried out. The aim of this study was to determine whether mitochondrial heteroplasmy identification from NGS data was affected by NUMTs. We found that WGS generated more false intra-individual polymorphisms and lower mapping specificity than the two enrichment sequencing methods, suggesting that NUMTs indeed led to false-positive mitochondrial heteroplasmies from NGS data. In addition, to accurately detect mitochondrial diversity, three commonly used tools (SAMtools, VarScan and GATK) with different parameter values were compared. VarScan achieved the best specificity and sensitivity when using base alignment quality re-computation and a minimum variant frequency of 0.25. The results also suggest that the bioinformatic workflow interferes with the identification of mtDNA SNPs. In conclusion, intra-individual polymorphism in pig mitochondria from NGS data was confused with NUMTs, and mtDNA-specific enrichment is essential before high-throughput sequencing in the detection of mitochondrial genome sequences.

33

Wilton, Richard, and Alexander S. Szalay. "Performance optimization in DNA short-read alignment." Bioinformatics 38, no. 8 (February 9, 2022): 2081–87. http://dx.doi.org/10.1093/bioinformatics/btac066.

Abstract:

Summary: Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged. In this review, we examine three general-purpose short-read alignment tools (BWA-MEM, Bowtie 2 and Arioc) with a focus on performance optimization. We analyze the performance-related behavior of the algorithms and heuristics each tool implements, with the goal of arriving at practical methods of improving processing speed and accuracy. We indicate where an aligner's default behavior may result in suboptimal performance, explore the effects of computational constraints such as end-to-end mapping and alignment scoring threshold, and discuss sources of imprecision in the computation of alignment scores and mapping quality. With this perspective, we describe an approach to tuning short-read aligner performance to meet specific data-analysis and throughput requirements while avoiding potential inaccuracies in subsequent analysis of alignment results. Finally, we illustrate how this approach avoids easily overlooked pitfalls and leads to verifiable improvements in alignment speed and accuracy. Contact: richard.wilton@jhu.edu. Supplementary information: Appendices referenced in this article are available at Bioinformatics online.

34

Nakabayashi, Ryo, and Shinichi Morishita. "HiC-Hiker: a probabilistic model to determine contig orientation in chromosome-length scaffolds with Hi-C." Bioinformatics 36, no. 13 (May 5, 2020): 3966–74. http://dx.doi.org/10.1093/bioinformatics/btaa288.

Abstract:

Motivation: De novo assembly of reference-quality genomes used to require enormously laborious tasks. In particular, it is extremely time-consuming to build genome markers for ordering assembled contigs along chromosomes; thus, they are only available for well-established model organisms. To resolve this issue, recent studies demonstrated that Hi-C could be a powerful and cost-effective means to output chromosome-length scaffolds for non-model species with no genome marker resources, because the Hi-C contact frequency between a pair of loci can be a good estimator of their genomic distance, even if there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, it remains challenging to reduce errors in contig orientation because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics. Results: To reduce these contig orientation errors, we propose a new algorithm, named HiC-Hiker, which has a firm grounding in probabilistic theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA using human and worm genome contigs generated from short reads, evaluated their performances, and observed a remarkable reduction in the contig orientation error rate from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm can consider long-range information between distal contigs and precisely estimates Hi-C read contact probabilities among contigs, which may also be useful for determining the ordering of contigs. Availability and implementation: HiC-Hiker is freely available at: https://github.com/ryought/hic_hiker.
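To make the Viterbi step mentioned in this abstract concrete, here is a minimal Python sketch of dynamic programming over contig orientations. It is not HiC-Hiker's model: the function name `viterbi_orientations`, the two-state ("+"/"-") chain restricted to adjacent contigs, and the log-scores in the usage example are all assumptions made for illustration.

```python
def viterbi_orientations(pair_scores):
    """Return the highest-scoring orientation for each contig in a chain.

    pair_scores[i][(a, b)] is a (hypothetical) log-score for contig i having
    orientation a and contig i + 1 having orientation b, with a, b in "+-".
    """
    states = ("+", "-")
    best = {s: 0.0 for s in states}  # uniform prior on the first contig
    back = []                        # back-pointers, one dict per transition
    for scores in pair_scores:
        new_best, pointers = {}, {}
        for b in states:
            # Best previous orientation given that the next contig is b.
            prev = max(states, key=lambda a: best[a] + scores[(a, b)])
            new_best[b] = best[prev] + scores[(prev, b)]
            pointers[b] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Made-up log-scores for a chain of three contigs.
example_scores = [
    {("+", "+"): -1.0, ("+", "-"): -3.0, ("-", "+"): -3.0, ("-", "-"): -2.0},
    {("+", "+"): -4.0, ("+", "-"): -0.5, ("-", "+"): -3.0, ("-", "-"): -2.5},
]
orientations, log_score = viterbi_orientations(example_scores)
```

With these made-up scores the sketch picks the orientations `+`, `+`, `-`. The abstract states that the published method also uses long-range information between distal contigs, which this adjacent-only chain does not capture.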

35

Schneeberger, Korbinian, Jörg Hagmann, Stephan Ossowski, Norman Warthmann, Sandra Gesing, Oliver Kohlbacher, and Detlef Weigel. "Simultaneous alignment of short reads against multiple genomes." Genome Biology 10, no. 9 (2009): R98. http://dx.doi.org/10.1186/gb-2009-10-9-r98.

36

Ji, Mingeun, Yejin Kan, Dongyeon Kim, Jaehee Jung, and Gangman Yi. "cPlot: Contig-Plotting Visualization for the Analysis of Short-Read Nucleotide Sequence Alignments." International Journal of Molecular Sciences 23, no. 19 (September 29, 2022): 11484. http://dx.doi.org/10.3390/ijms231911484.

Abstract:

Advances in next-generation sequencing technology have led to a dramatic decrease in read-generation cost and an increase in read output. Reconstruction of short DNA sequence reads generated by next-generation sequencing requires a read alignment method that reconstructs a reference genome. In addition, it is essential to analyze the results of read alignments for a biologically meaningful inference. However, read alignment from vast amounts of genomic data from various organisms is challenging in that it involves repeated automatic and manual analysis steps. Here, we devised the cPlot software for read alignment of nucleotide sequences, with automated read alignment and position analysis, which allows visual assessment of the analysis results by the user. cPlot compares sequence similarity of reads by performing multiple read alignments, with FASTA format files as the input. This application provides a web-based interface for the user for facile implementation, without the need for a dedicated computing environment. cPlot identifies the location and order of the sequencing reads by comparing the sequence to a genetically close reference sequence in a way that is effective for visualizing the assembly of short reads generated by NGS and rapid gene map construction.

37

Guguchkin, Egor Pavlovich, and Evgeny Andreevich Karpulevich. "Modification of the short read alignment algorithm to improve the quality of the human whole genome sequencing data processing pipeline." Proceedings of the Institute for System Programming of the RAS 35, no. 2 (2023): 235–48. http://dx.doi.org/10.15514/ispras-2023-35(2)-17.

Abstract:

This study emphasizes the importance of aligning short reads in the analysis of human whole-genome sequencing data. The alignment process involves determining the positions of short genetic sequences relative to a known reference genome sequence of the human genome. Traditional alignment methods use a linear reference sequence, but this can lead to incorrect alignment, especially when short reads contain genetic variations. In this work, the index file of the reference sequence was modified using the minimap2 tool. Experimental results showed that adding information about frequently occurring genetic variations to the minimap2 index increases the number of correctly identified genetic variants, which affects the quality of subsequent data analysis.

38

Wilton, Richard, Xin Li, Andrew P. Feinberg, and Alexander S. Szalay. "Arioc: GPU-accelerated alignment of short bisulfite-treated reads." Bioinformatics 34, no. 15 (March 15, 2018): 2673–75. http://dx.doi.org/10.1093/bioinformatics/bty167.

39

Zhang, Wenjing, Neng Huang, Jiantao Zheng, Xingyu Liao, Jianxin Wang, and Hong-Dong Li. "A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads." Genes 10, no. 1 (January 14, 2019): 44. http://dx.doi.org/10.3390/genes10010044.

Abstract:

The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics due to their long reads. However, the high error rate and poor quality of TGS reads provide new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads for improving the results of error correction and assembly. In this study, we proposed a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets of different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. The contig assembly results based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative sequence-quality evaluation method to alignment-based algorithms.

40

Lim, Jing-Quan, Chandana Tennakoon, Peiyong Guan, and Wing-Kin Sung. "BatAlign: an incremental method for accurate alignment of sequencing reads." Nucleic Acids Research 43, no. 16 (July 13, 2015): e107-e107. http://dx.doi.org/10.1093/nar/gkv533.

41

PACHECO BAUTISTA, D., R. CARREÑO AGUILERA, E. CORTÉS PÉREZ, M. GONZÁLEZ PÉREZ, J. J. MEDEL, M. A. ACEVEDO, and WEN YU. "NONLINEAR FM INDEX APPLICATION FOR ALIGNMENT OF SHORT DNA SEQUENCES USING RE-PARAMETRIZATION OF ALGORITHMS." Fractals 26, no. 03 (June 2018): 1850023. http://dx.doi.org/10.1142/s0218348x18500238.

Abstract:

An innovative reconfiguration application is proposed to re-calculate the parameters of the Ferragina and Manzini exact search algorithm (or FM indexes), using a modular and efficient hardware implementation to accelerate alignment programs of short DNA sequence reads. Although these programs use multi-core execution strategies or multiple computers, they have become slow considering the very high speed at which the new massively parallel sequencing machines produce the reads to be aligned. Consequently, a search for different ways to accelerate the alignment is crucial. The proposed design runs with software functions in a hybrid system, and has the ability to align millions of reads to a reference as large as the human genome. Tests on the M505k325t card show that a single alignment core can accelerate the computation by a factor close to [Formula: see text] in relation to BWA. Due to the minor consumption of area and power, multiple alignment cores can fill the Field Programmable Gate Array (FPGA), multiplying the computation speed. With a multiple-core implementation, the processing speed of the design outperforms applications that are accelerated by GPUs and competes with similar FPGA proposals whose cost is much higher.
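For readers unfamiliar with the Ferragina and Manzini exact search named in this abstract, the following Python sketch shows textbook FM-index backward search in software. It is a toy illustration, not the hardware design described in the paper: the function names `build_fm_index` and `backward_search`, the naive suffix-array construction, and the uncompressed occurrence table are assumptions chosen for clarity rather than efficiency.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM index (suffix array, C table, Occ table) for `text`."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the symbol preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c] = number of symbols in the text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for sym in sorted(counts):
        c_table[sym] = total
        total += counts[sym]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {sym: [0] * (len(bwt) + 1) for sym in counts}
    for i, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # extend the match one symbol to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


sa, c_table, occ = build_fm_index("ACGTACGTTACG")
hits = backward_search("ACG", sa, c_table, occ)
```

Here `hits` holds the zero-based positions 0, 4 and 9. Each pattern symbol costs two table lookups regardless of reference length, which is the property that makes this search attractive for hardware acceleration.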

42

Chon, Alvin, and Xiaoqiu Huang. "SRAMM: Short Read Alignment Mapping Metrics." International Journal on Bioinformatics & Biosciences 11, no. 02 (June 30, 2021): 01–07. http://dx.doi.org/10.5121/ijbb.2021.11201.

Abstract:

Short Read Alignment Mapping Metrics (SRAMM) is an efficient and versatile command line tool providing additional short read mapping metrics, filtering, and graphs. Short read aligners report MAPping Quality (MAPQ), but these methods generally are neither standardized nor well described in literature or software manuals. Additionally, third party mapping quality programs are typically computationally intensive or designed for specific applications. SRAMM efficiently generates multiple different concept-based mapping scores to provide for an informative post alignment examination and filtering process of aligned short reads for various downstream applications. SRAMM is compatible with Python 2.6+ and Python 3.6+ on all operating systems. It works with any short read aligner that generates SAM/BAM/CRAM file outputs and reports 'AS' tags. It is freely available under the MIT license at http://github.com/achon/sramm.
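As a rough illustration of what working from 'AS' tags can look like, the Python sketch below reads the optional fields of a SAM record and derives a simple per-base alignment score. This is not one of SRAMM's metrics: the helper names `parse_sam_tags` and `score_per_base`, the metric itself, and the sample record are hypothetical.

```python
def parse_sam_tags(line):
    """Split one SAM alignment line into its fields and a dict of optional tags."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start at column 12 and have the form TAG:TYPE:VALUE.
    for item in fields[11:]:
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return fields, tags


def score_per_base(line):
    """Hypothetical metric: alignment score (AS) divided by read length."""
    fields, tags = parse_sam_tags(line)
    seq = fields[9]  # column 10 holds the read sequence
    if "AS" not in tags or seq == "*":
        return None  # no score reported, or sequence not stored
    return tags["AS"] / len(seq)


# Made-up SAM record for a 10 nt read with alignment score 8.
record = "\t".join([
    "r1", "0", "chr1", "100", "60", "10M", "*", "0", "0",
    "ACGTACGTAC", "IIIIIIIIII", "AS:i:8", "NM:i:1",
])
ratio = score_per_base(record)
```

For the made-up record, `ratio` is 0.8. Because the scale of 'AS' depends on each aligner's scoring scheme, a threshold on such a ratio would only be comparable within one aligner and parameter set.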

43

Bao, H., H. Guo, J. Wang, R. Zhou, X. Lu, and S. Shi. "MapView: visualization of short reads alignment on a desktop computer." Bioinformatics 25, no. 12 (April 15, 2009): 1554–55. http://dx.doi.org/10.1093/bioinformatics/btp255.

44

Grimm, Dominik, Jörg Hagmann, Daniel Koenig, Detlef Weigel, and Karsten Borgwardt. "Accurate indel prediction using paired-end short reads." BMC Genomics 14, no. 1 (2013): 132. http://dx.doi.org/10.1186/1471-2164-14-132.

45

Rumble, Stephen M., Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, and Michael Brudno. "SHRiMP: Accurate Mapping of Short Color-space Reads." PLoS Computational Biology 5, no. 5 (May 22, 2009): e1000386. http://dx.doi.org/10.1371/journal.pcbi.1000386.

46

Lee, E. Alice, Boram Lee, Yoonjoo Choi, Junseok Park, Adam Voshall, Eduardo Maury, Yeeok Kang, et al. "Abstract 1122: Pan-cancer analysis reveals roles of retrotransposon-fusion RNAs." Cancer Research 83, no. 7_Supplement (April 4, 2023): 1122. http://dx.doi.org/10.1158/1538-7445.am2023-1122.

Abstract:

Transposons comprise half of the human genome and some create novel insertions in human germ cells and somatic tissues of healthy and diseased individuals. These genetic elements may be an abundant yet underexplored source of immunogenic molecules that create chimeric transcripts composed of transposons and non-transposons. However, it remains challenging to accurately identify such transposon-fusion events from short RNA-seq reads mainly due to the repetitive nature of transposon sequences and error-prone read alignment near exon-intron junctions, and there are few existing methods for such detection. We developed a computational pipeline, rTea (RNA Transposable Element Analyzer), to detect transposon-fusion transcripts from RNA-seq data. Using rTea, we performed a pan-cancer analysis of 10,257 cancer samples across 34 cancer types as well as 2,994 normal tissue samples from the TCGA/ICGC, our unpublished colorectal cancer cohort, and the GTEx consortia. Among normal human tissues, higher fusion loads were found most notably in the testis and to a lesser extent in other tissues that have been characterized as immune-privileged. Somatic fusions found in cancer were enriched in known cancer genes, implicating their contribution to tumorigenesis. We also found distinct epigenetic and tumorigenic mechanisms underlying fusions from different transposon families. Through in silico immunogenicity modeling and experimental validation, we confirmed the MHC-I binding and CD8+ T cell activation by peptides derived from transposon fusions to the extent comparable to EBV viruses. Our findings highlight endogenous retroelements as novel therapeutic targets and a significant source of neoantigens. Citation Format: E. Alice Lee, Boram Lee, Yoonjoo Choi, Junseok Park, Adam Voshall, Eduardo Maury, Yeeok Kang, Yeon Jeong Kim, Jin-Young Lee, Hye-Ran Shim, Si-Cho Kim, Hoang Bao Khanh Chu, Da-Won Kim, Minjeong Kim, Eun-Ji Choi, Kyungsoo Ha, Jung Kyoon Choi, Yongjoon Kim, Woong-Yang Park. Pan-cancer analysis reveals roles of retrotransposon-fusion RNAs [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 1122.

47

Firtina, Can, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercument Cicek, Can Alkan, and Onur Mutlu. "Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm." Bioinformatics 36, no. 12 (March 13, 2020): 3669–79. http://dx.doi.org/10.1093/bioinformatics/btaa179.

Abstract:

Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs. These long reads are used to construct an assembly (i.e. the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of base pairs in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e. read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads from either a certain sequencing technology or a small assembly. Such technology-dependency and assembly-size dependency require researchers to (i) run multiple polishing algorithms and (ii) use small chunks of a large genome to use all available readsets and polish large genomes, respectively. Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e. both large and small genomes) using reads from all sequencing technologies (i.e. second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo (i) models an assembly as a profile hidden Markov model (pHMM), (ii) uses read-to-assembly alignment to train the pHMM with the Forward–Backward algorithm and (iii) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real readsets demonstrate that Apollo is the only algorithm that (i) uses reads from any sequencing technology within a single run and (ii) scales well to polish large assemblies without splitting the assembly into multiple parts. Availability and implementation: Source code is available at https://github.com/CMU-SAFARI/Apollo. Supplementary information: Supplementary data are available at Bioinformatics online.

48

Wilton, Richard, and Alexander S. Szalay. "Arioc: High-concurrency short-read alignment on multiple GPUs." PLOS Computational Biology 16, no. 11 (November 9, 2020): e1008383. http://dx.doi.org/10.1371/journal.pcbi.1008383.

Abstract:

In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software. When two or more GPUs are available, Arioc's speed increases proportionately because the software executes concurrently on each available GPU device. We have adapted Arioc to recent multi-GPU hardware architectures that support high-bandwidth peer-to-peer memory accesses among multiple GPUs. By modifying Arioc's implementation to exploit this GPU memory architecture we obtained a further 1.8x-2.9x increase in overall alignment speeds. With this additional acceleration, Arioc computes two million short-read alignments per second in a four-GPU system; it can align the reads from a human WGS sequencer run (over 500 million 150 nt paired-end reads) in less than 15 minutes. As WGS data accumulates exponentially and high-concurrency computational resources become widespread, Arioc addresses a growing need for timely computation in the short-read data analysis toolchain.

49

Lee, Robyn S., and Marcel A. Behr. "Does Choice Matter? Reference-Based Alignment for Molecular Epidemiology of Tuberculosis." Journal of Clinical Microbiology 54, no. 7 (April 13, 2016): 1891–95. http://dx.doi.org/10.1128/jcm.00364-16.

Abstract:

When using genome sequencing for molecular epidemiology, short sequence reads are aligned to an arbitrary reference strain to detect single nucleotide polymorphisms. We investigated whether reference genome selection influences epidemiological inferences of Mycobacterium tuberculosis transmission by aligning sequence reads from 162 closely related lineage 4 (Euro-American) isolates to 7 different genomes. Phylogenetic trees were consistent with use of all but the most divergent genomes, suggesting that reference choice can be based on considerations other than M. tuberculosis lineage.

50

Zhao, Yongan, Xiaofeng Wang, and Haixu Tang. "A Secure Alignment Algorithm for Mapping Short Reads to Human Genome." Journal of Computational Biology 25, no. 6 (June 2018): 529–40. http://dx.doi.org/10.1089/cmb.2017.0094.
