Journal articles on the topic 'Short Read Mapping (SRM)'

Author: Grafiati

Published: 6 September 2023

Consult the top 50 journal articles for your research on the topic 'Short Read Mapping (SRM).'

1

Chon, Alvin, and Xiaoqiu Huang. "SRAMM: Short Read Alignment Mapping Metrics." International Journal on Bioinformatics & Biosciences 11, no. 02 (June 30, 2021): 01–07. http://dx.doi.org/10.5121/ijbb.2021.11201.

Abstract:

Short Read Alignment Mapping Metrics (SRAMM) is an efficient and versatile command line tool providing additional short read mapping metrics, filtering, and graphs. Short read aligners report MAPping Quality (MAPQ), but these methods generally are neither standardized nor well described in literature or software manuals. Additionally, third party mapping quality programs are typically computationally intensive or designed for specific applications. SRAMM efficiently generates multiple different concept-based mapping scores to provide for an informative post alignment examination and filtering process of aligned short reads for various downstream applications. SRAMM is compatible with Python 2.6+ and Python 3.6+ on all operating systems. It works with any short read aligner that generates SAM/BAM/CRAM file outputs and reports 'AS' tags. It is freely available under the MIT license at http://github.com/achon/sramm.

2

Cline, Eliot, Nuttachat Wisittipanit, Tossapon Boongoen, Ekachai Chukeatirote, Darush Struss, and Anant Eungwanichayapant. "Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data." PeerJ 8 (December 7, 2020): e10501. http://dx.doi.org/10.7717/peerj.10501.

Abstract:

Background: Low-coverage sequencing is a cost-effective way to obtain reads spanning an entire genome. However, read depth at each locus is low, making sequencing error difficult to separate from actual variation. Prior to variant calling, sequencer reads are aligned to a reference genome, with alignments stored in Sequence Alignment/Map (SAM) files. Each alignment has a mapping quality (MAPQ) score indicating the probability a read is incorrectly aligned. This study investigated the recalibration of probability estimates used to compute MAPQ scores for improving variant calling performance in single-sample, low-coverage settings. Materials and Methods: Simulated tomato, hot pepper and rice genomes were implanted with known variants. From these, simulated paired-end reads were generated at low coverage and aligned to the original reference genomes. Features extracted from the SAM formatted alignment files for tomato were used to train machine learning models to detect incorrectly aligned reads and output estimates of the probability of misalignment for each read in all three data sets. MAPQ scores were then re-computed from these estimates. Next, the SAM files were updated with new MAPQ scores. Finally, variant calling was performed on the original and recalibrated alignments and the results compared. Results: Incorrectly aligned reads comprised only 0.16% of the reads in the training set. This severe class imbalance required special consideration for model training. The F1 score for detecting misaligned reads ranged from 0.76 to 0.82. The best performing model was used to compute new MAPQ scores. Single Nucleotide Polymorphism (SNP) detection was improved after mapping score recalibration. In rice, recall for called SNPs increased by 5.2%, while for tomato and pepper it increased by 3.1% and 1.5%, respectively. For all three data sets the precision of SNP calls ranged from 0.91 to 0.95, and was largely unchanged both before and after mapping score recalibration.
Conclusion: Recalibrating MAPQ scores delivers modest improvements in single-sample variant calling results. Some variant callers operate on multiple samples simultaneously. They exploit every sample’s reads to compensate for the low read-depth of individual samples. This improves polymorphism detection and genotype inference. It may be that small improvements in single-sample settings translate to larger gains in a multi-sample experiment. A study to investigate this is ongoing.
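
The abstract above describes re-computing MAPQ scores from estimated misalignment probabilities. A minimal sketch of that conversion follows, using the Phred scaling defined for MAPQ in the SAM format (MAPQ = -10 * log10 of the probability that the mapping position is wrong). The function names and the cap of 60 are assumptions made for illustration; this is not the authors' code, and different aligners use different maximum values.

```python
import math


def prob_to_mapq(p_wrong, cap=60):
    """Phred-scale a misalignment probability: MAPQ = -10 * log10(p_wrong).

    `cap` is a hypothetical upper bound (an assumption of this sketch);
    it also handles p_wrong == 0, where the logarithm is undefined.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))


def mapq_to_prob(mapq):
    """Invert the Phred scale: probability that the alignment is wrong."""
    return 10.0 ** (-mapq / 10.0)


if __name__ == "__main__":
    # A read with a 1-in-1000 estimated chance of being misaligned.
    print(prob_to_mapq(0.001))
    # The error probability implied by MAPQ 20.
    print(mapq_to_prob(20))
```

Rounding to an integer mirrors the fact that MAPQ is stored as an integer field, so the round trip through `mapq_to_prob` is only approximate.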

3

Yang, Xiaohong, Yue Li, Yu Wei, Zhanlong Chen, and Peng Xie. "Water Body Extraction from Sentinel-3 Image with Multiscale Spatiotemporal Super-Resolution Mapping." Water 12, no. 9 (September 17, 2020): 2605. http://dx.doi.org/10.3390/w12092605.

Abstract:

Water body mapping is significant for water resource management. Given its 21 spectral bands and a short revisit time of no more than two days, a Sentinel-3 OLCI (Ocean and Land Colour Instrument) image could be the optimum data source in the near-real-time mapping of water bodies. However, the image is often limited by its low spatial resolution in practice. Super-resolution mapping (SRM) is a good solution to generate finer spatial resolution maps than the input data allows. In this paper, a multiscale spatiotemporal super-resolution mapping (MSST_SRM) method for water bodies is proposed, particularly for Sentinel-3 OLCI images. The proposed MSST_SRM method employs the integrated Normalized Difference Water Index (NDWI) images calculated from four near-infrared (NIR) bands and Green Band 6 of the Sentinel-3 OLCI image as input data and combines the spectral, multispatial, and temporal terms into one objective function to generate a fine water body map. Two experiments in the Tibet Plateau and Daye lakes were employed to test the effectiveness of the MSST_SRM method. Results revealed that by using multiscale spatial dependence under the framework of spatiotemporal super-resolution mapping, MSST_SRM could generate finer water body maps than the hard classification method and the other three SRM-based methods. Therefore, the proposed MSST_SRM method shows marked efficiency and potential in water body mapping using Sentinel-3 OLCI images.

4

Canzar, Stefan, and Steven L. Salzberg. "Short Read Mapping: An Algorithmic Tour." Proceedings of the IEEE 105, no. 3 (March 2017): 436–58. http://dx.doi.org/10.1109/jproc.2015.2455551.

5

Deorowicz, Sebastian, and Adam Gudyś. "Whisper 2: Indel-sensitive short read mapping." SoftwareX 14 (June 2021): 100692. http://dx.doi.org/10.1016/j.softx.2021.100692.

6

David, Matei, Misko Dzamba, Dan Lister, Lucian Ilie, and Michael Brudno. "SHRiMP2: Sensitive yet Practical Short Read Mapping." Bioinformatics 27, no. 7 (January 28, 2011): 1011–12. http://dx.doi.org/10.1093/bioinformatics/btr046.

7

Smith, A. D., W. Y. Chung, E. Hodges, J. Kendall, G. Hannon, J. Hicks, Z. Xuan, and M. Q. Zhang. "Updates to the RMAP short-read mapping software." Bioinformatics 25, no. 21 (September 7, 2009): 2841–42. http://dx.doi.org/10.1093/bioinformatics/btp533.

8

Gao, Lei, Cong Wu, and Lin Liu. "AUSPP: A universal short-read pre-processing package." Journal of Bioinformatics and Computational Biology 17, no. 06 (December 2019): 1950037. http://dx.doi.org/10.1142/s0219720019500379.

Abstract:

There are many short-read aligners that can map short reads to a reference genome/sequence, and most of them can directly accept a FASTQ file as the input query file. However, the raw data usually need to be pre-processed. Few software programs specialize in pre-processing raw data generated by a variety of next-generation sequencing (NGS) technologies. Here, we present AUSPP, a Perl script-based pipeline for pre-processing and automatic mapping of NGS short reads. This pipeline encompasses quality control, adaptor trimming, collapsing of reads, structural RNA removal, length selection, read mapping, and normalized wiggle file creation. It facilitates the processing from raw data to genome mapping and is therefore a powerful tool for the steps before meta-analysis. Most importantly, since AUSPP has default processing pipeline settings for many types of NGS data, most of the time, users will simply need to provide the raw data and genome. AUSPP is portable and easy to install, and the source codes are freely available at https://github.com/highlei/AUSPP.

9

Hach, Faraz, Fereydoun Hormozdiari, Can Alkan, Farhad Hormozdiari, Inanc Birol, Evan E. Eichler, and S. Cenk Sahinalp. "mrsFAST: a cache-oblivious algorithm for short-read mapping." Nature Methods 7, no. 8 (August 2010): 576–77. http://dx.doi.org/10.1038/nmeth0810-576.

10

Martinez, Hector, Joaquin Tarraga, Ignacio Medina, Sergio Barrachina, Maribel Castillo, Joaquin Dopazo, and Enrique S. Quintana-Orti. "Concurrent and Accurate Short Read Mapping on Multicore Processors." IEEE/ACM Transactions on Computational Biology and Bioinformatics 12, no. 5 (September 1, 2015): 995–1007. http://dx.doi.org/10.1109/tcbb.2015.2392077.

11

Tran, Hong, Jacob Porter, Ming-an Sun, Hehuang Xie, and Liqing Zhang. "Objective and Comprehensive Evaluation of Bisulfite Short Read Mapping Tools." Advances in Bioinformatics 2014 (April 15, 2014): 1–11. http://dx.doi.org/10.1155/2014/472045.

Abstract:

Background. Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of having increased false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data.

12

Houtgast, Ernst Joachim, Vlad-Mihai Sima, Koen Bertels, and Zaid Al-Ars. "Hardware acceleration of BWA-MEM genomic short read mapping for longer read lengths." Computational Biology and Chemistry 75 (August 2018): 54–64. http://dx.doi.org/10.1016/j.compbiolchem.2018.03.024.

13

Wilton, Richard, and Alexander S. Szalay. "Performance optimization in DNA short-read alignment." Bioinformatics 38, no. 8 (February 9, 2022): 2081–87. http://dx.doi.org/10.1093/bioinformatics/btac066.

Abstract:

Summary: Over the past decade, short-read sequence alignment has become a mature technology. Optimized algorithms, careful software engineering and high-speed hardware have contributed to greatly increased throughput and accuracy. With these improvements, many opportunities for performance optimization have emerged. In this review, we examine three general-purpose short-read alignment tools (BWA-MEM, Bowtie 2 and Arioc) with a focus on performance optimization. We analyze the performance-related behavior of the algorithms and heuristics each tool implements, with the goal of arriving at practical methods of improving processing speed and accuracy. We indicate where an aligner's default behavior may result in suboptimal performance, explore the effects of computational constraints such as end-to-end mapping and alignment scoring threshold, and discuss sources of imprecision in the computation of alignment scores and mapping quality. With this perspective, we describe an approach to tuning short-read aligner performance to meet specific data-analysis and throughput requirements while avoiding potential inaccuracies in subsequent analysis of alignment results. Finally, we illustrate how this approach avoids easily overlooked pitfalls and leads to verifiable improvements in alignment speed and accuracy. Contact: richard.wilton@jhu.edu. Supplementary information: Appendices referenced in this article are available at Bioinformatics online.

14

Wood, David L. A., Qinying Xu, John V. Pearson, Nicole Cloonan, and Sean M. Grimmond. "X-MATE: a flexible system for mapping short read data." Bioinformatics 27, no. 4 (January 6, 2011): 580–81. http://dx.doi.org/10.1093/bioinformatics/btq698.

15

Pireddu, L., S. Leo, and G. Zanetti. "SEAL: a distributed short read mapping and duplicate removal tool." Bioinformatics 27, no. 15 (June 22, 2011): 2159–60. http://dx.doi.org/10.1093/bioinformatics/btr325.

16

Linheiro, Raquel, and John Archer. "Quantification of the effects of chimerism on read mapping, differential expression and annotation following short-read de novo assembly." F1000Research 11 (January 31, 2022): 120. http://dx.doi.org/10.12688/f1000research.108489.1.

Abstract:

Background: De novo assembly is often required for analysing short-read RNA sequencing data. An under-characterized aspect of the contigs produced is chimerism, the extent of which affects mapping, differential expression analysis and annotation. Despite long-read sequencing negating this issue, short-reads remain in use through on-going research and archived datasets created during the last two decades. Consequently, there is still a need to quantify chimerism and its effects. Methods: Effects on mapping were quantified by simulating reads off the Drosophila melanogaster cDNA library and mapping these to related reference sets containing increasing levels of chimerism. Next, ten read datasets were simulated and divided into two conditions where, within one, reads representing 1000 randomly selected transcripts were over-represented across replicates. Differential expression analysis was performed iteratively with increasing chimerism within the reference set. Finally, an expectation of r-squared values describing the relationship between alignment and transcript lengths for matches involving cDNA library transcripts and those within sets containing incrementing chimerism was created. Similar values calculated for contigs produced by three graph-based assemblers, relative to the cDNA library from which input reads were simulated, or sequenced (relative to the species represented), were compared. Results: At 5% and 95% chimerism within reference sets, 100% and 77% of reads still mapped, making mapping success a poor indicator of chimerism. At 5% chimerism, of the 1000 transcripts selected for over-representation, 953 were identified during differential expression analysis; at 10%, 936 were identified, while at 95% it was 510. This indicates that despite mapping success, per-transcript counts are unpredictably altered. R-squared values obtained for the three assemblers suggest that between 5% and 15% of contigs are chimeric.
Conclusions: Although not evident based on mapping, chimerism had a significant impact on differential expression analysis and megablast identification. This will have consequences for past and present experiments involving short-reads.

17

Tewolde, Rediat, Timothy Dallman, Ulf Schaefer, Carmen L. Sheppard, Philip Ashton, Bruno Pichon, Matthew Ellington, Craig Swift, Jonathan Green, and Anthony Underwood. "MOST: a modified MLST typing tool based on short read sequencing." PeerJ 4 (August 17, 2016): e2308. http://dx.doi.org/10.7717/peerj.2308.

Abstract:

Multilocus sequence typing (MLST) is an effective method to describe bacterial populations. Conventionally, MLST involves Polymerase Chain Reaction (PCR) amplification of housekeeping genes followed by Sanger DNA sequencing. Public Health England (PHE) is in the process of replacing the conventional MLST methodology with a method based on short read sequence data derived from Whole Genome Sequencing (WGS). This paper reports the comparison of the reliability of MLST results derived from WGS data, comparing mapping and assembly-based approaches to conventional methods using 323 bacterial genomes of diverse species. The sensitivity of the two WGS based methods was further investigated with 26 mixed and 29 low coverage genomic data sets from Salmonella enteridis and Streptococcus pneumoniae. Of the 323 samples, 92.9% (n = 300), 97.5% (n = 315) and 99.7% (n = 322) full MLST profiles were derived by the conventional method, assembly- and mapping-based approaches, respectively. The concordance between samples that were typed by conventional (92.9%) and both WGS methods was 100%. From the 55 mixed and low coverage genomes, 89.1% (n = 49) and 67.3% (n = 37) full MLST profiles were derived from the mapping and assembly based approaches, respectively. In conclusion, deriving MLST from WGS data is more sensitive than the conventional method. When comparing WGS based methods, the mapping based approach was the most sensitive. In addition, the mapping based approach described here derives quality metrics, which are difficult to determine quantitatively using conventional and WGS-assembly based approaches.

18

Houtgast, Ernst Joachim, Vlad-Mihai Sima, Koen Bertels, and Zaid Al-Ars. "An Efficient GPU-Accelerated Implementation of Genomic Short Read Mapping with BWA-MEM." ACM SIGARCH Computer Architecture News 44, no. 4 (January 11, 2017): 38–43. http://dx.doi.org/10.1145/3039902.3039910.

19

Porter, Jacob, Ming-an Sun, Hehuang Xie, and Liqing Zhang. "Investigating bisulfite short-read mapping failure with hairpin bisulfite sequencing data." BMC Genomics 16, Suppl 11 (2015): S2. http://dx.doi.org/10.1186/1471-2164-16-s11-s2.

20

Cechova, Monika. "Probably Correct: Rescuing Repeats with Short and Long Reads." Genes 12, no. 1 (December 31, 2020): 48. http://dx.doi.org/10.3390/genes12010048.

Abstract:

Ever since the introduction of high-throughput sequencing following the human genome project, assembling short reads into a reference of sufficient quality posed a significant problem as a large portion of the human genome—estimated 50–69%—is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from “telomere to telomere”. Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regards to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.

21

Prodanov, Timofey, and Vikas Bansal. "Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications." Nucleic Acids Research 48, no. 19 (October 9, 2020): e114-e114. http://dx.doi.org/10.1093/nar/gkaa829.

Abstract:

The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs), sequence differences between paralogous sequences, to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (74.3–90.6%) and BLASR (82.9–90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8–21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14 713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.

22

Castells-Rufas, David, Santiago Marco-Sola, Juan Carlos Moure, Quim Aguado, and Antonio Espinosa. "FPGA Acceleration of Pre-Alignment Filters for Short Read Mapping With HLS." IEEE Access 10 (2022): 22079–100. http://dx.doi.org/10.1109/access.2022.3153032.

23

Pandey, Ram Vinay, and Christian Schlötterer. "DistMap: A Toolkit for Distributed Short Read Mapping on a Hadoop Cluster." PLoS ONE 8, no. 8 (August 23, 2013): e72614. http://dx.doi.org/10.1371/journal.pone.0072614.

24

Ruffalo, M., M. Koyuturk, S. Ray, and T. LaFramboise. "Accurate estimation of short read mapping quality for next-generation genome sequencing." Bioinformatics 28, no. 18 (September 7, 2012): i349–i355. http://dx.doi.org/10.1093/bioinformatics/bts408.

25

Gouil, Quentin, and Andrew Keniry. "Latest techniques to study DNA methylation." Essays in Biochemistry 63, no. 6 (November 22, 2019): 639–48. http://dx.doi.org/10.1042/ebc20190027.

Abstract:

Bisulfite sequencing is a powerful technique to detect 5-methylcytosine in DNA that has immensely contributed to our understanding of epigenetic regulation in plants and animals. Meanwhile, research on other base modifications, including 6-methyladenine and 4-methylcytosine that are frequent in prokaryotes, has been impeded by the lack of a comparable technique. Bisulfite sequencing also suffers from a number of drawbacks that are difficult to surmount, including DNA degradation, lack of specificity, and short reads with low sequence diversity. In this review, we explore the recent refinements to bisulfite sequencing protocols that enable targeting genomic regions of interest, detecting derivatives of 5-methylcytosine, and mapping single-cell methylomes. We then present the unique advantage of long-read sequencing in detecting base modifications in native DNA and highlight the respective strengths and weaknesses of PacBio and Nanopore sequencing for this application. Although analysing epigenetic data from long-read platforms remains challenging, the ability to detect various modified bases from a universal sample preparation, in addition to the mapping and phasing advantages of the longer read lengths, provides long-read sequencing with a decisive edge over short-read bisulfite sequencing for an expanding number of applications across kingdoms.

26

Limasset, Antoine, Jean-François Flot, and Pierre Peterlongo. "Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs." Bioinformatics 36, no. 5 (February 20, 2019): 1374–81. http://dx.doi.org/10.1093/bioinformatics/btz102.

Abstract:

Motivation: Short-read accuracy is important for downstream analyses such as genome assembly and hybrid long-read correction. Despite much work on short-read correction, present-day correctors either do not scale well on large datasets or consider reads as mere suites of k-mers, without taking into account their full-length sequence information. Results: We propose a new method to correct short reads using de Bruijn graphs and implement it as a tool called Bcool. As a first step, Bcool constructs a compacted de Bruijn graph from the reads. This graph is filtered on the basis of k-mer abundance then of unitig abundance, thereby removing most sequencing errors. The cleaned graph is then used as a reference on which the reads are mapped to correct them. We show that this approach yields more accurate reads than k-mer-spectrum correctors while being scalable to human-size genomic datasets and beyond. Availability and implementation: The implementation is open source, available at http://github.com/Malfoy/BCOOL under the Affero GPL license and as a Bioconda package. Supplementary information: Supplementary data are available at Bioinformatics online.
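
The k-mer abundance filter mentioned in the abstract above can be illustrated with a small sketch. It counts k-mers across reads and keeps only those seen at least `min_abundance` times, on the premise that k-mers created by sequencing errors tend to be rare. This is a simplified illustration, not Bcool's implementation: it does not build a compacted de Bruijn graph, does not filter unitigs, and ignores reverse complements. The function names, the toy reads, and the threshold are all hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(reads, k, min_abundance):
    """Return the set of k-mers seen at least `min_abundance` times."""
    counts = count_kmers(reads, k)
    return {kmer for kmer, n in counts.items() if n >= min_abundance}


# Toy input (hypothetical): three identical reads and one read
# carrying a single substitution (T -> A at the fourth base).
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
solid = solid_kmers(reads, k=4, min_abundance=2)

if __name__ == "__main__":
    # k-mers unique to the erroneous read fall below the threshold.
    print(sorted(solid))
```

In this toy case every k-mer that overlaps the substituted base occurs only once and is discarded, while the k-mers of the error-free reads survive.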

27

Marin, Maximillian, Roger Vargas, Michael Harris, Brendan Jeffrey, L. Elaine Epperson, David Durbin, Michael Strong, et al. "Benchmarking the empirical accuracy of short-read sequencing across the M. tuberculosis genome." Bioinformatics 38, no. 7 (January 10, 2022): 1781–87. http://dx.doi.org/10.1093/bioinformatics/btac023.

Abstract:

Motivation: Short-read whole-genome sequencing (WGS) is a vital tool for clinical applications and basic research. Genetic divergence from the reference genome, repetitive sequences and sequencing bias reduce the performance of variant calling using short-read alignment, but the loss in recall and specificity has not been adequately characterized. To benchmark short-read variant calling, we used 36 diverse clinical Mycobacterium tuberculosis (Mtb) isolates dually sequenced with Illumina short-reads and PacBio long-reads. We systematically studied the short-read variant calling accuracy and the influence of sequence uniqueness, reference bias and GC content. Results: Reference-based Illumina variant calling demonstrated a maximum recall of 89.0% and minimum precision of 98.5% across parameters evaluated. The approach that maximized variant recall while still maintaining high precision (>99%) was tuning the mapping quality filtering threshold, i.e. confidence of the read mapping (recall = 85.8%, precision = 99.1%, MQ ≥ 40). Additional masking of repetitive sequence content is an alternative conservative approach to variant calling that increases precision at a cost to recall (recall = 70.2%, precision = 99.6%, MQ ≥ 40). Of the genomic positions typically excluded for Mtb, 68% are accurately called using Illumina WGS including 52/168 PE/PPE genes (34.5%). From these results, we present a refined list of low confidence regions across the Mtb genome, which we found to frequently overlap with regions with structural variation, low sequence uniqueness and low sequencing coverage. Our benchmarking results have broad implications for the use of WGS in the study of Mtb biology, inference of transmission in public health surveillance systems and more generally for WGS applications in other organisms. Availability and implementation: All relevant code is available at https://github.com/farhat-lab/mtb-illumina-wgs-evaluation.
Supplementary information: Supplementary data are available at Bioinformatics online.
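
Several abstracts in this list report recall, precision and F1 scores for variant calls. As a reminder of how those figures relate to raw counts, here is a minimal sketch using the standard definitions. The function name and the counts in the example are hypothetical and are not taken from any of the studies listed here.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and
    false negative counts, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 90 correct calls, 10 spurious calls,
    # 30 true variants that were missed.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Raising a mapping quality threshold typically removes some false positives at the price of some true positives, which is why the two metrics tend to move in opposite directions.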

28

Lee, Wan-Ping, Michael P. Stromberg, Alistair Ward, Chip Stewart, Erik P. Garrison, and Gabor T. Marth. "MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping." PLoS ONE 9, no. 3 (March 5, 2014): e90581. http://dx.doi.org/10.1371/journal.pone.0090581.

29

Southgate, Joel A., Matthew J. Bull, Clare M. Brown, Joanne Watkins, Sally Corden, Benjamin Southgate, Catherine Moore, and Thomas R. Connor. "Influenza classification from short reads with VAPOR facilitates robust mapping pipelines and zoonotic strain detection for routine surveillance applications." Bioinformatics 36, no. 6 (November 6, 2019): 1681–88. http://dx.doi.org/10.1093/bioinformatics/btz814.

Abstract:

Motivation: Influenza viruses represent a global public health burden due to annual epidemics and pandemic potential. Due to a rapidly evolving RNA genome, inter-species transmission, intra-host variation, and noise in short-read data, reads can be lost during mapping, and de novo assembly can be time consuming and result in misassembly. We assessed read loss during mapping and designed a graph-based classifier, VAPOR, for selecting mapping references, assembly validation and detection of strains of non-human origin. Results: Standard human reference viruses were insufficient for mapping diverse influenza samples in simulation. VAPOR retrieved references for 257 real whole-genome sequencing samples with a mean of >99.8% identity to assemblies, and increased the proportion of mapped reads by up to 13.3% compared to standard references. VAPOR has the potential to improve the robustness of bioinformatics pipelines for surveillance and could be adapted to other RNA viruses. Availability and implementation: VAPOR is available at https://github.com/connor-lab/vapor. Supplementary information: Supplementary data are available at Bioinformatics online.

30

Wei, Po-Li, Ching-Sheng Hung, Yi-Wei Kao, Ying-Chin Lin, Cheng-Yang Lee, Tzu-Hao Chang, Ben-Chang Shia, and Jung-Chun Lin. "Characterization of Fecal Microbiota with Clinical Specimen Using Long-Read and Short-Read Sequencing Platform." International Journal of Molecular Sciences 21, no. 19 (September 26, 2020): 7110. http://dx.doi.org/10.3390/ijms21197110.

Abstract:

Accurate and rapid identification of microbiotic communities using 16S ribosomal (r)RNA sequencing is a critical task for expanding medical and clinical applications. Next-generation sequencing (NGS) is widely considered a practical approach for direct application to communities without the need for in vitro culturing. In this report, a comparative evaluation of short-read (Illumina) and long-read (Oxford Nanopore Technologies (ONT)) platforms toward 16S rRNA sequencing with the same batch of total genomic DNA extracted from fecal samples is presented. Different 16S gene regions were amplified, bar-coded, and sequenced using the Illumina MiSeq and ONT MinION sequencers and corresponding kits. Mapping of the sequenced amplicon using MinION to the entire 16S rRNA gene was analyzed with the cloud-based EPI2ME algorithm. V3–V4 reads generated using MiSeq were aligned by applying the CLC genomics workbench. More than 90% of sequenced reads generated using distinct sequencers were accurately classified at the genus or species level. The misclassification of sequenced reads at the species level between the two approaches was, as expected, less substantial. Taken together, the comparative results demonstrate that the MinION sequencing platform coupled with the corresponding algorithm could function as a practicable strategy in classifying bacterial communities to the species level.

31

Flouri, Tomas, Costas S. Iliopoulos, Solon P. Pissis, and German Tischler. "Mapping Short Reads to a Genomic Sequence with Circular Structure." International Journal of Systems Biology and Biomedical Technologies 1, no. 1 (January 2012): 26–34. http://dx.doi.org/10.4018/ijsbbt.2012010103.

Abstract:

Constant advances in DNA sequencing technologies are turning whole-genome sequencing into a routine procedure, resulting in massive amounts of data that need to be processed. Tens of gigabytes of data, in the form of short sequences (reads), need to be mapped back onto reference sequences, a few gigabases long. A first generation of short-read alignment algorithms successfully employed hash tables, and the current second generation uses the Burrows-Wheeler transform, further improving speed and memory footprint. These next-generation sequencing technologies allow researchers to characterise a bacterial genome, during a single experiment, at a moderate cost. In this article, as most of the bacterial chromosomes contain a circular DNA molecule, the authors present a new simple, yet efficient, sensitive and accurate algorithm, specifically designed for mapping millions of short reads to a genomic sequence with circular structure.
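
The circular-reference idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes exact matching only, and the function name `find_circular` is hypothetical. The sketch linearises the circular sequence by appending its first m-1 bases (m being the read length), so that reads spanning the origin are still found by an ordinary substring search.

```python
from typing import List


def find_circular(genome: str, read: str) -> List[int]:
    """Return 0-based start positions of exact matches of `read`
    on a circular `genome` (hypothetical helper, exact matching only)."""
    n, m = len(genome), len(read)
    if m == 0 or m > n:
        return []
    # Append the first m-1 bases so matches that wrap around the
    # origin become contiguous in the linearised string.
    extended = genome + genome[: m - 1]
    hits = []
    start = extended.find(read)
    while start != -1:
        hits.append(start)  # start is always < n by construction
        start = extended.find(read, start + 1)
    return hits


if __name__ == "__main__":
    # "CAAC" spans the origin of the circular sequence "ACGTTGCA".
    print(find_circular("ACGTTGCA", "CAAC"))
```

A real mapper would tolerate mismatches and indels and would use an index rather than repeated `str.find`; the sketch only shows why a naive linear reference loses reads that cross the origin.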

32

Richmond, Phillip Andrew, Alice Mary Kaye, Godfrain Jacques Kounkou, Tamar Vered Av-Shalom, and Wyeth W. Wasserman. "Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper." PLOS Computational Biology 17, no. 3 (March 22, 2021): e1008815. http://dx.doi.org/10.1371/journal.pcbi.1008815.

Abstract:

Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially obviating the need for processing, or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper which facilitates a “reverse mapping” approach in which high throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper’s utility for recovering depth of coverage, and accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample’s population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods, but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well-represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and biology-informed collections of reference queries. In the long-term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach. 
FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper.
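
The "reverse mapping" idea, in which query k-mers are searched against the reads rather than reads against a reference, can be sketched as follows. This is an illustration under stated assumptions, not FlexTyper's implementation: FlexTyper searches an FM-index of the reads, whereas this sketch swaps in a plain scan with a Python set, ignores reverse complements, and uses the hypothetical function name `count_query_kmers`.

```python
from collections import Counter
from typing import Dict, Iterable


def count_query_kmers(reads: Iterable[str], queries: Iterable[str], k: int) -> Dict[str, int]:
    """Count how often each query k-mer occurs in the reads
    (hypothetical helper; forward strand only)."""
    wanted = {q for q in queries if len(q) == k}
    # Start every query at zero so absent k-mers are reported explicitly.
    counts = Counter({q: 0 for q in wanted})
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i : i + k]
            if kmer in wanted:
                counts[kmer] += 1
    return dict(counts)


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    print(count_query_kmers(reads, ["GTA", "TTT"], k=3))
```

Counts of allele-specific k-mers at a SNP site could then be compared to call a genotype, which is the kind of downstream use the abstract describes.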

33

Chen, Yen-Lung, Bo-Yi Chang, Chia-Hsiang Yang, and Tzi-Dar Chiueh. "A High-Throughput FPGA Accelerator for Short-Read Mapping of the Whole Human Genome." IEEE Transactions on Parallel and Distributed Systems 32, no. 6 (June 1, 2021): 1465–78. http://dx.doi.org/10.1109/tpds.2021.3051011.

34

Zhao, Qiong-Yi, Jacob Gratten, Restuadi Restuadi, and Xuan Li. "Mapping and differential expression analysis from short-read RNA-Seq data in model organisms." Quantitative Biology 4, no. 1 (March 2016): 22–35. http://dx.doi.org/10.1007/s40484-016-0060-7.

35

Alser, Mohammed, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, and Can Alkan. "GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping." Bioinformatics 33, no. 21 (May 31, 2017): 3355–63. http://dx.doi.org/10.1093/bioinformatics/btx342.

36

Watson, Simon J., Matthijs R. A. Welkers, Daniel P. Depledge, Eve Coulter, Judith M. Breuer, Menno D. de Jong, and Paul Kellam. "Viral population analysis and minority-variant detection using short read next-generation sequencing." Philosophical Transactions of the Royal Society B: Biological Sciences 368, no. 1614 (March 19, 2013): 20120205. http://dx.doi.org/10.1098/rstb.2012.0205.

Abstract:

RNA viruses within infected individuals exist as a population of evolutionary-related variants. Owing to evolutionary change affecting the constitution of this population, the frequency and/or occurrence of individual viral variants can show marked or subtle fluctuations. Since the development of massively parallel sequencing platforms, such viral populations can now be investigated to unprecedented resolution. A critical problem with such analyses is the presence of sequencing-related errors that obscure the identification of true biological variants present at low frequency. Here, we report the development and assessment of the Quality Assessment of Short Read (QUASR) Pipeline (http://sourceforge.net/projects/quasr) specific for virus genome short read analysis that minimizes sequencing errors from multiple deep-sequencing platforms, and enables post-mapping analysis of the minority variants within the viral population. QUASR significantly reduces the error-related noise in deep-sequencing datasets, resulting in increased mapping accuracy and reduction of erroneous mutations. Using QUASR, we have determined influenza virus genome dynamics in sequential samples from an in vitro evolution of 2009 pandemic H1N1 (A/H1N1/09) influenza from samples sequenced on both the Roche 454 GSFLX and Illumina GAIIx platforms. Importantly, concordance between the 454 and Illumina sequencing allowed unambiguous minority-variant detection and accurate determination of virus population turnover in vitro.

37

Kim, Youngho, Munseong Kang, Ju-Hui Jeong, Dae Woong Kang, Soo Jun Park, and Jeong Seop Sim. "Reference Mapping Considering Swaps of Adjacent Bases." Applied Sciences 11, no. 11 (May 29, 2021): 5038. http://dx.doi.org/10.3390/app11115038.

Abstract:

Since the time of the Human Genome Project (HGP), research into next-generation sequencing, which can reduce the cost and time of sequence analysis using computer algorithms, has been actively conducted. Mapping is a next-generation sequencing method that identifies sequences by aligning short reads with a reference genome for which sequence information is known. Mapping can be applied to tasks such as SNP calling, motif searches, and gene identification. Research on mapping that utilizes BWT and GPU has been undertaken in order to obtain faster mapping. In this paper, we propose a new mapping algorithm with additional consideration for base swaps. The experimental results demonstrate that when the penalty score for swaps was −1, −2, and −3 in paired-end alignment, for the human whole genome, SOAP3-swap aligned 4667, 2318, and 972 more read pairs, respectively, than SOAP3-dp, and for the Drosophila genome, SOAP3-swap aligned 1253, 454, and 129 more read pairs, respectively, than SOAP3-dp. SOAP3-swap has the same functionality as that of SOAP3-dp and also improves the alignment ratio by taking biologically significant swaps into account for the first time.

38

Weissensteiner, Matthias H., Andy W. C. Pang, Ignas Bunikis, Ida Höijer, Olga Vinnere-Petterson, Alexander Suh, and Jochen B. W. Wolf. "Combination of short-read, long-read, and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications." Genome Research 27, no. 5 (March 30, 2017): 697–708. http://dx.doi.org/10.1101/gr.215095.116.

39

Soto, Daniela C., Colin Shew, Mira Mastoras, Joshua M. Schmidt, Ruta Sahasrabudhe, Gulhan Kaya, Aida M. Andrés, and Megan Y. Dennis. "Identification of Structural Variation in Chimpanzees Using Optical Mapping and Nanopore Sequencing." Genes 11, no. 3 (March 4, 2020): 276. http://dx.doi.org/10.3390/genes11030276.

Abstract:

Recent efforts to comprehensively characterize great ape genetic diversity using short-read sequencing and single-nucleotide variants have led to important discoveries related to selection within species, demographic history, and lineage-specific traits. Structural variants (SVs), including deletions and inversions, comprise a larger proportion of genetic differences between and within species, making them an important yet understudied source of trait divergence. Here, we used a combination of long-read and -range sequencing approaches to characterize the structural variant landscape of two additional Pan troglodytes verus individuals, one of whom carries 13% admixture from Pan troglodytes troglodytes. We performed optical mapping of both individuals followed by nanopore sequencing of one individual. Filtering for larger variants (>10 kbp) and combined with genotyping of SVs using short-read data from the Great Ape Genome Project, we identified 425 deletions and 59 inversions, of which 88 and 36, respectively, were novel. Compared with gene expression in humans, we found a significant enrichment of chimpanzee genes with differential expression in lymphoblastoid cell lines and induced pluripotent stem cells, both within deletions and near inversion breakpoints. We examined chromatin-conformation maps from human and chimpanzee using these same cell types and observed alterations in genomic interactions at SV breakpoints. Finally, we focused on 56 genes impacted by SVs in >90% of chimpanzees and absent in humans and gorillas, which may contribute to chimpanzee-specific features. Sequencing a greater set of individuals from diverse subspecies will be critical to establish the complete landscape of genetic variation in chimpanzees.

40

Feng, Yi, Leslie Y. Beh, Wei-Jen Chang, and Laura F. Landweber. "SIGAR: Inferring Features of Genome Architecture and DNA Rearrangements by Split-Read Mapping." Genome Biology and Evolution 12, no. 10 (August 13, 2020): 1711–18. http://dx.doi.org/10.1093/gbe/evaa147.

Abstract:

Ciliates are microbial eukaryotes with distinct somatic and germline genomes. Postzygotic development involves extensive remodeling of the germline genome to form somatic chromosomes. Ciliates therefore offer a valuable model for studying the architecture and evolution of programmed genome rearrangements. Current studies usually focus on a few model species, where rearrangement features are annotated by aligning reference germline and somatic genomes. Although many high-quality somatic genomes have been assembled, a high-quality germline genome assembly is difficult to obtain due to its smaller DNA content and abundance of repetitive sequences. To overcome these hurdles, we propose a new pipeline, SIGAR (Split-read Inference of Genome Architecture and Rearrangements) to infer germline genome architecture and rearrangement features without a germline genome assembly, requiring only short DNA sequencing reads. As a proof of principle, 93% of rearrangement junctions identified by SIGAR in the ciliate Oxytricha trifallax were validated by the existing germline assembly. We then applied SIGAR to six diverse ciliate species without germline genome assemblies, including Ichthyophthirius multifiliis, a fish pathogen. Despite the high level of somatic DNA contamination in each sample, SIGAR successfully inferred rearrangement junctions, short eliminated sequences, and potential scrambled genes in each species. This pipeline enables pilot surveys or exploration of DNA rearrangements in species with limited DNA material access, thereby providing new insights into the evolution of chromosome rearrangements.

41

Tárraga, Joaquín, Vicente Arnau, Héctor Martínez, Raul Moreno, Diego Cazorla, José Salavert-Torres, Ignacio Blanquer-Espert, Joaquín Dopazo, and Ignacio Medina. "Acceleration of short and long DNA read mapping without loss of accuracy using suffix array." Bioinformatics 30, no. 23 (August 20, 2014): 3396–98. http://dx.doi.org/10.1093/bioinformatics/btu553.

42

Lee, Hayan, and Michael C. Schatz. "Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score." Bioinformatics 28, no. 16 (July 4, 2012): 2097–105. http://dx.doi.org/10.1093/bioinformatics/bts330.

43

Valiente-Mullor, Carlos, Beatriz Beamud, Iván Ansari, Carlos Francés-Cuesta, Neris García-González, Lorena Mejía, Paula Ruiz-Hueso, and Fernando González-Candelas. "One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads." PLOS Computational Biology 17, no. 1 (January 27, 2021): e1008678. http://dx.doi.org/10.1371/journal.pcbi.1008678.

Abstract:

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.

44

Xie, Chao, Zhen Xuan Yeo, Marie Wong, Jason Piper, Tao Long, Ewen F. Kirkness, William H. Biggs, et al. "Fast and accurate HLA typing from short-read next-generation sequence data with xHLA." Proceedings of the National Academy of Sciences 114, no. 30 (July 3, 2017): 8059–64. http://dx.doi.org/10.1073/pnas.1707945114.

Abstract:

The HLA gene complex on human chromosome 6 is one of the most polymorphic regions in the human genome and contributes in large part to the diversity of the immune system. Accurate typing of HLA genes with short-read sequencing data has historically been difficult due to the sequence similarity between the polymorphic alleles. Here, we introduce an algorithm, xHLA, that iteratively refines the mapping results at the amino acid level to achieve 99–100% four-digit typing accuracy for both class I and II HLA genes, taking only ∼3 min to process a 30× whole-genome BAM file on a desktop computer.

45

Jeske, Tim, Peter Huypens, Laura Stirm, Selina Höckele, Christine M. Wurmser, Anja Böhm, Cora Weigert, et al. "DEUS: an R package for accurate small RNA profiling based on differential expression of unique sequences." Bioinformatics 35, no. 22 (June 22, 2019): 4834–36. http://dx.doi.org/10.1093/bioinformatics/btz495.

Abstract:

Summary: Despite their fundamental role in various biological processes, the analysis of small RNA sequencing data remains a challenging task. Major obstacles arise when short RNA sequences map to multiple locations in the genome, align to regions that are not annotated or underwent post-transcriptional changes which hamper accurate mapping. In order to tackle these issues, we present a novel profiling strategy that circumvents the need for read mapping to a reference genome by utilizing the actual read sequences to determine expression intensities. After differential expression analysis of individual sequence counts, significant sequences are annotated against user defined feature databases and clustered by sequence similarity. This strategy enables a more comprehensive and concise representation of small RNA populations without any data loss or data distortion. Availability and implementation: Code and documentation of our R package at http://ibis.helmholtz-muenchen.de/deus/. Supplementary information: Supplementary data are available at Bioinformatics online.

46

Hamada, Michiaki, Edward Wijaya, Martin C. Frith, and Kiyoshi Asai. "Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection." Bioinformatics 27, no. 22 (October 5, 2011): 3085–92. http://dx.doi.org/10.1093/bioinformatics/btr537.

47

Liu, Yuan, Yongchao Ma, Evan Salsman, Frank A. Manthey, Elias M. Elias, Xuehui Li, and Changhui Yan. "An enrichment method for mapping ambiguous reads to the reference genome for NGS analysis." Journal of Bioinformatics and Computational Biology 17, no. 06 (December 2019): 1940012. http://dx.doi.org/10.1142/s0219720019400122.

Abstract:

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations of the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with big impacts on the downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of the many potential locations. In this study, we explore two alternative methods that are based on the hypothesis that the possibility of an ambiguous read being generated by a location is proportional to the total number of reads produced by that location: (1) the enrichment method that assigns an ambiguous read to the location that has produced the most reads among all the potential locations, (2) the probability method that assigns an ambiguous read to a location based on a probability proportional to the number of reads the location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method produced better results than the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). Not only did it produce more SNP markers, but it also produced SNP markers with better quality, which was demonstrated using multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.
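
The two alternative assignment rules described in this abstract can be sketched in a few lines. This is a hedged illustration, not the authors' code: the function names, the location labels, and the assumption that per-location read counts come from unambiguously mapped reads are all hypothetical.

```python
import random
from typing import Dict, List


def assign_enrichment(candidates: List[str], read_counts: Dict[str, int]) -> str:
    """Enrichment rule: pick the candidate location with the most reads.
    Ties go to the first candidate listed (an assumption of this sketch)."""
    return max(candidates, key=lambda loc: read_counts.get(loc, 0))


def assign_probability(candidates: List[str], read_counts: Dict[str, int],
                       rng: random.Random) -> str:
    """Probability rule: pick a candidate with probability proportional
    to the number of reads already attributed to that location."""
    weights = [read_counts.get(loc, 0) for loc in candidates]
    if sum(weights) == 0:
        # No evidence at any candidate: fall back to the default random rule.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    counts = {"chr1:100": 5, "chr2:200": 20, "chr3:300": 0}
    locations = ["chr1:100", "chr2:200", "chr3:300"]
    print(assign_enrichment(locations, counts))
    print(assign_probability(locations, counts, random.Random(0)))
```

The enrichment rule is deterministic, while the probability rule still spreads ambiguous reads across locations in proportion to their existing support.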

48

Coombe, Lauren, Vladimir Nikolić, Justin Chu, Inanc Birol, and René L. Warren. "ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs." Bioinformatics 36, no. 12 (April 20, 2020): 3885–87. http://dx.doi.org/10.1093/bioinformatics/btaa253.

Abstract:

Summary: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advancements in sequencing technologies, many genome assemblies are still not achieving reference-grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and reference sequence(s) to contiguate and correct the former with respect to the latter. Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of different applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared to existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and using less memory. Availability and implementation: ntJoin is written in C++ and Python and is freely available at https://github.com/bcgsc/ntjoin. Supplementary information: Supplementary data are available at Bioinformatics online.

49

Shen, Feichen, and Jeffrey M. Kidd. "Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2." Genes 11, no. 2 (January 29, 2020): 141. http://dx.doi.org/10.3390/genes11020141.

Abstract:

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy number variation affecting multiple genes at the APOBEC3 locus.

50

Sogabe, Yoko, and Tsutomu Maruyama. "A Fast and Accurate FPGA System for Short Read Mapping Based on Parallel Comparison on Hash Table." IEICE Transactions on Information and Systems E100.D, no. 5 (2017): 1016–25. http://dx.doi.org/10.1587/transinf.2016edp7262.
