Journal articles on the topic 'Basecalling'

1

Walther, D., G. Bartha, and M. Morris. "Basecalling with LifeTrace." Genome Research 11, no. 5 (May 1, 2001): 875–88. http://dx.doi.org/10.1101/gr.177901.

2

Boufounos, Petros, Sameh El-Difrawy, and Dan Ehrlich. "Basecalling using hidden Markov models." Journal of the Franklin Institute 341, no. 1-2 (January 2004): 23–36. http://dx.doi.org/10.1016/j.jfranklin.2003.12.008.

3

Elbialy, Ali, M. A. El-Dosuky, and Ibrahim M. El-Henawy. "Quality of Third Generation Sequencing." Journal of Computational and Theoretical Nanoscience 17, no. 12 (December 1, 2020): 5205–9. http://dx.doi.org/10.1166/jctn.2020.9630.

Abstract:
Third generation sequencing (TGS) relates to long reads but with relatively high error rates. Quality of TGS is a hot topic, dealing with errors. This paper combines and investigates three quality related metrics. They are basecalling accuracy, Phred Quality Scores, and GC content. For basecalling accuracy, a deep neural network is adopted. The measured loss does not exceed 5.42.
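As a brief illustration of two of the metrics named in this abstract, the sketch below uses the standard Phred definition, Q = -10 log10(p), and a plain GC fraction. It does not reproduce the paper's method. The function names are hypothetical, and the Phred+33 offset is an assumption that matches the common Sanger-style FASTQ encoding.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability that the base is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def decode_fastq_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string; offset 33 is assumed (Sanger-style encoding)."""
    return [ord(ch) - offset for ch in qual]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    print(phred_to_error_prob(20))
    print(decode_fastq_quality("I5!"))
    print(gc_content("GGCCAT"))
```

Under this definition a score of 20 corresponds to one expected error in 100 bases, and a score of 30 to one in 1,000.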
4

Napieralski, Adam, and Robert Nowak. "Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing." Sensors 22, no. 6 (March 15, 2022): 2275. http://dx.doi.org/10.3390/s22062275.

Abstract:
Third-generation DNA sequencers provided by Oxford Nanopore Technologies (ONT) produce a series of samples of an electrical current in the nanopore. Such a time series is used to detect the sequence of nucleotides. The task of translation of current values into nucleotide symbols is called basecalling. Various solutions for basecalling have already been proposed. The earlier ones were based on Hidden Markov Models, but the best ones use neural networks or other machine learning models. Unfortunately, achieved accuracy scores are still lower than competitive sequencing techniques, like Illumina’s. Basecallers differ in the input data type—currently, most of them work on a raw data straight from the sequencer (time series of current). Still, the approach of using event data is also explored. Event data is obtained by preprocessing of raw data and dividing it into segments described by several features computed from raw data values within each segment. We propose a novel basecaller that uses joint processing of raw and event data. We define basecalling as a sequence-to-sequence translation, and we use a machine learning model based on an encoder–decoder architecture of recurrent neural networks. Our model incorporates twin encoders and an attention mechanism. We tested our solution on simulated and real datasets. We compare the full model accuracy results with its components: processing only raw or event data. We compare our solution with the existing ONT basecaller—Guppy. Results of numerical experiments show that joint raw and event data processing provides better basecalling accuracy than processing each data type separately. We implement an application called Ravvent, freely available under MIT licence.
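To make the idea of "event data" in this abstract more concrete, the sketch below splits a raw current trace into segments and describes each segment by a few features. It is only an illustration under stated assumptions: the simple jump-threshold rule, the function name `segment_events`, and the chosen features (mean, standard deviation, length) are hypothetical and are not taken from Ravvent or any ONT tool.

```python
from statistics import mean, pstdev


def segment_events(signal, threshold=5.0):
    """Split a raw current trace into events and summarise each one.

    Assumed rule (illustrative only): a new event starts when a sample
    differs from the running mean of the current event by more than
    `threshold`.
    """
    if not signal:
        return []

    segments = []
    current = [signal[0]]
    for sample in signal[1:]:
        if abs(sample - mean(current)) > threshold:
            # Large jump in current: close the running event, start a new one.
            segments.append(current)
            current = [sample]
        else:
            current.append(sample)
    segments.append(current)

    # Describe each event by features computed from its raw samples.
    return [
        {"mean": mean(seg), "stdev": pstdev(seg), "length": len(seg)}
        for seg in segments
    ]


if __name__ == "__main__":
    raw = [80, 81, 79, 100, 101, 99, 100, 60, 61]
    for event in segment_events(raw):
        print(event)
```

A basecaller working on raw data would consume the full sample list, whereas an event-based one would consume the much shorter list of feature records.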
5

Liang, Kuo-ching, Xiaodong Wang, and Dimitris Anastassiou. "Bayesian Basecalling for DNA Sequence Analysis Using Hidden Markov Models." IEEE/ACM Transactions on Computational Biology and Bioinformatics 4, no. 3 (July 2007): 430–40. http://dx.doi.org/10.1109/tcbb.2007.1027.

6

Bonet, Jose, Mandi Chen, Marc Dabad, Simon Heath, Abel Gonzalez-Perez, Nuria Lopez-Bigas, and Jens Lagergren. "DeepMP: a deep learning tool to detect DNA base modifications on Nanopore sequencing data." Bioinformatics 38, no. 5 (October 28, 2021): 1235–43. http://dx.doi.org/10.1093/bioinformatics/btab745.

Abstract:
Motivation: DNA methylation plays a key role in a variety of biological processes. Recently, Nanopore long-read sequencing has enabled direct detection of these modifications. As a consequence, a range of computational methods have been developed to exploit Nanopore data for methylation detection. However, current approaches rely on a human-defined threshold to detect the methylation status of a genomic position and are not optimized to detect sites methylated at low frequency. Furthermore, most methods use either the Nanopore signals or the basecalling errors as the model input and do not take advantage of their combination. Results: Here, we present DeepMP, a convolutional neural network-based model that takes information from Nanopore signals and basecalling errors to detect whether a given motif in a read is methylated or not. Besides, DeepMP introduces a threshold-free position modification calling model sensitive to sites methylated at low frequency across cells. We comprehensively benchmarked DeepMP against state-of-the-art methods on Escherichia coli, human and pUC19 datasets. DeepMP outperforms current approaches at read-based and position-based methylation detection across sites methylated at different frequencies in the three datasets. Availability and implementation: DeepMP is implemented and freely available under MIT license at https://github.com/pepebonet/DeepMP. Supplementary information: Supplementary data are available at Bioinformatics online.
7

Zhan, Y., and D. Kulp. "Model-P: a basecalling method for resequencing microarrays of diploid samples." Bioinformatics 21, Suppl 2 (September 1, 2005): ii182–ii189. http://dx.doi.org/10.1093/bioinformatics/bti1129.

8

Tonazzini, Anna, and Luigi Bedini. "Statistical analysis of electrophoresis time series for improving basecalling in DNA sequencing." International Journal of Signal and Imaging Systems Engineering 1, no. 1 (2008): 36. http://dx.doi.org/10.1504/ijsise.2008.017772.

9

Lou, Qian, and Lei Jiang. "BRAWL: A Spintronics-Based Portable Basecalling-in-Memory Architecture for Nanopore Genome Sequencing." IEEE Computer Architecture Letters 17, no. 2 (July 1, 2018): 241–44. http://dx.doi.org/10.1109/lca.2018.2882384.

10

Dumschott, Kathryn, Maximilian H.-W. Schmidt, Harmeet Singh Chawla, Rod Snowdon, and Björn Usadel. "Oxford Nanopore sequencing: new opportunities for plant genomics?" Journal of Experimental Botany 71, no. 18 (May 27, 2020): 5313–22. http://dx.doi.org/10.1093/jxb/eraa263.

Abstract:
DNA sequencing was dominated by Sanger’s chain termination method until the mid-2000s, when it was progressively supplanted by new sequencing technologies that can generate much larger quantities of data in a shorter time. At the forefront of these developments, long-read sequencing technologies (third-generation sequencing) can produce reads that are several kilobases in length. This greatly improves the accuracy of genome assemblies by spanning the highly repetitive segments that cause difficulty for second-generation short-read technologies. Third-generation sequencing is especially appealing for plant genomes, which can be extremely large with long stretches of highly repetitive DNA. Until recently, the low basecalling accuracy of third-generation technologies meant that accurate genome assembly required expensive, high-coverage sequencing followed by computational analysis to correct for errors. However, today’s long-read technologies are more accurate and less expensive, making them the method of choice for the assembly of complex genomes. Oxford Nanopore Technologies (ONT), a third-generation platform for the sequencing of native DNA strands, is particularly suitable for the generation of high-quality assemblies of highly repetitive plant genomes. Here we discuss the benefits of ONT, especially for the plant science community, and describe the issues that remain to be addressed when using ONT for plant genome sequencing.
11

Rahube, Teddie O., Andrew D. S. Cameron, Nicole A. Lerminiaux, Supriya V. Bhat, and Kathleen A. Alexander. "Globally Disseminated Multidrug Resistance Plasmids Revealed by Complete Assembly of Multidrug Resistant Escherichia coli and Klebsiella pneumoniae Genomes from Diarrheal Disease in Botswana." Applied Microbiology 2, no. 4 (November 11, 2022): 934–49. http://dx.doi.org/10.3390/applmicrobiol2040071.

Abstract:
Antimicrobial resistance is a disseminated global health challenge because many of the genes that cause resistance can transfer horizontally between bacteria. Despite the central role of extrachromosomal DNA elements called plasmids in driving the spread of resistance, the detection and surveillance of plasmids remains a significant barrier in molecular epidemiology. We assessed two DNA sequencing platforms alone and in combination for laboratory diagnostics in Botswana by annotating antibiotic resistance genes and plasmids in extensively drug resistant bacteria from diarrhea in Botswana. Long-read Nanopore DNA sequencing and high accuracy basecalling effectively estimated the architecture and gene content of three plasmids in Escherichia coli HUM3355 and two plasmids in Klebsiella pneumoniae HUM7199. Polishing the assemblies with Illumina reads increased base calling precision with small improvements to gene prediction. All five plasmids encoded one or more antibiotic resistance genes, usually within gene islands containing multiple antibiotic and metal resistance genes, and four plasmids encoded genes associated with conjugative transfer. Two plasmids were almost identical to antibiotic resistance plasmids sequenced in Europe and North America from human infection and a pig farm. These One Health connections demonstrate how low-, middle-, and high-income countries collectively benefit from increased whole genome sequencing capacity for surveillance and tracking of infectious diseases and antibiotic resistance genes that can transfer between animal hosts and move across continents.
12

Senol Cali, Damla, Jeremie S. Kim, Saugata Ghose, Can Alkan, and Onur Mutlu. "Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions." Briefings in Bioinformatics 20, no. 4 (April 2, 2018): 1542–59. http://dx.doi.org/10.1093/bib/bby017.

Abstract:
Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) the choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.
13

Sater, Mohamad, Remy Schwab, Ian Herriott, Tim Farrell, and Miriam Huntley. "807. Same-day Transmission Analysis of Nosocomial Transmission Using Nanopore Whole Genome Sequencing." Open Forum Infectious Diseases 8, Supplement_1 (November 1, 2021): S497–S498. http://dx.doi.org/10.1093/ofid/ofab466.1003.

Abstract:
Background: Healthcare associated infections (HAIs) are a major contributor to patient morbidity and mortality worldwide. HAIs are increasingly important due to the rise of multidrug resistant pathogens which can lead to deadly nosocomial outbreaks. Current methods for investigating transmissions are slow, costly, or have poor detection resolution. A rapid, cost-effective and high-resolution method to identify transmission events is imperative to guide infection control. Whole genome sequencing of infecting pathogens paired with a single nucleotide polymorphism (SNP) analysis can provide high-resolution clonality determination, yet these methods typically have long turnaround times. Here we examined the utility of the Oxford Nanopore Technologies (ONT) platform, a rapid sequencing technology, for whole genome sequencing based transmission analysis. Methods: We developed a SNP calling pipeline customized for ONT data, which exhibit higher sequencing error rates and can therefore be challenging for transmission analysis. The pipeline leverages the latest basecalling tools as well as a suite of custom variant calling and filtering algorithms to achieve highest accuracy in clonality calls compared to Illumina-based sequencing. We also capitalize on ONT long reads by assembling outbreak-specific genomes in order to overcome the need for an external reference genome. Results: We examined 20 bacterial isolates from 5 HAI investigations previously performed at Day Zero Diagnostics as part of epiXact®, our commercialized Illumina-based HAI sequencing and analysis service. Using the ONT data and pipeline, we achieved greater than 90% SNP-calling sensitivity and precision, allowing 100% accuracy of clonality classification compared to Illumina-based results across common HAI species. We demonstrate the validity and increased resolution of our SNP analysis pipeline using assembled genomes from each outbreak.
We also demonstrate that this ONT-based workflow can produce isolate to transmission determination (i.e. including WGS and analysis) in less than 24 hours. [Figure: SNP calling performance. ONT-based SNP calling sensitivity and precision compared to Illumina-based pipeline.] Conclusion: We demonstrate the utility of ONT for HAI investigation, establishing the potential to transform healthcare epidemiology with same-day high-resolution transmission determination. Disclosures: Mohamad Sater, PhD, Day Zero Diagnostics (Employee, Shareholder); Remy Schwab, MSc, Day Zero Diagnostics (Employee, Shareholder); Ian Herriott, BS, Day Zero Diagnostics (Employee, Shareholder); Tim Farrell, MS, Day Zero Diagnostics, Inc. (Employee, Shareholder); Miriam Huntley, PhD, Day Zero Diagnostics (Employee, Shareholder).
14

Burns, Adam, David Robert Bruce, Pauline Robbe, Adele Timbs, Basile Stamatopoulos, Ruth Clifford, Maria Lopopolo, Duncan Parkes, Kate E. Ridout, and Anna Schuh. "Detection of Clinically Relevant Molecular Alterations in Chronic Lymphocytic Leukemia (CLL) By Nanopore Sequencing." Blood 132, Supplement 1 (November 29, 2018): 1847. http://dx.doi.org/10.1182/blood-2018-99-110948.

Abstract:
Introduction: Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the p-arm of chromosome 17 are currently used to predict an individual patient's response to therapy and give an indication as to their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practices for CLL involve three separate assays, each of which are time-consuming and require significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing also promises low failure rates on degraded material such as FFPE and excellent detection of structural variants due to long read length of sequencing. Importantly, Nanopore technology does not require expensive equipment, is low-maintenance and ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries. Methods: Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flowcell, with the TP53 libraries pooled and sequenced on a second R9.4 flowcell.
Whole genome libraries were prepared from 400ng genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample sequenced on individual flowcells on a MinION mk1b instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego CA, USA). Results: Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bio-informatic error correction and consensus building, the percentage to germline homology was the same as that obtained from short-read sequencing and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions, and three synonymous SNVs. The remaining 6 were predicted to be functional variants (eight missense and two stop-gains) and had all been identified in early MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events, of 19Mb and 20Mb length, in both patients with high confidence. Conclusions: Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied.
Furthermore, somatic SNVs in TP53 can be identified, although similar to second generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrates that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential of diagnostic implementation in resource-poor regions of the world. Disclosures: Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.
15

Hughes, Andrew E. O., Maureen C. Montgomery, Chang Liu, and Eric T. Weimer. "Quantification of Allele-Specific HLA Expression with Nanopore Long-Read Sequencing." Blood 136, Supplement 1 (November 5, 2020): 42–43. http://dx.doi.org/10.1182/blood-2020-140902.

Abstract:
Human leukocyte antigen (HLA) typing plays a critical role in evaluating donor-recipient compatibility prior to hematopoietic cell transplantation (HCT) to minimize the risk of rejection and graft versus host disease (GVHD). Compared to traditional sequence-based methods for HLA typing, next-generation sequencing offers significant advantages in terms of accuracy, turnaround time, and cost (Weimer et al., JMD, 2016). Nevertheless, an intrinsic limitation of DNA-based typing is that it does not quantify HLA gene expression, which has been implicated in clinical outcomes (Petersdorf et al., Blood, 2014; Petersdorf et al., NEJM, 2015). Previously, we demonstrated simultaneous HLA class I genotyping and gene-level expression analysis by RNA-seq using nanopore long-read sequencing (Montgomery et al., JMD, 2020). Given that mismatches in both class I and class II HLA genes, as well as the relative expression of individual alleles, impact donor-recipient compatibility, we sought to build on our previous work by quantifying allele-specific expression of both class I and class II HLA loci in donor lymphocytes. For this study, mRNA was isolated from peripheral blood lymphocytes from 12 donors. Barcoded cDNA libraries were prepared and sequenced on MinION flow cells (R9.4.1) using MinKNOW (v3.1.13) to a median depth of 1.6 × 10^6 reads. Basecalling and demultiplexing were performed with Albacore (v2.3.4) or Guppy (v2.3.1), and adapter trimming was performed with Porechop (v0.2.3). Processed reads were aligned to the international ImMunoGeneTics project (IMGT) HLA database (v3.41.0) using minimap2 (v2.17). Reads mapping to individual HLA loci were realigned to allele-specific references using subject HLA types determined by Athlon (v1.0) or Illumina sequencing. In parallel, library size factors were estimated by aligning reads to GRCh38, counting reads in genes with HTseq (v0.12.4), and using trimmed mean of M-values normalization. As shown in Fig. 1, we observed higher expression of HLA class I genes compared to class II (median 593 vs. 150, p < 0.001, Mann-Whitney U test), a pattern consistent with a mixture of primarily T cells, which express class I genes, as well as B cells, which express both class I and II. Within class I genes, we observed the highest expression of HLA-B, followed by HLA-A, and HLA-C (median 663, 578, and 459, respectively). Within class II, we observed the highest expression of HLA-DPB1, followed by HLA-DRB1, and HLA-DQB1 (median 281, 266, and 104, respectively). Importantly, we observed significant variation in expression both between and within alleles of individual HLA genes, suggesting that HLA type alone does not accurately predict HLA expression. We next analyzed HLA-DPB1 specifically, given reports that the risk of GVHD in HCT recipients with HLA-DPB1-mismatched donors is modulated by HLA-DPB1 expression (Petersdorf et al., NEJM, 2015). Of note, HLA-DPB1 expression is linked to a single nucleotide polymorphism, rs9277534, which can be imputed from HLA-DPB1 type (Meurer et al., Front Immunol, 2018). Accordingly, we analyzed HLA-DPB1 expression conditioned on rs9277534 genotype. Although we observed lower HLA-DPB1 expression for the 'A' allele compared to 'G' (median 220 vs. 265), consistent with the reported association, this difference was not statistically significant (p = 0.22, Mann-Whitney U test). Furthermore, we observed significant variation in expression among 'A' alleles, with normalized counts ranging from 57 to 408 (vs. 191 to 367 for 'G' alleles). In this study, we demonstrate the feasibility of quantifying allele-specific expression of both class I and class II HLA genes with nanopore long-read sequencing. Taken together, our results reveal extensive variation in the expression of class I and class II HLA loci, even after accounting for individual allele types and known markers of expression.
These results emphasize the potential value of methods, such as nanopore sequencing, for directly quantifying allele-specific HLA expression to develop improved risk prediction models that can inform the evaluation of donor-recipient immunocompatibility. Disclosures: No relevant conflicts of interest to declare.
16

Tan, Kar-Tong, Michael K. Slevin, Matthew Meyerson, and Heng Li. "Identifying and correcting repeat-calling errors in nanopore sequencing of telomeres." Genome Biology 23, no. 1 (August 26, 2022). http://dx.doi.org/10.1186/s13059-022-02751-6.

Abstract:
Nanopore long-read sequencing is an emerging approach for studying genomes, including long repetitive elements like telomeres. Here, we report extensive basecalling induced errors at telomere repeats across nanopore datasets, sequencing platforms, basecallers, and basecalling models. We find that telomeres in many organisms are frequently miscalled. We demonstrate that tuning of nanopore basecalling models leads to improved recovery and analysis of telomeric regions, with minimal negative impact on other genomic regions. We highlight the importance of verifying nanopore basecalls in long, repetitive, and poorly defined regions, and showcase how artefacts can be resolved by improvements in nanopore basecalling models.
17

Neumann, Don, Anireddy S. N. Reddy, and Asa Ben-Hur. "RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data." BMC Bioinformatics 23, no. 1 (April 20, 2022). http://dx.doi.org/10.1186/s12859-022-04686-y.

Abstract:
Background: Despite recent progress in basecalling of Oxford nanopore DNA sequencing data, its wide adoption is still being hampered by its relatively low accuracy compared to short read technologies. Furthermore, very little of the recent research was focused on basecalling of RNA data, which has different characteristics than its DNA counterpart. Results: We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford nanopore’s RNA basecallers. Availability: The source code for our basecaller is available at: https://github.com/biodlab/RODAN.
18

Ferguson, Scott, Todd McLay, Rose L. Andrew, Jeremy J. Bruhl, Benjamin Schwessinger, Justin Borevitz, and Ashley Jones. "Species-specific basecallers improve actual accuracy of nanopore sequencing in plants." Plant Methods 18, no. 1 (December 14, 2022). http://dx.doi.org/10.1186/s13007-022-00971-2.

Abstract:
Background: Long-read sequencing platforms offered by Oxford Nanopore Technologies (ONT) allow native DNA containing epigenetic modifications to be directly sequenced, but can be limited by lower per-base accuracies. A key step post-sequencing is basecalling, the process of converting raw electrical signals produced by the sequencing device into nucleotide sequences. This is challenging as current basecallers are primarily based on mixtures of model species for training. Here we utilise both ONT PromethION and higher accuracy PacBio Sequel II HiFi sequencing on two plants, Phebalium stellatum and Xanthorrhoea johnsonii, to train species-specific basecaller models with the aim of improving per-base accuracy. We investigate sequencing accuracies achieved by ONT basecallers and assess accuracy gains by training single-species and species-specific basecaller models. We also evaluate accuracy gains from ONT’s improved flowcells (R10.4, FLO-PRO112) and sequencing kits (SQK-LSK112). For the truth dataset for both model training and accuracy assessment, we developed highly accurate, contiguous diploid reference genomes with PacBio Sequel II HiFi reads. Results: Basecalling with ONT Guppy 5 and 6 super-accurate gave almost identical results, attaining read accuracies of 91.96% and 94.15%. Guppy’s plant-specific model gave highly mixed results, attaining read accuracies of 91.47% and 96.18%. Species-specific basecalling models improved read accuracy, attaining 93.24% and 95.16% read accuracies. R10.4 sequencing kits also improve sequencing accuracy, attaining read accuracies of 95.46% (super-accurate) and 96.87% (species-specific). Conclusions: The use of a single mixed-species basecaller model, such as ONT Guppy super-accurate, may be reducing the accuracy of nanopore sequencing, due to conflicting genome biology within the training dataset and study species. Training of single-species and genome-specific basecaller models improves read accuracy.
Studies that aim to do large-scale long-read genotyping would primarily benefit from training their own basecalling models. Such studies could use sequencing accuracy gains and improving bioinformatics tools to improve study outcomes.
19

Zeng, Jingwen, Hongmin Cai, Hong Peng, Haiyan Wang, Yue Zhang, and Tatsuya Akutsu. "Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network." Frontiers in Genetics 10 (January 20, 2020). http://dx.doi.org/10.3389/fgene.2019.01332.

20

Zhang, Yao-zhong, Arda Akdemir, Georg Tremmel, Seiya Imoto, Satoru Miyano, Tetsuo Shibuya, and Rui Yamaguchi. "Nanopore basecalling from a perspective of instance segmentation." BMC Bioinformatics 21, S3 (April 2020). http://dx.doi.org/10.1186/s12859-020-3459-0.

21

Nykrynova, Marketa, Roman Jakubicek, Vojtech Barton, Matej Bezdicek, Martina Lengerova, and Helena Skutkova. "Using deep learning for gene detection and classification in raw nanopore signals." Frontiers in Microbiology 13 (September 15, 2022). http://dx.doi.org/10.3389/fmicb.2022.942179.

Abstract:
Recently, nanopore sequencing has come to the fore as library preparation is rapid and simple, sequencing can be done almost anywhere, and longer reads are obtained than with next-generation sequencing. The main bottleneck still lies in data postprocessing which consists of basecalling, genome assembly, and localizing significant sequences, which is time consuming and computationally demanding, thus prolonging delivery of crucial results for clinical practice. Here, we present a neural network-based method capable of detecting and classifying specific genomic regions already in raw nanopore signals—squiggles. Therefore, the basecalling process can be omitted entirely as the raw signals of significant genes, or intergenic regions can be directly analyzed, or if the nucleotide sequences are required, the identified squiggles can be basecalled, preferably to others. The proposed neural network could be included directly in the sequencing run, allowing real-time squiggle processing.
22

Tavakoli, Sepideh, Mohammad Nabizadeh, Amr Makhamreh, Howard Gamper, Caroline A. McCormick, Neda K. Rezapour, Ya-Ming Hou, Meni Wanunu, and Sara H. Rouhanifard. "Semi-quantitative detection of pseudouridine modifications and type I/II hypermodifications in human mRNAs using direct long-read sequencing." Nature Communications 14, no. 1 (January 19, 2023). http://dx.doi.org/10.1038/s41467-023-35858-w.

Abstract:
Here, we develop and apply a semi-quantitative method for the high-confidence identification of pseudouridylated sites on mammalian mRNAs via direct long-read nanopore sequencing. A comparative analysis of a modification-free transcriptome reveals that the depth of coverage and specific k-mer sequences are critical parameters for accurate basecalling. By adjusting these parameters for high-confidence U-to-C basecalling errors, we identify many known sites of pseudouridylation and uncover previously unreported uridine-modified sites, many of which fall in k-mers that are known targets of pseudouridine synthases. Identified sites are validated using 1000-mer synthetic RNA controls bearing a single pseudouridine in the center position, demonstrating systematic under-calling using our approach. We identify mRNAs with up to 7 unique modification sites. Our workflow allows direct detection of low-, medium-, and high-occupancy pseudouridine modifications on native RNA molecules from nanopore sequencing data and multiple modifications on the same strand.
23

Wick, Ryan R., Louise M. Judd, and Kathryn E. Holt. "Performance of neural network basecalling tools for Oxford Nanopore sequencing." Genome Biology 20, no. 1 (June 24, 2019). http://dx.doi.org/10.1186/s13059-019-1727-y.

24

Weinmaier, Thomas, Rick Conzemius, Yehudit Bergman, Shawna Lewis, Emily B. Jacobs, Pranita D. Tamma, Arne Materna, Johannes Weinberger, Stephan Beisken, and Patricia J. Simner. "Validation and Application of Long-Read Whole-Genome Sequencing for Antimicrobial Resistance Gene Detection and Antimicrobial Susceptibility Testing." Antimicrobial Agents and Chemotherapy, December 19, 2022. http://dx.doi.org/10.1128/aac.01072-22.

Abstract:
Next-generation sequencing applications are increasingly used for detection and characterization of antimicrobial-resistant pathogens in clinical settings. Oxford Nanopore Technologies (ONT) sequencing offers advantages for clinical use compared with other sequencing methodologies because it enables real-time basecalling, produces long sequencing reads that increase the ability to correctly assemble DNA fragments, provides short turnaround times, and requires relatively uncomplicated sample preparation.
25

Silvestre-Ryan, Jordi, and Ian Holmes. "Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing." Genome Biology 22, no. 1 (January 19, 2021). http://dx.doi.org/10.1186/s13059-020-02255-1.

Abstract:
We develop a general computational approach for improving the accuracy of basecalling with Oxford Nanopore’s 1D2 and related sequencing protocols. Our software PoreOver (https://github.com/jordisr/poreover) finds the consensus of two neural networks by aligning their probability profiles, and is compatible with multiple nanopore basecallers. When applied to the recently-released Bonito basecaller, our method reduces the median sequencing error by more than half.
26

Rausch, Tobias, Markus Hsi-Yang Fritz, Andreas Untergasser, and Vladimir Benes. "Tracy: basecalling, alignment, assembly and deconvolution of sanger chromatogram trace files." BMC Genomics 21, no. 1 (March 14, 2020). http://dx.doi.org/10.1186/s12864-020-6635-8.

27

You, Yupei, Michael B. Clark, and Heejung Shim. "NanoSplicer: Accurate identification of splice junctions using Oxford Nanopore sequencing." Bioinformatics, May 27, 2022. http://dx.doi.org/10.1093/bioinformatics/btac359.

Abstract:
Motivation: Long read sequencing methods have considerable advantages for characterising RNA isoforms. Oxford nanopore sequencing records changes in electrical current when nucleic acid traverses through a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies include utilising matched short-read data and/or annotated splice junctions to correct nanopore reads but add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages. Results: We developed “NanoSplicer” to identify splice junctions using raw nanopore signal (squiggles). For each splice junction the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (1) synthetic mRNAs with known splice junctions and (2) biological mRNAs from a lung-cancer cell-line. The results from both datasets demonstrate NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. Availability and implementation: NanoSplicer is freely available at https://github.com/shimlab/NanoSplicer and has been deposited in archived format at https://doi.org/10.5281/zenodo.6403849. Supplementary information: Supplementary data are available at Bioinformatics online.
28

Rask, Thomas S., Bent Petersen, Donald S. Chen, Karen P. Day, and Anders Gorm Pedersen. "Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data." BMC Bioinformatics 17, no. 1 (April 22, 2016). http://dx.doi.org/10.1186/s12859-016-1032-7.

29

Chandak, Shubham, Kedar Tatwawadi, Srivatsan Sridhar, and Tsachy Weissman. "Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy." Bioinformatics, December 16, 2020. http://dx.doi.org/10.1093/bioinformatics/btaa1017.

Abstract:
Motivation: Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requirements without adversely impacting performance of downstream applications. Results: We explore the use of lossy compression for nanopore raw data using two state-of-the-art lossy time-series compressors, and evaluate the tradeoff between compressed size and basecalling/consensus accuracy. We test several basecallers and consensus tools on a variety of datasets at varying depths of coverage, and conclude that lossy compression can provide 35–50% further reduction in compressed size of raw data over the state-of-the-art lossless compressor with negligible impact on basecalling accuracy (≲0.2% reduction) and consensus accuracy (≲0.002% reduction). In addition, we evaluate the impact of lossy compression on methylation calling accuracy and observe that this impact is minimal for similar reductions in compressed size, although further evaluation with improved benchmark datasets is required for reaching a definite conclusion. The results suggest the possibility of using lossy compression, potentially on the nanopore sequencing device itself, to achieve significant reductions in storage and transmission costs while preserving the accuracy of downstream applications. Availability and implementation: The code is available at https://github.com/shubhamchandak94/lossy_compression_evaluation. Supplementary information: Supplementary data are available at Bioinformatics online.
30

Sanderson, Nicholas D., Natalia Kapel, Gillian Rodger, Hermione Webster, Samuel Lipworth, Teresa L. Street, Timothy Peto, Derrick Crook, and Nicole Stoesser. "Comparison of R9.4.1/Kit10 and R10/Kit12 Oxford Nanopore flowcells and chemistries in bacterial genome reconstruction." Microbial Genomics 9, no. 1 (January 10, 2023). http://dx.doi.org/10.1099/mgen.0.000910.

Abstract:
Complete, accurate, cost-effective, and high-throughput reconstruction of bacterial genomes for large-scale genomic epidemiological studies is currently only possible with hybrid assembly, combining long- (typically using nanopore sequencing) and short-read (Illumina) datasets. Being able to use nanopore-only data would be a significant advance. Oxford Nanopore Technologies (ONT) have recently released a new flowcell (R10.4) and chemistry (Kit12), which reportedly generate per-read accuracies rivalling those of Illumina data. To evaluate this, we sequenced DNA extracts from four commonly studied bacterial pathogens, namely Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus, using Illumina and ONT’s R9.4.1/Kit10, R10.3/Kit12, R10.4/Kit12 flowcells/chemistries. We compared raw read accuracy and assembly accuracy for each modality, considering the impact of different nanopore basecalling models, commonly used assemblers, sequencing depth, and the use of duplex versus simplex reads. ‘Super accuracy’ (sup) basecalled R10.4 reads, in particular duplex reads, have high per-read accuracies and could be used to robustly reconstruct bacterial genomes without the use of Illumina data. However, the per-run yield of duplex reads generated in our hands with standard sequencing protocols was low (typically <10%), with substantial implications for cost and throughput if relying on nanopore data only to enable bacterial genome reconstruction. In addition, recovery of small plasmids with the best-performing long-read assembler (Flye) was inconsistent. R10.4/Kit12 combined with sup basecalling holds promise as a singular sequencing technology in the reconstruction of commonly studied bacterial genomes, but hybrid assembly (Illumina+R9.4.1 hac) currently remains the highest throughput, most robust, and cost-effective approach to fully reconstruct these bacterial genomes.
31

Fang, Li, Qian Liu, Alex Mas Monteys, Pedro Gonzalez-Alegre, Beverly L. Davidson, and Kai Wang. "DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing." Genome Biology 23, no. 1 (April 28, 2022). http://dx.doi.org/10.1186/s13059-022-02670-6.

Abstract:
Despite recent improvements in basecalling accuracy, nanopore sequencing still has higher error rates on short-tandem repeats (STRs). Instead of using basecalled reads, we developed DeepRepeat which converts ionic current signals into red-green-blue channels, thus transforming the repeat detection problem into an image recognition problem. DeepRepeat identifies and accurately quantifies telomeric repeats in the CHM13 cell line and achieves higher accuracy in quantifying repeats in long STRs than competing methods. We also evaluate DeepRepeat on genome-wide or candidate region datasets from seven different sources. In summary, DeepRepeat enables accurate quantification of long STRs and complements existing methods relying on basecalled reads.
32

Vereecke, Nick, Jade Bokma, Freddy Haesebrouck, Hans Nauwynck, Filip Boyen, Bart Pardon, and Sebastiaan Theuns. "High quality genome assemblies of Mycoplasma bovis using a taxon-specific Bonito basecaller for MinION and Flongle long-read nanopore sequencing." BMC Bioinformatics 21, no. 1 (November 11, 2020). http://dx.doi.org/10.1186/s12859-020-03856-0.

Abstract:
Background: Implementation of Third-Generation Sequencing approaches for Whole Genome Sequencing (WGS) all-in-one diagnostics in human and veterinary medicine, requires the rapid and accurate generation of consensus genomes. Over the last years, Oxford Nanopore Technologies (ONT) released various new devices (e.g. the Flongle R9.4.1 flow cell) and bioinformatics tools (e.g. the in 2019-released Bonito basecaller), allowing cheap and user-friendly cost-efficient introduction in various NGS workflows. While single read, overall consensus accuracies, and completeness of genome sequences has been improved dramatically, further improvements are required when working with non-frequently sequenced organisms like Mycoplasma bovis. As an important primary respiratory pathogen in cattle, rapid M. bovis diagnostics is crucial to allow timely and targeted disease control and prevention. Current complete diagnostics (including identification, strain typing, and antimicrobial resistance (AMR) detection) require combined culture-based and molecular approaches, of which the first can take 1–2 weeks. At present, cheap and quick long read all-in-one WGS approaches can only be implemented if increased accuracies and genome completeness can be obtained. Results: Here, a taxon-specific custom-trained Bonito v.0.1.3 basecalling model (custom-pg45) was implemented in various WGS assembly bioinformatics pipelines. Using MinION sequencing data, we showed improved consensus accuracies up to Q45.2 and Q46.7 for reference-based and Canu de novo assembled M. bovis genomes, respectively. Furthermore, the custom-pg45 model resulted in mean consensus accuracies of Q45.0 and genome completeness of 94.6% for nine M. bovis field strains. Improvements were also observed for the single-use Flongle sequencer (mean Q36.0 accuracies and 80.3% genome completeness). Conclusions: These results implicate that taxon-specific basecalling of MinION and single-use Flongle Nanopore long reads are of great value to be implemented in rapid all-in-one WGS tools as evidenced for Mycoplasma bovis as an example.
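Editorial note on the Q values quoted in this abstract: the following is a minimal sketch, not taken from the paper, of the standard Phred relationship Q = -10 log10(p) between a quality score and an error probability. It assumes the reported consensus accuracies are Phred-scaled; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Q values quoted in the abstract above, read as Phred-scaled (assumption).
    for q in (36.0, 45.0, 46.7):
        p = phred_to_error_prob(q)
        print(f"Q{q}: roughly one error per {1 / p:,.0f} bases")
```

Under this reading, each additional 10 Q units corresponds to a tenfold lower error rate, which gives a sense of the gap between the MinION and Flongle figures.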
33

Doroschak, Kathryn, Karen Zhang, Melissa Queen, Aishwarya Mandyam, Karin Strauss, Luis Ceze, and Jeff Nivala. "Rapid and robust assembly and decoding of molecular tags with DNA-based nanopore signatures." Nature Communications 11, no. 1 (November 3, 2020). http://dx.doi.org/10.1038/s41467-020-19151-8.

Abstract:
Molecular tagging is an approach to labeling physical objects using DNA or other molecules that can be used when methods such as RFID tags and QR codes are unsuitable. No molecular tagging method exists that is inexpensive, fast and reliable to decode, and usable in minimal resource environments to create or read tags. To address this, we present Porcupine, an end-user molecular tagging system featuring DNA-based tags readable within seconds using a portable nanopore device. Porcupine’s digital bits are represented by the presence or absence of distinct DNA strands, called molecular bits (molbits). We classify molbits directly from raw nanopore signal, avoiding basecalling. To extend shelf life, decrease readout time, and make tags robust to environmental contamination, molbits are prepared for readout during tag assembly and can be stabilized by dehydration. The result is an extensible, real-time, high accuracy tagging system that includes an approach to developing highly separable barcodes.
34

Bonenfant, Quentin, Laurent Noé, and Hélène Touzet. "Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming." Bioinformatics Advances, November 21, 2022. http://dx.doi.org/10.1093/bioadv/vbac085.

Abstract:
Motivation: Oxford Nanopore Technologies (ONT) sequencing has become very popular over the past few years and offers a cost-effective solution for many genomic and transcriptomic projects. One distinctive feature of the technology is that the protocol includes ligation of adapters to both ends of each fragment. Those adapters should then be removed before downstream analyses, either during the basecalling step or by explicit trimming. This basic task may be tricky when the definition of the adapter sequence is not well-documented. Results: We have developed a new method to scan a set of ONT reads to see if it contains adapters, without any prior knowledge on the sequence of the potential adapters, and then trim out those adapters. The algorithm is based on approximate k-mers and is able to discover adapter sequences based on their frequency alone. The method was successfully tested on a variety of ONT datasets with different flowcells, sequencing kits and basecallers. Availability: The resulting software, named Porechop_ABI, is open-source and is available at https://github.com/bonsai-team/Porechop_ABI. Supplementary information: Supplementary data are available at Bioinformatics advances online.
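Editorial note: the idea of finding adapters from frequency alone can be illustrated with a deliberately simplified sketch. This is not the Porechop_ABI algorithm: it counts exact k-mers rather than the approximate k-mers described in the abstract, and the function name, parameters, and toy reads are all hypothetical.

```python
from collections import Counter


def frequent_end_kmers(reads, k=8, window=40, top=3):
    """Return the k-mers seen in the first `window` bases of the most reads.

    Each k-mer is counted at most once per read, so a count equals the
    number of reads whose start contains that k-mer.
    """
    counts = Counter()
    for read in reads:
        head = read[:window]
        kmers_in_head = {head[i:i + k] for i in range(len(head) - k + 1)}
        counts.update(kmers_in_head)
    return counts.most_common(top)


if __name__ == "__main__":
    adapter = "AATGTACTTCGT"  # made-up adapter, for illustration only
    bodies = ["GGGCCCAAATTTGGGCCC", "CTCTCTAGAGAGTCTCTC", "TTGACCATGGACCATTGA"]
    reads = [adapter + body for body in bodies]
    for kmer, n in frequent_end_kmers(reads, top=5):
        print(kmer, n)
```

In this toy input the k-mers that lie entirely inside the shared prefix are the only ones present in every read, so they rise to the top of the ranking; real reads would need tolerance for sequencing errors, which is where approximate matching comes in.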
35

De Vivo, Mattia, Hsin-Han Lee, Yu-Sin Huang, Niklas Dreyer, Chia-Ling Fong, Felipe Monteiro Gomes de Mattos, Dharmesh Jain, et al. "Utilisation of Oxford Nanopore sequencing to generate six complete gastropod mitochondrial genomes as part of a biodiversity curriculum." Scientific Reports 12, no. 1 (June 15, 2022). http://dx.doi.org/10.1038/s41598-022-14121-0.

Abstract:
High-throughput sequencing has enabled genome skimming approaches to produce complete mitochondrial genomes (mitogenomes) for species identification and phylogenomics purposes. In particular, the portable sequencing device from Oxford Nanopore Technologies (ONT) has the potential to facilitate hands-on training from sampling to sequencing and interpretation of mitogenomes. In this study, we present the results from sampling and sequencing of six gastropod mitogenomes (Aplysia argus, Cellana orientalis, Cellana toreuma, Conus ebraeus, Conus miles and Tylothais aculeata) from a graduate level biodiversity course. The students were able to produce mitogenomes from sampling to annotation using existing protocols and programs. Approximately 4 Gb of sequence was produced from 16 Flongle and one MinION flow cells, averaging 235 Mb and N50 = 4.4 kb per flow cell. Five of the six 14.1–18 kb mitogenomes were circularised, containing all 13 core protein coding genes. Additional Illumina sequencing revealed that the ONT assemblies spanned over highly AT rich sequences in the control region that were otherwise missing in Illumina-assembled mitogenomes, but still contained a base error of one every 70.8–346.7 bp under the fast mode basecalling with the majority occurring at homopolymer regions. Our findings suggest that the portable MinION device can be used to rapidly produce low-cost mitogenomes onsite and tailored to genomics-based training in biodiversity research.
36

Goldsmith, Chloe, Jesús Rafael Rodríguez-Aguilera, Ines El-Rifai, Adrien Jarretier-Yuste, Valérie Hervieu, Olivier Raineteau, Pierre Saintigny, et al. "Low biological fluctuation of mitochondrial CpG and non-CpG methylation at the single-molecule level." Scientific Reports 11, no. 1 (April 13, 2021). http://dx.doi.org/10.1038/s41598-021-87457-8.

Abstract:
Mammalian cytosine DNA methylation (5mC) is associated with the integrity of the genome and the transcriptional status of nuclear DNA. Due to technical limitations, it has been less clear if mitochondrial DNA (mtDNA) is methylated and whether 5mC has a regulatory role in this context. Here, we used bisulfite-independent single-molecule sequencing of native human and mouse DNA to study mitochondrial 5mC across different biological conditions. We first validated the ability of long-read nanopore sequencing to detect 5mC in CpG (5mCpG) and non-CpG (5mCpH) context in nuclear DNA at expected genomic locations (i.e. promoters, gene bodies, enhancers, and cell type-specific transcription factor binding sites). Next, using high coverage nanopore sequencing we found low levels of mtDNA CpG and CpH methylation (with several exceptions) and little variation across biological processes: differentiation, oxidative stress, and cancer. 5mCpG and 5mCpH were overall higher in tissues compared to cell lines, with small additional variation between cell lines of different origin. Despite general low levels, global and single-base differences were found in cancer tissues compared to their adjacent counterparts, in particular for 5mCpG. In conclusion, nanopore sequencing is a useful tool for the detection of modified DNA bases on mitochondria that avoid the biases introduced by bisulfite and PCR amplification. Enhanced nanopore basecalling models will provide further resolution on the small size effects detected here, as well as rule out the presence of other DNA modifications such as oxidized forms of 5mC.
37

Liu, Qian, Yu Hu, Andres Stucky, Li Fang, Jiang F. Zhong, and Kai Wang. "LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing." BMC Genomics 21, S11 (December 2020). http://dx.doi.org/10.1186/s12864-020-07207-4.

Abstract:
Background: Long-read RNA-Seq techniques can generate reads that encompass a large proportion or the entire mRNA/cDNA molecules, so they are expected to address inherited limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data, which takes into account the high basecalling error rates and the presence of alignment errors. Results: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF achieved better performance to detect known gene fusions over existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known in cytogenetic resolution) in base resolution, which was further validated by Sanger sequencing. Conclusions: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
38

Murigneux, Valentine, Leah W. Roberts, Brian M. Forde, Minh-Duy Phan, Nguyen Thi Khanh Nhu, Adam D. Irwin, Patrick N. A. Harris, et al. "MicroPIPE: validating an end-to-end workflow for high-quality complete bacterial genome construction." BMC Genomics 22, no. 1 (June 25, 2021). http://dx.doi.org/10.1186/s12864-021-07767-z.

Abstract:
Background: Oxford Nanopore Technology (ONT) long-read sequencing has become a popular platform for microbial researchers due to the accessibility and affordability of its devices. However, easy and automated construction of high-quality bacterial genomes using nanopore reads remains challenging. Here we aimed to create a reproducible end-to-end bacterial genome assembly pipeline using ONT in combination with Illumina sequencing. Results: We evaluated the performance of several popular tools used during genome reconstruction, including base-calling, filtering, assembly, and polishing. We also assessed overall genome accuracy using ONT both natively and with Illumina. All steps were validated using the high-quality complete reference genome for the Escherichia coli sequence type (ST)131 strain EC958. Software chosen at each stage were incorporated into our final pipeline, MicroPIPE. Further validation of MicroPIPE was carried out using 11 additional ST131 E. coli isolates, which demonstrated that complete circularised chromosomes and plasmids could be achieved without manual intervention. Twelve publicly available Gram-negative and Gram-positive bacterial genomes (with available raw ONT data and matched complete genomes) were also assembled using MicroPIPE. We found that revised basecalling and updated assembly of the majority of these genomes resulted in improved accuracy compared to the current publicly available complete genomes. Conclusions: MicroPIPE is built in modules using Singularity container images and the bioinformatics workflow manager Nextflow, allowing changes and adjustments to be made in response to future tool development. Overall, MicroPIPE provides an easy-access, end-to-end solution for attaining high-quality bacterial genomes. MicroPIPE is available at https://github.com/BeatsonLab-MicrobialGenomics/micropipe.
39

Reddy, Shishir, Ling-Hong Hung, Olga Sala-Torra, Jerald P. Radich, Cecilia CS Yeung, and Ka Yee Yeung. "A graphical, interactive and GPU-enabled workflow to process long-read sequencing data." BMC Genomics 22, no. 1 (August 23, 2021). http://dx.doi.org/10.1186/s12864-021-07927-1.

Abstract:
Background: Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools which can efficiently process the raw nanopore reads, support graphical output and interactive visualizations for interpretations of results. Another obstacle is that high performance software tools for long-read sequencing data analyses often leverage graphics processing units (GPU), which is challenging and time-consuming to configure, especially on the cloud. Results: We present a graphical cloud-enabled workflow for fast, interactive analysis of nanopore sequencing data using GPUs. Users customize parameters, monitor execution and visualize results through an accessible graphical interface. The workflow and its components are completely containerized to ensure reproducibility and facilitate installation of the GPU-enabled software. We also provide an Amazon Machine Image (AMI) with all software and drivers pre-installed for GPU computing on the cloud. Most importantly, we demonstrate the potential of applying our software tools to reduce the turnaround time of cancer diagnostics by generating blood cancer (NB4, K562, ME1, 238 MV4;11) cell line Nanopore data using the Flongle adapter. We observe a 29x speedup and a 93x reduction in costs for the rate-limiting basecalling step in the analysis of blood cancer cell line data. Conclusions: Our interactive and efficient software tools will make analyses of Nanopore data using GPU and cloud computing accessible to biomedical and clinical scientists, thus facilitating the adoption of cost effective, fast, portable and real-time long-read sequencing.
40

Boostrom, Ian, Edward A. R. Portal, Owen B. Spiller, Timothy R. Walsh, and Kirsty Sands. "Comparing Long-Read Assemblers to Explore the Potential of a Sustainable Low-Cost, Low-Infrastructure Approach to Sequence Antimicrobial Resistant Bacteria With Oxford Nanopore Sequencing." Frontiers in Microbiology 13 (March 3, 2022). http://dx.doi.org/10.3389/fmicb.2022.796465.

Abstract:
Long-read sequencing (LRS) can resolve repetitive regions, a limitation of short read (SR) data. Reduced cost and instrument size has led to a steady increase in LRS across diagnostics and research. Here, we re-basecalled FAST5 data sequenced between 2018 and 2021 and analyzed the data in relation to gDNA across a large dataset (n = 200) spanning a wide GC content (25–67%). We examined whether re-basecalled data would improve the hybrid assembly, and, for a smaller cohort, compared long read (LR) assemblies in the context of antimicrobial resistance (AMR) genes and mobile genetic elements. We included a cost analysis when comparing SR and LR instruments. We compared the R9 and R10 chemistries and reported not only a larger yield but increased read quality with R9 flow cells. There were often discrepancies with ARG presence/absence and/or variant detection in LR assemblies. Flye-based assemblies were generally efficient at detecting the presence of ARG on both the chromosome and plasmids. Raven performed more quickly but inconsistently recovered small plasmids, notably a ∼15-kb Col-like plasmid harboring blaKPC. Canu assemblies were the most fragmented, with genome sizes larger than expected. LR assemblies failed to consistently determine multiple copies of the same ARG as identified by the Unicycler reference. Even with improvements to ONT chemistry and basecalling, long-read assemblies can lead to misinterpretation of data. If LR data are currently being relied upon, it is necessary to perform multiple assemblies, although this is resource (computing) intensive and not yet readily available/useable.
41

Cuscó, Anna, Daniel Pérez, Joaquim Viñes, Norma Fàbregas, and Olga Francino. "Long-read metagenomics retrieves complete single-contig bacterial genomes from canine feces." BMC Genomics 22, no. 1 (May 6, 2021). http://dx.doi.org/10.1186/s12864-021-07607-0.

Abstract:
Background: Long-read sequencing in metagenomics facilitates the assembly of complete genomes out of complex microbial communities. These genomes include essential biologic information such as the ribosomal genes or the mobile genetic elements, which are usually missed with short-reads. We applied long-read metagenomics with Nanopore sequencing to retrieve high-quality metagenome-assembled genomes (HQ MAGs) from a dog fecal sample. Results: We used nanopore long-read metagenomics and frameshift aware correction on a canine fecal sample and retrieved eight single-contig HQ MAGs, which were > 90% complete with < 5% contamination, and contained most ribosomal genes and tRNAs. At the technical level, we demonstrated that a high-molecular-weight DNA extraction improved the metagenomics assembly contiguity, the recovery of the rRNA operons, and the retrieval of longer and circular contigs that are potential HQ MAGs. These HQ MAGs corresponded to Succinivibrio, Sutterella, Prevotellamassilia, Phascolarctobacterium, Catenibacterium, Blautia, and Enterococcus genera. Linking our results to previous gastrointestinal microbiome reports (metagenome or 16S rRNA-based), we found that some bacterial species on the gastrointestinal tract seem to be more canid-specific (Succinivibrio, Prevotellamassilia, Phascolarctobacterium, Blautia_A sp900541345), whereas others are more broadly distributed among animal and human microbiomes (Sutterella, Catenibacterium, Enterococcus, and Blautia sp003287895). Sutterella HQ MAG is potentially the first reported genome assembly for Sutterella stercoricanis, as assigned by 16S rRNA gene similarity. Moreover, we show that long reads are essential to detect mobilome functions, usually missed in short-read MAGs. Conclusions: We recovered eight single-contig HQ MAGs from canine feces of a healthy dog with nanopore long-reads. We also retrieved relevant biological insights from these specific bacterial species previously missed in public databases, such as complete ribosomal operons and mobilome functions. The high-molecular-weight DNA extraction improved the assembly’s contiguity, whereas the high-accuracy basecalling, the raw read error correction, the assembly polishing, and the frameshift correction reduced the insertion and deletion errors. Both experimental and analytical steps ensured the retrieval of complete bacterial genomes.
42

Galanti, Lior, Dennis Shasha, and Kristin C. Gunsalus. "Pheniqs 2.0: accurate, high-performance Bayesian decoding and confidence estimation for combinatorial barcode indexing." BMC Bioinformatics 22, no. 1 (July 2, 2021). http://dx.doi.org/10.1186/s12859-021-04267-5.

Abstract:
Background: Systems biology increasingly relies on deep sequencing with combinatorial index tags to associate biological sequences with their sample, cell, or molecule of origin. Accurate data interpretation depends on the ability to classify sequences based on correct decoding of these combinatorial barcodes. The probability of correct decoding is influenced by both sequence quality and the number and arrangement of barcodes. The rising complexity of experimental designs calls for a probability model that accounts for both sequencing errors and random noise, generalizes to multiple combinatorial tags, and can handle any barcoding scheme. The needs for reproducibility and community benchmark standards demand a peer-reviewed tool that preserves decoding quality scores and provides tunable control over classification confidence that balances precision and recall. Moreover, continuous improvements in sequencing throughput require a fast, parallelized and scalable implementation.
Results and discussion: We developed a flexible, robustly engineered software that performs probabilistic decoding and supports arbitrarily complex barcoding designs. Pheniqs computes the full posterior decoding error probability of observed barcodes by consulting basecalling quality scores and prior distributions, and reports sequences and confidence scores in Sequence Alignment/Map (SAM) fields. The product of posteriors for multiple independent barcodes provides an overall confidence score for each read. Pheniqs achieves greater accuracy than minimum edit distance or simple maximum likelihood estimation, and it scales linearly with core count to enable the classification of > 11 billion reads in 1 h 15 m using < 50 megabytes of memory. Pheniqs has been in production use for seven years in our genomics core facility.
Conclusion: We introduce a computationally efficient software that implements both probabilistic and minimum distance decoders and show that decoding barcodes using posterior probabilities is more accurate than available methods. Pheniqs allows fine-tuning of decoding sensitivity using intuitive confidence thresholds and is extensible with alternative decoders and new error models. Any arbitrary arrangement of barcodes is easily configured, enabling computation of combinatorial confidence scores for any barcoding strategy. An optimized multithreaded implementation assures that Pheniqs is faster and scales better with complex barcode sets than existing tools. Support for POSIX streams and multiple sequencing formats enables easy integration with automated analysis pipelines.
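The idea of posterior barcode decoding described in this abstract can be illustrated with a minimal Python sketch. It is not the Pheniqs implementation: the function names, the uniform substitution model (an erroneous base is equally likely to be any of the other three), the uniform random-sequence noise model, and the default `noise_prior` are all assumptions made for illustration only.

```python
import math

def phred_to_error(q):
    """Convert a Phred quality score to an error probability (capped at 0.75)."""
    return min(10 ** (-q / 10), 0.75)

def likelihood(observed, quals, barcode):
    """P(observed | barcode), assuming independent bases and uniform substitutions."""
    p = 1.0
    for base, q, expected in zip(observed, quals, barcode):
        e = phred_to_error(q)
        p *= (1 - e) if base == expected else e / 3
    return p

def decode(observed, quals, barcodes, noise_prior=0.01):
    """Return (best barcode name, posterior probability of that call).

    barcodes: dict mapping a name to its expected sequence (hypothetical input).
    noise_prior: assumed prior probability that the read is random noise.
    """
    # Assumption: the remaining prior mass is split uniformly over the barcodes.
    prior = (1 - noise_prior) / len(barcodes)
    joint = {name: likelihood(observed, quals, seq) * prior
             for name, seq in barcodes.items()}
    # Assumed noise model: every base is drawn uniformly from A, C, G, T.
    noise = noise_prior * 0.25 ** len(observed)
    total = sum(joint.values()) + noise
    best = max(joint, key=joint.get)
    return best, joint[best] / total

def combined_confidence(posteriors):
    """Product of posteriors for several independently decoded barcodes."""
    return math.prod(posteriors)

# Usage: two hypothetical barcode sets decoded on the same read.
sample_call, p_sample = decode("ACGT", [30, 30, 30, 30],
                               {"s1": "ACGT", "s2": "TGCA"})
cell_call, p_cell = decode("GGAT", [30, 30, 12, 30],
                           {"c1": "GGTT", "c2": "CCAA"})
print(sample_call, cell_call, combined_confidence([p_sample, p_cell]))
```

A read would then be kept only if the combined confidence exceeds a chosen threshold, which is the kind of tunable precision/recall control the abstract refers to.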
43

Forghani, Fereidoun, Shaoting Li, Shaokang Zhang, David A. Mann, Xiangyu Deng, Henk C. den Bakker, and Francisco Diez-Gonzalez. "Salmonella enterica and Escherichia coli in Wheat Flour: Detection and Serotyping by a Quasimetagenomic Approach Assisted by Magnetic Capture, Multiple-Displacement Amplification, and Real-Time Sequencing." Applied and Environmental Microbiology 86, no. 13 (May 1, 2020). http://dx.doi.org/10.1128/aem.00097-20.

Abstract:
Food safety is a new area for novel applications of metagenomics analysis, which not only can detect and subtype foodborne pathogens in a single workflow but may also produce additional information with in-depth analysis capabilities. In this study, we applied a quasimetagenomic approach by combining short-term enrichment, immunomagnetic separation (IMS), multiple-displacement amplification (MDA), and nanopore sequencing real-time analysis for simultaneous detection of Salmonella and Escherichia coli in wheat flour. Tryptic soy broth was selected for the 12-h enrichment of samples at 42°C. Enrichments were subjected to IMS using beads capable of capturing both Salmonella and E. coli. MDA was performed on harvested beads, and amplified DNA fragments were subjected to DNA library preparation for sequencing. Sequencing was performed on a portable device with real-time basecalling adaptability, and resulting sequences were subjected to two parallel pipelines for further analysis. After 1 h of sequencing, the quasimetagenomic approach could detect all targets inoculated at approximately 1 CFU/g flour to the species level. Discriminatory power was determined by simultaneous detection of dual inoculums of Salmonella and E. coli, absence of detection in control samples, and consistency in microbial flora composition of the same flour samples over several rounds of experiments. The total turnaround time for detection was approximately 20 h. Longer sequencing for up to 15 h enabled serotyping for many of the samples with more than 99% genome coverage, which could be subjected to other appropriate genetic analysis pipelines in less than a total of 36 h.
Importance: Enterohemorrhagic Escherichia coli (EHEC) and Salmonella are of serious concern in low-moisture foods, including wheat flour and its related products, causing illnesses, outbreaks, and recalls. The development of advanced detection methods based on molecular principles of analysis is essential to incorporate into interventions intended to reduce the risk from these pathogens. In this work, a quasimetagenomic method based on real-time sequencing analysis and assisted by magnetic capture and DNA amplification was developed. This protocol is capable of detecting multiple Salmonella and/or E. coli organisms in the sample within less than a day, and it can also generate sufficient whole-genome sequences of the target organisms suitable for subsequent bioinformatics analysis. Multiplex detection and identification were accomplished in less than 20 h and additional whole-genome analyses of different nature were attained within 36 h, in contrast to the several days required in previous sequencing pipelines.