Journal articles on the topic "Short read and long read sequencing"

Consult the top 50 journal articles for your research on the topic "Short read and long read sequencing".

1

Shumate, Alaina, Brandon Wong, Geo Pertea, and Mihaela Pertea. "Improved transcriptome assembly using a hybrid of long and short reads with StringTie." PLOS Computational Biology 18, no. 6 (June 1, 2022): e1009730. http://dx.doi.org/10.1371/journal.pcbi.1009730.

Abstract
Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.
2

Stapleton, James A., Jeongwoon Kim, John P. Hamilton, Ming Wu, Luiz C. Irber, Rohan Maddamsetti, Bryan Briney et al. "Haplotype-Phased Synthetic Long Reads from Short-Read Sequencing." PLOS ONE 11, no. 1 (January 20, 2016): e0147229. http://dx.doi.org/10.1371/journal.pone.0147229.
3

Nguyen, Son Hoang, Minh Duc Cao, and Lachlan J. M. Coin. "Real-time resolution of short-read assembly graph using ONT long reads." PLOS Computational Biology 17, no. 1 (January 20, 2021): e1008586. http://dx.doi.org/10.1371/journal.pcbi.1008586.

Abstract
A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool using the assembly graph instead of the separated pre-assembly contigs. It is able to produce more complete genome assembly by resolving the path finding problem on the assembly graph using long reads as the traversing guide. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while still maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) which provides a real-time visualisation of the progress of assembly. The tool and source code are available at https://github.com/hsnguyen/assembly.
4

Greenman, Noah, Sayf Al-Deen Hassouneh, Latifa S. Abdelli, Catherine Johnston, and Taj Azarian. "Improving Bacterial Metagenomic Research through Long-Read Sequencing." Microorganisms 12, no. 5 (May 4, 2024): 935. http://dx.doi.org/10.3390/microorganisms12050935.

Abstract
Metagenomic sequencing analysis is central to investigating microbial communities in clinical and environmental studies. Short-read sequencing remains the primary approach for metagenomic research; however, long-read sequencing may offer advantages of improved metagenomic assembly and resolved taxonomic identification. To compare the relative performance for metagenomic studies, we simulated short- and long-read datasets using increasingly complex metagenomes comprising 10, 20, and 50 microbial taxa. Additionally, we used an empirical dataset of paired short- and long-read data generated from mouse fecal pellets to assess real-world performance. We compared metagenomic assembly quality, taxonomic classification, and metagenome-assembled genome (MAG) recovery rates. We show that long-read sequencing data significantly improve taxonomic classification and assembly quality. Metagenomic assemblies using simulated long reads were more complete and more contiguous with higher rates of MAG recovery. This resulted in more precise taxonomic classifications. Principal component analysis of empirical data demonstrated that sequencing technology affects compositional results as samples clustered by sequence type, not sample type. Overall, we highlight strengths of long-read metagenomic sequencing for microbiome studies, including improving the accuracy of classification and relative abundance estimates. These results will aid researchers when considering which sequencing approaches to use for metagenomic projects.
5

Craddock, Hillary A., Yair Motro, Bar Zilberman, Boris Khalfin, Svetlana Bardenstein, and Jacob Moran-Gilad. "Long-Read Sequencing and Hybrid Assembly for Genomic Analysis of Clinical Brucella melitensis Isolates." Microorganisms 10, no. 3 (March 14, 2022): 619. http://dx.doi.org/10.3390/microorganisms10030619.

Abstract
Brucella melitensis is a key etiological agent of brucellosis and has been increasingly subject to characterization using sequencing methodologies. This study aimed to investigate and compare short-read, long-read, and hybrid assemblies of B. melitensis. Eighteen B. melitensis isolates from Southern Israel were sequenced using Illumina and the Oxford Nanopore (ONP) MinION, and hybrid assemblies were generated with ONP long reads scaffolded on Illumina short reads. Short reads were assembled with INNUca with SPAdes, long reads and hybrid with dragonflye. Abricate with the virulence factor database (VFDB) and in silico PCR (for the genes BetB, BPE275, BSPB, manA, mviN, omp19, perA, PrpA, VceC, and ureI) were used for identifying virulence genes, and a total of 61 virulence genes were identified in short-read, long-read, and hybrid assemblies of all 18 isolates. The phylogenetic analysis using long-read assemblies revealed several inconsistencies in cluster assignment as compared to using hybrid and short-read assemblies. Overall, hybrid assembly provided the most comprehensive data, and stand-alone short-read sequencing provided comparable data to stand-alone long-read sequencing regarding virulence genes. For genomic epidemiology studies, stand-alone ONP sequencing may require further refinement in order to be useful in endemic settings.
6

Botton, Mariana R., Yao Yang, Erick R. Scott, Robert J. Desnick, and Stuart A. Scott. "Phased Haplotype Resolution of the SLC6A4 Promoter Using Long-Read Single Molecule Real-Time (SMRT) Sequencing." Genes 11, no. 11 (November 12, 2020): 1333. http://dx.doi.org/10.3390/genes11111333.

Abstract
The SLC6A4 gene has been implicated in psychiatric disorder susceptibility and antidepressant response variability. The SLC6A4 promoter is defined by a variable number of homologous 20–24 bp repeats (5-HTTLPR), and long (L) and short (S) alleles are associated with higher and lower expression, respectively. However, this insertion/deletion variant is most informative when considered as a haplotype with the rs25531 and rs25532 variants. Therefore, we developed a long-read single molecule real-time (SMRT) sequencing method to interrogate the SLC6A4 promoter region. A total of 120 samples were subjected to SLC6A4 long-read SMRT sequencing, primarily selected based on available short-read sequencing data. Short-read genome sequencing from the 1000 Genomes (1KG) Project (~5X) and the Genetic Testing Reference Material Coordination Program (~45X), as well as high-depth short-read capture-based sequencing (~330X), could not identify the 5-HTTLPR short (S) allele, nor could short-read sequencing phase any identified variants. In contrast, long-read SMRT sequencing unambiguously identified the 5-HTTLPR short (S) allele (frequency of 0.467) and phased SLC6A4 promoter haplotypes. Additionally, discordant rs25531 genotypes were reviewed and determined to be short-read errors. Taken together, long-read SMRT sequencing is an innovative and robust method for phased resolution of the SLC6A4 promoter, which could enable more accurate pharmacogenetic testing for both research and clinical applications.
7

Volden, Roger, Theron Palmer, Ashley Byrne, Charles Cole, Robert J. Schmitz, Richard E. Green, and Christopher Vollmers. "Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly multiplexed full-length single-cell cDNA." Proceedings of the National Academy of Sciences 115, no. 39 (September 10, 2018): 9726–31. http://dx.doi.org/10.1073/pnas.1806447115.

Abstract
High-throughput short-read sequencing has revolutionized how transcriptomes are quantified and annotated. However, while Illumina short-read sequencers can be used to analyze entire transcriptomes down to the level of individual splicing events with great accuracy, they fall short of analyzing how these individual events are combined into complete RNA transcript isoforms. Because of this shortfall, long-distance information is required to complement short-read sequencing to analyze transcriptomes on the level of full-length RNA transcript isoforms. While long-read sequencing technology can provide this long-distance information, there are issues with both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) long-read sequencing technologies that prevent their widespread adoption. Briefly, PacBio sequencers produce low numbers of reads with high accuracy, while ONT sequencers produce higher numbers of reads with lower accuracy. Here, we introduce and validate a long-read ONT-based sequencing method. At the same cost, our Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates more accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method. These reads can then be used to generate isoform-level transcriptomes for both genome annotation and differential expression analysis in bulk or single-cell samples.
8

Iyer, Shruti V., Sara Goodwin, and William Richard McCombie. "Leveraging the power of long reads for targeted sequencing." Genome Research 34, no. 11 (November 2024): 1701–18. http://dx.doi.org/10.1101/gr.279168.124.

Abstract
Long-read sequencing technologies have improved the contiguity and, as a result, the quality of genome assemblies by generating reads long enough to span and resolve complex or repetitive regions of the genome. Several groups have shown the power of long reads in detecting thousands of genomic and epigenomic features that were previously missed by short-read sequencing approaches. While these studies demonstrate how long reads can help resolve repetitive and complex regions of the genome, they also highlight the throughput and coverage requirements needed to accurately resolve variant alleles across large populations using these platforms. At the time of this review, whole-genome long-read sequencing is more expensive than short-read sequencing on the highest throughput short-read instruments; thus, achieving sufficient coverage to detect low-frequency variants (such as somatic variation) in heterogeneous samples remains challenging. Targeted sequencing, on the other hand, provides the depth necessary to detect these low-frequency variants in heterogeneous populations. Here, we review currently used and recently developed targeted sequencing strategies that leverage existing long-read technologies to increase the resolution with which we can look at nucleic acids in a variety of biological contexts.
9

Wick, Ryan R., Louise M. Judd, and Kathryn E. Holt. "Assembling the perfect bacterial genome using Oxford Nanopore and Illumina sequencing." PLOS Computational Biology 19, no. 3 (March 2, 2023): e1010905. http://dx.doi.org/10.1371/journal.pcbi.1010905.

Abstract
A perfect bacterial genome assembly is one where the assembled sequence is an exact match for the organism’s genome—each replicon sequence is complete and contains no errors. While this has been difficult to achieve in the past, improvements in long-read sequencing, assemblers, and polishers have brought perfect assemblies within reach. Here, we describe our recommended approach for assembling a bacterial genome to perfection using a combination of Oxford Nanopore Technologies long reads and Illumina short reads: Trycycler long-read assembly, Medaka long-read polishing, Polypolish short-read polishing, followed by other short-read polishing tools and manual curation. We also discuss potential pitfalls one might encounter when assembling challenging genomes, and we provide an online tutorial with sample data (github.com/rrwick/perfect-bacterial-genome-tutorial).
10

Eisenstein, Michael. "Startups use short-read data to expand long-read sequencing market." Nature Biotechnology 33, no. 5 (May 2015): 433–35. http://dx.doi.org/10.1038/nbt0515-433.
11

Page, Andrew J., and Jacqueline A. Keane. "Rapid multi-locus sequence typing direct from uncorrected long reads using Krocus." PeerJ 6 (July 31, 2018): e5233. http://dx.doi.org/10.7717/peerj.5233.

Abstract
Genome sequencing is rapidly being adopted in reference labs and hospitals for bacterial outbreak investigation and diagnostics where time is critical. Seven gene multi-locus sequence typing is a standard tool for broadly classifying samples into sequence types (STs), allowing, in many cases, to rule a sample out of an outbreak, or allowing for general characteristics about a bacterial strain to be inferred. Long-read sequencing technologies, such as from Oxford Nanopore, can produce read data within minutes of an experiment starting, unlike short-read sequencing technologies which require many hours/days. However, the error rates of raw uncorrected long read data are very high. We present Krocus which can predict a ST directly from uncorrected long reads, and which was designed to consume read data as it is produced, providing results in minutes. It is the only tool which can do this from uncorrected long reads. We tested Krocus on over 700 isolates sequenced using long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore. It provides STs for isolates on average within 90 s, with a sensitivity of 94% and specificity of 97% on real sample data, directly from uncorrected raw sequence reads. The software is written in Python and is available under the open source license GNU GPL version 3.
12

Prodanov, Timofey, and Vikas Bansal. "Sensitive alignment using paralogous sequence variants improves long-read mapping and variant calling in segmental duplications." Nucleic Acids Research 48, no. 19 (October 9, 2020): e114-e114. http://dx.doi.org/10.1093/nar/gkaa829.

Abstract
The ability to characterize repetitive regions of the human genome is limited by the read lengths of short-read sequencing technologies. Although long-read sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies can potentially overcome this limitation, long segmental duplications with high sequence identity pose challenges for long-read mapping. We describe a probabilistic method, DuploMap, designed to improve the accuracy of long-read mapping in segmental duplications. It analyzes reads mapped to segmental duplications using existing long-read aligners and leverages paralogous sequence variants (PSVs), sequence differences between paralogous sequences, to distinguish between multiple alignment locations. On simulated datasets, DuploMap increased the percentage of correctly mapped reads with high confidence for multiple long-read aligners including Minimap2 (74.3–90.6%) and BLASR (82.9–90.7%) while maintaining high precision. Across multiple whole-genome long-read datasets, DuploMap aligned an additional 8–21% of the reads in segmental duplications with high confidence relative to Minimap2. Using DuploMap-aligned PacBio circular consensus sequencing reads, an additional 8.9 Mb of DNA sequence was mappable, variant calling achieved a higher F1 score and 14 713 additional variants supported by linked-read data were identified. Finally, we demonstrate that a significant fraction of PSVs in segmental duplications overlaps with variants and adversely impacts short-read variant calling.
13

Kumar, Venkatesh, Thomas Vollbrecht, Mark Chernyshev, Sanjay Mohan, Brian Hanst, Nicholas Bavafa, Antonia Lorenzo et al. "Long-read amplicon denoising." Nucleic Acids Research 47, no. 18 (August 16, 2019): e104-e104. http://dx.doi.org/10.1093/nar/gkz657.

Abstract
Long-read next-generation amplicon sequencing shows promise for studying complete genes or genomes from complex and diverse populations. Current long-read sequencing technologies have challenging error profiles, hindering data processing and incorporation into downstream analyses. Here we consider the problem of how to reconstruct, free of sequencing error, the true sequence variants and their associated frequencies from PacBio reads. Called ‘amplicon denoising’, this problem has been extensively studied for short-read sequencing technologies, but current solutions do not always successfully generalize to long reads with high indel error rates. We introduce two methods: one that runs nearly instantly and is very accurate for medium length reads and high template coverage, and another, slower method that is more robust when reads are very long or coverage is lower. On two Mock Virus Community datasets with ground truth, each sequenced on a different PacBio instrument, and on a number of simulated datasets, we compare our two approaches to each other and to existing algorithms. We outperform all tested methods in accuracy, with competitive run times even for our slower method, successfully discriminating templates that differ by just a single nucleotide. Julia implementations of Fast Amplicon Denoising (FAD) and Robust Amplicon Denoising (RAD), and a webserver interface, are freely available.
14

Gao, Yahui, Li Ma, and George E. Liu. "Initial Analysis of Structural Variation Detections in Cattle Using Long-Read Sequencing Methods." Genes 13, no. 5 (May 6, 2022): 828. http://dx.doi.org/10.3390/genes13050828.

Abstract
Structural variations (SVs), as a great source of genetic variation, are widely distributed in the genome. SVs involve longer genomic sequences and potentially have stronger effects than SNPs, but they are not well captured by short-read sequencing owing to their size and relevance to repeats. Improved characterization of SVs can provide more advanced insight into complex traits. With the availability of long-read sequencing, it has become feasible to uncover the full range of SVs. Here, we sequenced one cattle individual using 10× Genomics (10×G) linked reads, Pacific Biosciences (PacBio) continuous long reads (CLR) and circular consensus sequencing (CCS), as well as Oxford Nanopore Technologies (ONT) PromethION. We evaluated the ability of various methods for SV detection. We identified 21,164 SVs, which amount to 186 Mb covering 7.07% of the whole genome. The number of SVs inferred from long-read-based inferences was greater than that from short reads. The PacBio CLR identified most of the large SVs and covered the most genomes. SVs called with PacBio CCS and ONT data showed high uniformity. The one with the most overlap with the results obtained by short-read data was PacBio CCS. Together, we found that long reads outperformed short reads in terms of SV detections.
15

Zhang, Pengfei, Dike Jiang, Yin Wang, Xueping Yao, Yan Luo, and Zexiao Yang. "Comparison of De Novo Assembly Strategies for Bacterial Genomes." International Journal of Molecular Sciences 22, no. 14 (July 17, 2021): 7668. http://dx.doi.org/10.3390/ijms22147668.

Abstract
(1) Background: Short-read sequencing allows for the rapid and accurate analysis of the whole bacterial genome but does not usually enable complete genome assembly. Long-read sequencing greatly assists with the resolution of complex bacterial genomes, particularly when combined with short-read Illumina data. However, it is not clear how different assembly strategies affect genomic accuracy, completeness, and protein prediction. (2) Methods: we compare different assembly strategies for Haemophilus parasuis, which causes Glässer’s disease, characterized by fibrinous polyserositis and arthritis, in swine by using Illumina sequencing and long reads from the sequencing platforms of either Oxford Nanopore Technologies (ONT) or SMRT Pacific Biosciences (PacBio). (3) Results: Assembly with either PacBio or ONT reads, followed by polishing with Illumina reads, facilitated high-quality genome reconstruction and was superior to the long-read-only assembly and hybrid-assembly strategies when evaluated in terms of accuracy and completeness. An equally excellent method was correction with Homopolish after the ONT-only assembly, which had the advantage of avoiding hybrid sequencing with Illumina. Furthermore, by aligning transcripts to assembled genomes and their predicted CDSs, the sequencing errors of the ONT assembly were mainly indels that were generated when homopolymer regions were sequenced, thus critically affecting protein prediction. Polishing can fill indels and correct mistakes. (4) Conclusions: The assembly of bacterial genomes can be directly achieved by using long-read sequencing techniques. To maximize assembly accuracy, it is essential to polish the assembly with homologous sequences of related genomes or sequencing data from short-read technology.
16

Yu, Xiaoling, Wenqian Jiang, Xinhui Huang, Jun Lin, Hanhui Ye, and Baorong Liu. "rRNA Analysis Based on Long-Read High-Throughput Sequencing Reveals a More Accurate Diagnostic for the Bacterial Infection of Ascites." BioMed Research International 2021 (November 17, 2021): 1–8. http://dx.doi.org/10.1155/2021/6287280.

Abstract
Traditional pathogenic diagnosis presents defects such as a low positivity rate, inability to identify uncultured microorganisms, and time-consuming nature. Clinical metagenomics next-generation sequencing can be used to detect any pathogen, compensating for the shortcomings of traditional pathogenic diagnosis. We report third-generation long-read sequencing results and second-generation short-read sequencing results for ascitic fluid from a patient with liver ascites and compare the two types of sequencing results with the results of traditional clinical microbial culture. The distribution of pathogenic microbial species revealed by the two types of sequencing results was quite different, and the third-generation sequencing results were consistent with the results of traditional microbial culture, which can effectively guide subsequent treatment. Short reads, the lack of amplification and enrichment to amplify signals from trace pathogens, and host background noise may be the reasons for the high error in the second-generation short-read sequencing results. Therefore, we propose that long-read-based rRNA analysis technology is superior to the short-read shotgun-based metagenomics method in the identification of pathogenic bacteria.
17

Cechova, Monika. "Probably Correct: Rescuing Repeats with Short and Long Reads." Genes 12, no. 1 (December 31, 2020): 48. http://dx.doi.org/10.3390/genes12010048.

Abstract
Ever since the introduction of high-throughput sequencing following the human genome project, assembling short reads into a reference of sufficient quality posed a significant problem as a large portion of the human genome—estimated 50–69%—is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from “telomere to telomere”. Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regards to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
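
The dependence of multi-mapping on read length and genome complexity can be illustrated with a minimal Python sketch. It is not taken from the review: the function name `unique_read_fraction` and the toy genome are hypothetical, reads are assumed to be error-free and forward-strand only, and a read is counted as uniquely placeable only if its exact sequence occurs once in the genome.

```python
from collections import Counter


def unique_read_fraction(genome: str, read_length: int) -> float:
    """Fraction of start positions whose error-free read of the given
    length occurs exactly once in the genome (i.e. is not multi-mapping)."""
    if read_length <= 0 or read_length > len(genome):
        raise ValueError("read_length must be between 1 and len(genome)")
    # Enumerate every possible read (one per start position).
    reads = [genome[i:i + read_length]
             for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    # A read is uniquely placeable when its sequence is seen only once.
    unique = sum(1 for read in reads if counts[read] == 1)
    return unique / len(reads)


# Hypothetical toy genome: two copies of a 12-bp repeat between unique flanks.
repeat = "ACGTACGGTCAT"
toy_genome = "TTGACCA" + repeat + "GGCTC" + repeat + "CAGTTGA"

if __name__ == "__main__":
    for length in (4, 8, 12, 16, 24):
        fraction = unique_read_fraction(toy_genome, length)
        print(f"read length {length:2d}: {fraction:.2f} uniquely placeable")
```

In this toy setting, reads shorter than the repeat that fall entirely inside it cannot be placed uniquely, whereas reads long enough to reach into the flanking sequence can. Real aligners also have to cope with sequencing errors and both strands, which this sketch ignores.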
18

Kainth, Amoldeep S., Gabriela A. Haddad, Johnathon M. Hall, and Alexander J. Ruthenburg. "Merging short and stranded long reads improves transcript assembly." PLOS Computational Biology 19, no. 10 (October 26, 2023): e1011576. http://dx.doi.org/10.1371/journal.pcbi.1011576.

Abstract
Long-read RNA sequencing has arisen as a counterpart to short-read sequencing, with the potential to capture full-length isoforms, albeit at the cost of lower depth. Yet this potential is not fully realized due to inherent limitations of current long-read assembly methods and underdeveloped approaches to integrate short-read data. Here, we critically compare the existing methods and develop a new integrative approach to characterize a particularly challenging pool of low-abundance long noncoding RNA (lncRNA) transcripts from short- and long-read sequencing in two distinct cell lines. Our analysis reveals severe limitations in each of the sequencing platforms. For short-read assemblies, coverage declines at transcript termini resulting in ambiguous ends, and uneven low coverage results in segmentation of a single transcript into multiple transcripts. Conversely, long-read sequencing libraries lack depth and strand-of-origin information in cDNA-based methods, culminating in erroneous assembly and quantitation of transcripts. We also discover a cDNA synthesis artifact in long-read datasets that markedly impacts the identity and quantitation of assembled transcripts. Towards remediating these problems, we develop a computational pipeline to “strand” long-read cDNA libraries that rectifies inaccurate mapping and assembly of long-read transcripts. Leveraging the strengths of each platform and our computational stranding, we also present and benchmark a hybrid assembly approach that drastically increases the sensitivity and accuracy of full-length transcript assembly on the correct strand and improves detection of biological features of the transcriptome. When applied to a challenging set of under-annotated and cell-type variable lncRNA, our method resolves the segmentation problem of short-read sequencing and the depth problem of long-read sequencing, resulting in the assembly of coherent transcripts with precise 5’ and 3’ ends. 
Our workflow can be applied to existing datasets for superior demarcation of transcript ends and refined isoform structure, which can enable better differential gene expression analyses and molecular manipulations of transcripts.
19

Zablocki, Olivier, Michelle Michelsen, Marie Burris, Natalie Solonenko, Joanna Warwick-Dugdale, Romik Ghosh, Jennifer Pett-Ridge, Matthew B. Sullivan, and Ben Temperton. "VirION2: a short- and long-read sequencing and informatics workflow to study the genomic diversity of viruses in nature." PeerJ 9 (March 30, 2021): e11088. http://dx.doi.org/10.7717/peerj.11088.

Abstract
Microbes play fundamental roles in shaping natural ecosystem properties and functions, but do so under constraints imposed by their viral predators. However, studying viruses in nature can be challenging due to low biomass and the lack of universal gene markers. Though metagenomic short-read sequencing has greatly improved our virus ecology toolkit—and revealed many critical ecosystem roles for viruses—microdiverse populations and fine-scale genomic traits are missed. Some of these microdiverse populations are abundant and the missed regions may be of interest for identifying selection pressures that underpin evolutionary constraints associated with hosts and environments. Though long-read sequencing promises complete virus genomes on single reads, it currently suffers from high DNA requirements and sequencing errors that limit accurate gene prediction. Here we introduce VirION2, an integrated short- and long-read metagenomic wet-lab and informatics pipeline that updates our previous method (VirION) to further enhance the utility of long-read viral metagenomics. Using a viral mock community, we first optimized laboratory protocols (polymerase choice, DNA shearing size, PCR cycling) to enable 76% longer reads (now median length of 6,965 bp) from 100-fold less input DNA (now 1 nanogram). Using a virome from a natural seawater sample, we compared viromes generated with VirION2 against other library preparation options (unamplified, original VirION, and short-read), and optimized downstream informatics for improved long-read error correction and assembly. VirION2 assemblies combined with short-read based data (‘enhanced’ viromes), provided significant improvements over VirION libraries in the recovery of longer and more complete viral genomes, and our optimized error-correction strategy using long- and short-read data achieved 99.97% accuracy. 
In the seawater virome, VirION2 assemblies captured 5,161 viral populations (including all of the virus populations observed in the other assemblies), 30% of which were uniquely assembled through inclusion of long-reads, and 22% of the top 10% most abundant virus populations derived from assembly of long-reads. Viral populations unique to VirION2 assemblies had significantly higher microdiversity means, which may explain why short-read virome approaches failed to capture them. These findings suggest the VirION2 sample prep and workflow can help researchers better investigate the virosphere, even from challenging low-biomass samples. Our new protocols are available to the research community on protocols.io as a ‘living document’ to facilitate dissemination of updates to keep pace with the rapid evolution of long-read sequencing technology.
20

Lecompte, Lolita, Pierre Peterlongo, Dominique Lavenier, and Claire Lemaitre. "SVJedi: genotyping structural variations with long reads". Bioinformatics 36, no. 17 (May 21, 2020): 4568–75. http://dx.doi.org/10.1093/bioinformatics/btaa527.

Abstract
Motivation: Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third generation sequencing technologies, the number of discovered SVs is increasing, especially in the human genome. At the same time, for several applications such as clinical diagnoses, it is important to genotype newly sequenced individuals on well-defined and characterized SVs. Whereas several SV genotypers have been developed for short read data, there is a lack of such dedicated tool to assess whether known SVs are present or not in a new long read sequenced sample, such as the one produced by Pacific Biosciences or Oxford Nanopore Technologies. Results: We present a novel method to genotype known SVs from long read sequencing data. The method is based on the generation of a set of representative allele sequences that represent the two alleles of each structural variant. Long reads are aligned to these allele sequences. Alignments are then analyzed and filtered out to keep only informative ones, to quantify and estimate the presence of each SV allele and the allele frequencies. We provide an implementation of the method, SVJedi, to genotype SVs with long reads. The tool has been applied to both simulated and real human datasets and achieves high genotyping accuracy. We show that SVJedi obtains better performances than other existing long read genotyping tools and we also demonstrate that SV genotyping is considerably improved with SVJedi compared to other approaches, namely SV discovery and short read SV genotyping approaches. Availability and implementation: https://github.com/llecompte/SVJedi.git. Supplementary information: Supplementary data are available at Bioinformatics online.
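The abstract above describes counting informative alignments per SV allele and estimating a genotype from them. The following is a minimal sketch of that idea, not SVJedi's actual model: the function name, the simple binomial likelihood, and the 5% error rate are all assumptions made for illustration.

```python
import math

def genotype_from_allele_counts(ref_reads, alt_reads, error=0.05):
    """Pick the most likely diploid genotype from per-allele read counts.

    ref_reads / alt_reads: numbers of informative alignments supporting the
    reference allele and the alternative (SV) allele.
    error: assumed probability that a read supports the wrong allele
    (hypothetical value, not taken from the paper).
    Returns (genotype, alt_allele_frequency), or (None, None) without reads.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return None, None

    # Probability that a single read supports the alt allele, per genotype.
    p_alt = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}

    # Binomial log-likelihood (the constant binomial coefficient is omitted
    # because it is identical for all three genotypes).
    log_lik = {
        gt: alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in p_alt.items()
    }
    best = max(log_lik, key=log_lik.get)
    return best, alt_reads / total

if __name__ == "__main__":
    print(genotype_from_allele_counts(10, 10))
```

A real genotyper would also need the alignment filtering step that decides which reads are informative; this sketch starts from counts that are assumed to be already filtered.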
21

Wommack, K. Eric, Jaysheel Bhavsar, and Jacques Ravel. "Metagenomics: Read Length Matters". Applied and Environmental Microbiology 74, no. 5 (January 11, 2008): 1453–63. http://dx.doi.org/10.1128/aem.02181-07.

Abstract
Obtaining an unbiased view of the phylogenetic composition and functional diversity within a microbial community is one central objective of metagenomic analysis. New technologies, such as 454 pyrosequencing, have dramatically reduced sequencing costs, to a level where metagenomic analysis may become a viable alternative to more-focused assessments of the phylogenetic (e.g., 16S rRNA genes) and functional diversity of microbial communities. To determine whether the short (∼100 to 200 bp) sequence reads obtained from pyrosequencing are appropriate for the phylogenetic and functional characterization of microbial communities, the results of BLAST and COG analyses were compared for long (∼750 bp) and randomly derived short reads from each of two microbial and one virioplankton metagenome libraries. Overall, BLASTX searches against the GenBank nr database found far fewer homologs within the short-sequence libraries. This was especially pronounced for a Chesapeake Bay virioplankton metagenome library. Increasing the short-read sampling depth or the length of derived short reads (up to 400 bp) did not completely resolve the discrepancy in BLASTX homolog detection. Only in cases where the long-read sequence had a close homolog (low BLAST E-score) did the derived short-read sequence also find a significant homolog. Thus, more-distant homologs of microbial and viral genes are not detected by short-read sequences. Among COG hits, derived short reads sampled at a depth of two short reads per long read missed up to 72% of the COG hits found using long reads. Noting the current limitation in computational approaches for the analysis of short sequences, the use of short-read-length libraries does not appear to be an appropriate tool for the metagenomic characterization of microbial communities.
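The comparison in this abstract rests on randomly deriving short reads from longer ones at a fixed sampling depth (for example, two short reads per long read). A rough sketch of that sampling step is shown below; the function name, default lengths, and uniform choice of start position are assumptions for illustration and are not taken from the authors' methods.

```python
import random

def derive_short_reads(long_read, short_len=100, reads_per_long=2, rng=None):
    """Cut random fixed-length substrings out of one long read.

    short_len and reads_per_long are illustrative defaults; start positions
    are drawn uniformly, which is an assumption of this sketch.
    """
    rng = rng or random.Random()
    if len(long_read) < short_len:
        # Too short to yield a full-length derived read.
        return []
    derived = []
    for _ in range(reads_per_long):
        start = rng.randrange(len(long_read) - short_len + 1)
        derived.append(long_read[start:start + short_len])
    return derived

if __name__ == "__main__":
    long_read = "ACGT" * 200  # a made-up 800 bp read
    for read in derive_short_reads(long_read, rng=random.Random(0)):
        print(len(read), read[:20])
```

Passing a seeded `random.Random` instance keeps the sampling reproducible, which matters when the same derived reads are compared across several searches.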
22

Sierra, Roberto, Mélanie Roch, Milo Moraz, Julien Prados, Nicolas Vuilleumier, Stéphane Emonet, and Diego O. Andrey. "Contributions of Long-Read Sequencing for the Detection of Antimicrobial Resistance". Pathogens 13, no. 9 (August 28, 2024): 730. http://dx.doi.org/10.3390/pathogens13090730.

Abstract
Background. In the context of increasing antimicrobial resistance (AMR), whole-genome sequencing (WGS) of bacteria is considered a highly accurate and comprehensive surveillance method for detecting and tracking the spread of resistant pathogens. Two primary sequencing technologies exist: short-read sequencing (50–300 base pairs) and long-read sequencing (thousands of base pairs). The former, based on Illumina sequencing platforms (ISPs), provides extensive coverage and high accuracy for detecting single nucleotide polymorphisms (SNPs) and small insertions/deletions, but is limited by its read length. The latter, based on platforms such as Oxford Nanopore Technologies (ONT), enables the assembly of genomes, particularly those with repetitive regions and structural variants, although its accuracy has historically been lower. Results. We performed a head-to-head comparison of these techniques to sequence the K. pneumoniae VS17 isolate, focusing on blaNDM resistance gene alleles in the context of a surveillance program. Discrepancies between the ISP (blaNDM-4 allele identified) and ONT (blaNDM-1 and blaNDM-5 alleles identified) were observed. Conjugation assays and Sanger sequencing, used as the gold standard, confirmed the validity of ONT results. This study demonstrates the importance of long-read or hybrid assemblies for accurate carbapenemase resistance gene identification and highlights the limitations of short reads in the context of gene duplications or multiple alleles. Conclusions. In this proof-of-concept study, we conclude that recent long-read sequencing technology may outperform standard short-read sequencing for the accurate identification of carbapenemase alleles. Such information is crucial given the rising prevalence of strains producing multiple carbapenemases, especially as WGS is increasingly used for epidemiological surveillance and infection control.
23

Wei, Po-Li, Ching-Sheng Hung, Yi-Wei Kao, Ying-Chin Lin, Cheng-Yang Lee, Tzu-Hao Chang, Ben-Chang Shia, and Jung-Chun Lin. "Characterization of Fecal Microbiota with Clinical Specimen Using Long-Read and Short-Read Sequencing Platform". International Journal of Molecular Sciences 21, no. 19 (September 26, 2020): 7110. http://dx.doi.org/10.3390/ijms21197110.

Abstract
Accurate and rapid identification of microbiotic communities using 16S ribosomal (r)RNA sequencing is a critical task for expanding medical and clinical applications. Next-generation sequencing (NGS) is widely considered a practical approach for direct application to communities without the need for in vitro culturing. In this report, a comparative evaluation of short-read (Illumina) and long-read (Oxford Nanopore Technologies (ONT)) platforms toward 16S rRNA sequencing with the same batch of total genomic DNA extracted from fecal samples is presented. Different 16S gene regions were amplified, bar-coded, and sequenced using the Illumina MiSeq and ONT MinION sequencers and corresponding kits. Mapping of the sequenced amplicon using MinION to the entire 16S rRNA gene was analyzed with the cloud-based EPI2ME algorithm. V3–V4 reads generated using MiSeq were aligned by applying the CLC genomics workbench. More than 90% of sequenced reads generated using distinct sequencers were accurately classified at the genus or species level. The misclassification of sequenced reads at the species level between the two approaches was less substantial as expected. Taken together, the comparative results demonstrate that MinION sequencing platform coupled with the corresponding algorithm could function as a practicable strategy in classifying bacterial community to the species level.
24

Holmqvist, Isak, Alan Bäckerholm, Yarong Tian, Guojiang Xie, Kaisa Thorell, and Ka-Wei Tang. "FLAME: long-read bioinformatics tool for comprehensive spliceome characterization". RNA 27, no. 10 (July 12, 2021): 1127–39. http://dx.doi.org/10.1261/rna.078800.121.

Abstract
Comprehensive characterization of differentially spliced RNA transcripts with nanopore sequencing is limited by bioinformatics tools that are reliant on existing annotations. We have developed FLAME, a bioinformatics pipeline for alternative splicing analysis of gene-specific or transcriptome-wide long-read sequencing data. FLAME is a Python-based tool aimed at providing comprehensible quantification of full-length splice variants, reliable de novo recognition of splice sites and exons, and representation of consecutive exon connectivity in the form of a weighted adjacency matrix. Notably, this workflow circumvents issues related to inadequate reference annotations and allows for incorporation of short-read sequencing data to improve the confidence of nanopore sequencing reads. In this study, the Epstein-Barr virus long noncoding RNA RPMS1 was used to demonstrate the utility of the pipeline. RPMS1 is ubiquitously expressed in Epstein-Barr virus associated cancer and known to undergo ample differential splicing. To fully resolve the RPMS1 spliceome, we combined gene-specific nanopore sequencing reads from a primary gastric adenocarcinoma and a nasopharyngeal carcinoma cell line with matched publicly available short-read sequencing data sets. All previously reported splice variants, including putative ORFs, were detected using FLAME. In addition, 32 novel exons, including two intron retentions and a cassette exon, were discovered within the RPMS1 gene.
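The abstract mentions representing consecutive exon connectivity as a weighted adjacency matrix. As a minimal sketch of that data structure (not FLAME's own code), the example below counts how often one exon is directly followed by another across reads; the function name, the exon labels, and the input format (each read as an ordered list of exon identifiers) are assumptions for illustration.

```python
def exon_adjacency_matrix(reads):
    """Count direct exon-to-exon transitions across a set of reads.

    reads: iterable of reads, each an ordered list of exon identifiers
    (hypothetical input format). Returns (exons, matrix), where
    matrix[i][j] is the number of reads in which exons[i] is immediately
    followed by exons[j].
    """
    reads = [list(r) for r in reads]
    exons = sorted({exon for read in reads for exon in read})
    index = {exon: i for i, exon in enumerate(exons)}
    matrix = [[0] * len(exons) for _ in exons]
    for read in reads:
        # Pair each exon with its successor in the same read.
        for upstream, downstream in zip(read, read[1:]):
            matrix[index[upstream]][index[downstream]] += 1
    return exons, matrix

if __name__ == "__main__":
    toy_reads = [["E1", "E2", "E3"], ["E1", "E3"], ["E1", "E2", "E3"]]
    exons, matrix = exon_adjacency_matrix(toy_reads)
    for exon, row in zip(exons, matrix):
        print(exon, row)
```

In the toy input, the E1 to E3 entry captures a read that skips E2, which is the kind of splice pattern such a matrix makes visible at a glance.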
25

Gouil, Quentin, and Andrew Keniry. "Latest techniques to study DNA methylation". Essays in Biochemistry 63, no. 6 (November 22, 2019): 639–48. http://dx.doi.org/10.1042/ebc20190027.

Abstract
Bisulfite sequencing is a powerful technique to detect 5-methylcytosine in DNA that has immensely contributed to our understanding of epigenetic regulation in plants and animals. Meanwhile, research on other base modifications, including 6-methyladenine and 4-methylcytosine that are frequent in prokaryotes, has been impeded by the lack of a comparable technique. Bisulfite sequencing also suffers from a number of drawbacks that are difficult to surmount, among which DNA degradation, lack of specificity, or short reads with low sequence diversity. In this review, we explore the recent refinements to bisulfite sequencing protocols that enable targeting genomic regions of interest, detecting derivatives of 5-methylcytosine, and mapping single-cell methylomes. We then present the unique advantage of long-read sequencing in detecting base modifications in native DNA and highlight the respective strengths and weaknesses of PacBio and Nanopore sequencing for this application. Although analysing epigenetic data from long-read platforms remains challenging, the ability to detect various modified bases from a universal sample preparation, in addition to the mapping and phasing advantages of the longer read lengths, provide long-read sequencing with a decisive edge over short-read bisulfite sequencing for an expanding number of applications across kingdoms.
26

Chaux, Frédéric, Nicolas Agier, Stephan Eberhard, and Zhou Xu. "Extraction and selection of high-molecular-weight DNA for long-read sequencing from Chlamydomonas reinhardtii". PLOS ONE 19, no. 2 (February 8, 2024): e0297014. http://dx.doi.org/10.1371/journal.pone.0297014.

Abstract
Recent advances in long-read sequencing technologies have enabled the complete assembly of eukaryotic genomes from telomere to telomere by allowing repeated regions to be fully sequenced and assembled, thus filling the gaps left by previous short-read sequencing methods. Furthermore, long-read sequencing can also help characterizing structural variants, with applications in the fields of genome evolution or cancer genomics. For many organisms, the main bottleneck to sequence long reads remains the lack of robust methods to obtain high-molecular-weight (HMW) DNA. For this purpose, we developed an optimized protocol to extract DNA suitable for long-read sequencing from the unicellular green alga Chlamydomonas reinhardtii, based on CTAB/phenol extraction followed by a size selection step for long DNA molecules. We provide validation results for the extraction protocol, as well as statistics obtained with Oxford Nanopore Technologies sequencing.
27

Su, Yun, Liyuan Fan, Changhe Shi, Tai Wang, Huimin Zheng, Haiyang Luo, Shuo Zhang et al. "Deciphering Neurodegenerative Diseases Using Long-Read Sequencing". Neurology 97, no. 9 (August 13, 2021): 423–33. http://dx.doi.org/10.1212/wnl.0000000000012466.

Abstract
Neurodegenerative diseases exhibit chronic progressive lesions in the central and peripheral nervous systems with unclear causes. The search for pathogenic mutations in human neurodegenerative diseases has benefited from massively parallel short-read sequencers. However, genomic regions, including repetitive elements, especially with high/low GC content, are far beyond the capability of conventional approaches. Recently, long-read single-molecule DNA sequencing technologies have emerged and enabled researchers to study genomes, transcriptomes, and metagenomes at unprecedented resolutions. The identification of novel mutations in unresolved neurodegenerative disorders, the characterization of causative repeat expansions, and the direct detection of epigenetic modifications on naive DNA by virtue of long-read sequencers will further expand our understanding of neurodegenerative diseases. In this article, we review and compare 2 prevailing long-read sequencing technologies, Pacific Biosciences and Oxford Nanopore Technologies, and discuss their applications in neurodegenerative diseases.
28

Heller, David, and Martin Vingron. "SVIM: structural variant identification using mapped long reads". Bioinformatics 35, no. 17 (January 21, 2019): 2907–15. http://dx.doi.org/10.1093/bioinformatics/btz041.

Abstract
Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online.
29

Pushel, Irina, Lisa A. Lansdon, Byunggil Yoo, Rebecca Biswell, Tomi Pastinen, Midhat S. Farooqi, and Keith J. August. "Short- and Long-Read RNA Sequencing Improve Molecular Profiling of Pediatric T-Cell Acute Lymphoblastic Leukemia". Blood 144, Supplement 1 (November 5, 2024): 5921. https://doi.org/10.1182/blood-2024-208984.

Abstract
Acute lymphoblastic leukemia (ALL) is one of the most common pediatric cancers, accounting for approximately 1/3 of childhood cancer diagnoses. Of these patients, ~15% are diagnosed with T-cell ALL (T-ALL). Pediatric T-ALL is less well-characterized and has a worse prognosis than its B-cell counterpart, particularly after relapse. Although many genetic drivers for pediatric T-ALL have been characterized, this information does not currently inform treatment selection for patients. By integrating transcriptional profiling data with genomic findings from molecular and cytogenetic assays, we aim to better characterize the cellular landscape driving distinct subtypes of pediatric T-ALL. In this study, we performed short-read (n = 5) and long-read (n = 3) RNA sequencing (RNAseq) on frozen bone marrow aspirate samples collected at diagnosis from pediatric T-ALL patients. For short-read RNAseq, total RNA was isolated and libraries prepared using the Illumina Stranded Total RNA Prep kit, then sequenced on the Illumina NovaSeq. Alignment was performed to GRCh38 using STAR 2.6.0 and read counts estimated using kallisto 0.46.2. For long-read RNAseq, libraries were prepared using Pacific Biosciences IsoSeq method for cDNA synthesis and library construction with SMRTbell Prep Kit 3.0 and sequenced on the Pacific Biosciences Sequel IIe or Revio systems. HiFi reads were processed using isoseq 4.0.0, with FLNC reads aligned to GRCh38 using minimap2 2.28-r1209. Fusion detection was performed using FLNC reads as input to FusionSeeker 1.0.1. Downstream analysis and visualization were performed in R 4.3.3. We found that integrating both long- and short-read RNAseq data with genomic findings was most informative. For one patient, we were able to use long-read RNAseq to confirm a PICALM::MLLT10 fusion that had been suspected based on microarray data. 
This fusion is known to drive upregulation of HOXA family genes, and we observed elevated expression of both HOXA9 and HOXA10 (compared to other patients in the cohort) in the short-read RNAseq data. In a different case, a STIL::TAL1 fusion had been suspected based on clinical microarray data. Although this fusion was not called by the fusion detection algorithm we used, manual inspection revealed numerous supporting reads encompassing the STIL promoter and the TAL1 gene. The expected corresponding upregulation of the TAL1 gene compared to other patients was also observed in this patient sample using short-read RNAseq. Short-read RNAseq further supported previously identified loss of CDKN2A in 3/5 patients, and enabled comparison of gene expression known to be implicated in T-ALL pathogenesis such as BCL11B, MEF2C, and LMO2 across the cohort. Long-read sequencing presented a unique opportunity to explore the expression of specific isoforms, such as the TAL1-short isoform, which we uniquely observe in one patient and has been identified as a putative tumor suppressor. Each RNAseq approach provided distinct insights into T-ALL biology. By integrating short- and long-read RNAseq data with genomic microarray profiling, we were able to not only confirm but also extend and contextualize observations about the cellular state of T-ALL leukemic blasts. We find that a combination of all three data types gives a comprehensive picture of the downstream effects of genetic lesions and suggests mechanisms through which distinct subtypes of pediatric T-ALL may drive cancer progression. Future work extending to high-throughput concatenated long-read RNAseq (Pacific Biosciences Kinnex), using this multi-modal profiling approach across a larger cohort will facilitate improved understanding of genomic drivers, and may improve individualized treatment selection and outcomes for pediatric patients with T-ALL.
30

Broseus, Lucile, Aubin Thomas, Andrew J. Oldfield, Dany Severac, Emeric Dubois, and William Ritchie. "TALC: Transcript-level Aware Long-read Correction". Bioinformatics 36, no. 20 (July 16, 2020): 5000–5006. http://dx.doi.org/10.1093/bioinformatics/btaa634.

Abstract
Motivation: Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures but are error-prone. Numerous ‘hybrid correction’ algorithms have been developed for genomic data that correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited for correcting more complex transcriptome sequencing data. Results: We have created a novel reference-free algorithm called Transcript-level Aware Long-Read Correction (TALC) which models changes in RNA expression and isoform representation in a weighted De Bruijn graph to correct long reads from transcriptome studies. We show that transcript-level aware correction by TALC improves the accuracy of the whole spectrum of downstream RNA-seq applications and is thus necessary for transcriptome analyses that use long read technology. Availability and implementation: TALC is implemented in C++ and available at https://github.com/lbroseus/TALC. Supplementary information: Supplementary data are available at Bioinformatics online.
31

Wick, Ryan R., and Kathryn E. Holt. "Benchmarking of long-read assemblers for prokaryote whole genome sequencing". F1000Research 8 (December 23, 2019): 2138. http://dx.doi.org/10.12688/f1000research.21782.1.

Abstract
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of six long-read assemblers (Canu, Flye, Miniasm/Minipolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.6 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 was the only assembler which consistently produced clean contig circularisation. Raven v0.0.5 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.3.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
32

Wick, Ryan R., and Kathryn E. Holt. "Benchmarking of long-read assemblers for prokaryote whole genome sequencing". F1000Research 8 (April 22, 2020): 2138. http://dx.doi.org/10.12688/f1000research.21782.2.

Abstract
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled – one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of seven long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v1.9 produced moderately reliable assemblies but had the longest runtimes of all assemblers tested. Flye v2.7 was more reliable and did particularly well with plasmid assembly. Miniasm/Minipolish v0.3 and NECAT v20200119 were the most likely to produce clean contig circularisation. Raven v0.0.8 was the most reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.4.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms.
33

Thibodeau, My Linh, Kieran O’Neill, Katherine Dixon, Caralyn Reisle, Karen L. Mungall, Martin Krzywinski, Yaoqing Shen et al. "Improved structural variant interpretation for hereditary cancer susceptibility using long-read sequencing". Genetics in Medicine 22, no. 11 (July 6, 2020): 1892–97. http://dx.doi.org/10.1038/s41436-020-0880-8.

Abstract
Purpose: Structural variants (SVs) may be an underestimated cause of hereditary cancer syndromes given the current limitations of short-read next-generation sequencing. Here we investigated the utility of long-read sequencing in resolving germline SVs in cancer susceptibility genes detected through short-read genome sequencing. Methods: Known or suspected deleterious germline SVs were identified using Illumina genome sequencing across a cohort of 669 advanced cancer patients with paired tumor genome and transcriptome sequencing. Candidate SVs were subsequently assessed by Oxford Nanopore long-read sequencing. Results: Nanopore sequencing confirmed eight simple pathogenic or likely pathogenic SVs, resolving three additional variants whose impact could not be fully elucidated through short-read sequencing. A recurrent sequencing artifact on chromosome 16p13 and one complex rearrangement on chromosome 5q35 were subsequently classified as likely benign, obviating the need for further clinical assessment. Variant configuration was further resolved in one case with a complex pathogenic rearrangement affecting TSC2. Conclusion: Our findings demonstrate that long-read sequencing can improve the validation, resolution, and classification of germline SVs. This has important implications for return of results, cascade carrier testing, cancer screening, and prophylactic interventions.
34

Anantharam, Raghavendran, Dylan Duchen, Andrea L. Cox, Winston Timp, David L. Thomas, Steven J. Clipman, and Abraham J. Kandathil. "Long-Read Nanopore-Based Sequencing of Anelloviruses". Viruses 16, no. 5 (May 2, 2024): 723. http://dx.doi.org/10.3390/v16050723.

Abstract
Routinely used metagenomic next-generation sequencing (mNGS) techniques often fail to detect low-level viremia (<10^4 copies/mL) and appear biased towards viruses with linear genomes. These limitations hinder the capacity to comprehensively characterize viral infections, such as those attributed to the Anelloviridae family. These near ubiquitous non-pathogenic components of the human virome have circular single-stranded DNA genomes that vary in size from 2.0 to 3.9 kb and exhibit high genetic diversity. Hence, species identification using short reads can be challenging. Here, we introduce a rolling circle amplification (RCA)-based metagenomic sequencing protocol tailored for circular single-stranded DNA genomes, utilizing the long-read Oxford Nanopore platform. The approach was assessed by sequencing anelloviruses in plasma drawn from people who inject drugs (PWID) in two geographically distinct cohorts. We detail the methodological adjustments implemented to overcome difficulties inherent in sequencing circular genomes and describe a computational pipeline focused on anellovirus detection. We assessed our protocol across various sample dilutions and successfully differentiated anellovirus sequences in conditions simulating mixed infections. This method provides a robust framework for the comprehensive characterization of circular viruses within the human virome using the Oxford Nanopore.
35

Das, Arghya Kusum, Sayan Goswami, Kisung Lee, and Seung-Jong Park. "A hybrid and scalable error correction algorithm for indel and substitution errors of long reads". BMC Genomics 20, S11 (December 2019). http://dx.doi.org/10.1186/s12864-019-6286-9.

Abstract
Background: Long-read sequencing has shown the promises to overcome the short length limitations of second-generation sequencing by providing more complete assembly. However, the computation of the long sequencing reads is challenged by their higher error rates (e.g., 13% vs. 1%) and higher cost ($0.3 vs. $0.03 per Mbp) compared to the short reads. Methods: In this paper, we present a new hybrid error correction tool, called ParLECH (Parallel Long-read Error Correction using Hybrid methodology). The error correction algorithm of ParLECH is distributed in nature and efficiently utilizes the k-mer coverage information of high throughput Illumina short-read sequences to rectify the PacBio long-read sequences. ParLECH first constructs a de Bruijn graph from the short reads, and then replaces the indel error regions of the long reads with their corresponding widest path (or maximum min-coverage path) in the short read-based de Bruijn graph. ParLECH then utilizes the k-mer coverage information of the short reads to divide each long read into a sequence of low and high coverage regions, followed by a majority voting to rectify each substituted error base. Results: ParLECH outperforms latest state-of-the-art hybrid error correction methods on real PacBio datasets. Our experimental evaluation results demonstrate that ParLECH can correct large-scale real-world datasets in an accurate and scalable manner. ParLECH can correct the indel errors of human genome PacBio long reads (312 GB) with Illumina short reads (452 GB) in less than 29 h using 128 compute nodes. ParLECH can align more than 92% bases of an E. coli PacBio dataset with the reference genome, proving its accuracy. Conclusion: ParLECH can scale to over terabytes of sequencing data using hundreds of computing nodes. The proposed hybrid error correction methodology is novel and rectifies both indel and substitution errors present in the original long reads or newly introduced by the short reads.
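The "widest path (or maximum min-coverage path)" named in this abstract is a standard graph problem: among all paths between two nodes, choose the one whose weakest edge is as strong as possible. The sketch below solves it with a Dijkstra-style search on a small generic graph. It is not ParLECH's distributed implementation; the function name, the adjacency-list format, and the choice to attach coverage to edges rather than to k-mer nodes are assumptions for illustration.

```python
import heapq

def widest_path(graph, source, target):
    """Return (path, width) maximizing the minimum coverage along the path.

    graph: dict mapping node -> list of (neighbor, coverage) pairs
    (hypothetical format; coverage values are assumed to be positive).
    Returns (None, 0) when target cannot be reached from source.
    """
    best = {source: float("inf")}   # best bottleneck width found per node
    prev = {}                       # predecessor on the best path
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            break
        if width < best.get(node, 0):
            continue  # stale heap entry
        for neighbor, coverage in graph.get(node, []):
            candidate = min(width, coverage)
            if candidate > best.get(neighbor, 0):
                best[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (-candidate, neighbor))
    if target not in best:
        return None, 0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[target]

if __name__ == "__main__":
    toy_graph = {
        "A": [("B", 5), ("C", 20)],
        "B": [("D", 30)],
        "C": [("D", 15)],
        "D": [],
    }
    print(widest_path(toy_graph, "A", "D"))
```

In the toy graph, the route through B has one very well covered edge but a bottleneck of 5, so the route through C (bottleneck 15) is preferred. That illustrates why the minimum coverage, not the total, drives the choice.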
36

Babarinde, Isaac Adeyemi, and Andrew Paul Hutchins. "The effects of sequencing depth on the assembly of coding and noncoding transcripts in the human genome". BMC Genomics 23, no. 1 (July 4, 2022). http://dx.doi.org/10.1186/s12864-022-08717-z.

Abstract
AbstractInvestigating the functions and activities of genes requires proper annotation of the transcribed units. However, transcript assembly efforts have produced a surprisingly large variation in the number of transcripts, and especially so for noncoding transcripts. This heterogeneity in assembled transcript sets might be partially explained by sequencing depth. Here, we used real and simulated short-read sequencing data as well as long-read data to systematically investigate the impact of sequencing depths on the accuracy of assembled transcripts. We assembled and analyzed transcripts from 671 human short-read data sets and four long-read data sets. At the first level, there is a positive correlation between the number of reads and the number of recovered transcripts. However, the effect of the sequencing depth varied based on cell or tissue type, the type of read and the nature and expression levels of the transcripts. The detection of coding transcripts saturated rapidly with both short and long-reads, however, there was no sign of early saturation for noncoding transcripts at any sequencing depth. Increasing long-read sequencing depth specifically benefited transcripts containing transposable elements. Finally, we show how single-cell RNA-seq can be guided by transcripts assembled from bulk long-read samples, and demonstrate that noncoding transcripts are expressed at similar levels to coding transcripts but are expressed in fewer cells. This study highlights the impact of sequencing depth on transcript assembly.
Los estilos APA, Harvard, Vancouver, ISO, etc.
37

Kovaka, Sam, Aleksey V. Zimin, Geo M. Pertea, Roham Razaghi, Steven L. Salzberg and Mihaela Pertea. "Transcriptome assembly from long-read RNA-seq alignments with StringTie2". Genome Biology 20, no. 1 (December 2019). http://dx.doi.org/10.1186/s13059-019-1910-1.

Abstract
RNA sequencing using the latest single-molecule sequencing instruments produces reads that are thousands of nucleotides long. The ability to assemble these long reads can greatly improve the sensitivity of long-read analyses. Here we present StringTie2, a reference-guided transcriptome assembler that works with both short and long reads. StringTie2 includes new methods to handle the high error rate of long reads and offers the ability to work with full-length super-reads assembled from short reads, which further improves the quality of short-read assemblies. StringTie2 is more accurate and faster and uses less memory than all comparable short-read and long-read analysis tools.
38

Yang, Chao, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang and Lu Zhang. "LRTK: a platform agnostic toolkit for linked-read analysis of both human genome and metagenome". GigaScience 13 (2024). http://dx.doi.org/10.1093/gigascience/giae028.

Abstract
Background: Linked-read sequencing technologies generate high-base-quality short reads that contain extrapolative information on long-range DNA connectedness. These advantages of linked-read technologies are well known and have been demonstrated in many human genomic and metagenomic studies. However, existing linked-read analysis pipelines (e.g., Long Ranger) were primarily developed to process sequencing data from the human genome and are not suited for analyzing metagenomic sequencing data. Moreover, linked-read analysis pipelines are typically limited to one specific sequencing platform. Findings: To address these limitations, we present the Linked-Read ToolKit (LRTK), a unified and versatile toolkit for platform-agnostic processing of linked-read sequencing data from both human genome and metagenome. LRTK provides functions to perform linked-read simulation, barcode sequencing error correction, barcode-aware read alignment and metagenome assembly, reconstruction of long DNA fragments, taxonomic classification and quantification, and barcode-assisted genomic variant calling and phasing. LRTK has the ability to process multiple samples automatically and provides users with the option to generate reproducible reports during processing of raw sequencing data and at multiple checkpoints throughout downstream analysis. We applied LRTK on linked reads from simulation, mock community, and real datasets for both human genome and metagenome. We showcased LRTK’s ability to generate comparative performance results from preceding benchmark studies and to report these results in publication-ready HTML document plots. Conclusions: LRTK provides comprehensive and flexible modules along with an easy-to-use Python-based workflow for processing linked-read sequencing datasets, thereby filling the current gap in the field caused by platform-centric genome-specific linked-read data analysis tools.
39

Kallenborn, Felix and Bertil Schmidt. "CAREx: context-aware read extension of paired-end sequencing data". BMC Bioinformatics 25, no. 1 (May 10, 2024). http://dx.doi.org/10.1186/s12859-024-05802-w.

Abstract
Background: Commonly used next-generation sequencing machines typically produce large amounts of short reads of a few hundred base pairs in length. However, many downstream applications would generally benefit from longer reads. Results: We present CAREx, an algorithm for the generation of pseudo-long reads from paired-end short-read Illumina data based on the concept of repeatedly computing multiple-sequence alignments to extend a read until its partner is found. Our performance evaluation on both simulated data and real data shows that CAREx is able to connect significantly more read pairs (up to 99% for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. Conclusion: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at https://github.com/fkallen/CAREx.
40

Zee, Alexander, Dori Zhi Qian Deng, Matthew Adams, Kayla D. Schimke, Russell Corbett-Detig, Shelbi L. Russell, Xuan Zhang, Robert J. Schmitz and Christopher Vollmers. "Sequencing Illumina libraries at high accuracy on the ONT MinION using R2C2". Genome Research, November 9, 2022, gr.277031.122. http://dx.doi.org/10.1101/gr.277031.122.

Abstract
High-throughput short-read sequencing has taken on a central role in research and diagnostics. Hundreds of different assays exist today to take advantage of Illumina short-read sequencers, the predominant short-read sequencing technology available today. Although other short-read sequencing technologies exist, the ubiquity of Illumina sequencers in sequencing core facilities, and the high capital costs of these technologies have limited their adoption. Among a new generation of sequencing technologies, Oxford Nanopore Technologies (ONT) holds a unique position because the ONT MinION, an error-prone long-read sequencer, is associated with little to no capital cost. Here we show that we can make short-read Illumina libraries compatible with the ONT MinION by using the R2C2 method to circularize and amplify the short library molecules. This results in longer DNA molecules containing tandem repeats of the original short library molecules. This longer DNA is ideally suited for the ONT MinION, and after sequencing, the tandem repeats in the resulting raw reads can be converted into high-accuracy consensus reads with error rates similar to those of the Illumina MiSeq. We highlight this capability by producing and benchmarking RNA-seq, ChIP-seq, as well as regular and target-enriched Tn5 libraries. We also explore the use of this approach for rapid evaluation of sequencing library metrics by implementing a real-time analysis workflow.
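The idea that tandem repeats of one molecule can be collapsed into a higher-accuracy consensus can be sketched with a per-position majority vote. This is a deliberately simplified illustration, not the R2C2 processing pipeline: it assumes the repeat copies have already been split out of the raw read, are aligned to each other, have equal length, and differ only by substitutions. The function name `majority_consensus` and the example sequences are hypothetical.

```python
from collections import Counter

def majority_consensus(copies):
    """Collapse pre-aligned, equal-length repeat copies by per-column majority vote."""
    if not copies:
        return ""
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be pre-aligned to the same length")
    # zip(*copies) yields one tuple of bases per alignment column
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )

# Four made-up copies of the same insert, two of them with one substitution error.
repeat_copies = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC"]
consensus = majority_consensus(repeat_copies)
```

Real nanopore reads also contain insertions and deletions, so a practical consensus step needs an alignment of the copies before any voting can take place.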
41

Meleshko, Dmitry, Andrey D. Prjbelski, Mikhail Raiko, Alexandru I. Tomescu, Hagen Tilgner and Iman Hajirasouliha. "cloudrnaSPAdes: Isoform assembly using bulk barcoded RNA sequencing data". Bioinformatics, January 23, 2024. http://dx.doi.org/10.1093/bioinformatics/btad781.

Abstract
Motivation: Recent advancements in long-read RNA sequencing have enabled the examination of full-length isoforms, previously uncaptured by short-read sequencing methods. An alternative powerful method for studying isoforms is through the use of barcoded short RNA reads, for which a barcode indicates whether two short reads arise from the same molecule or not. Such techniques include the 10x Genomics linked-read based SParse Isoform Sequencing (SPIso-seq), as well as Loop-Seq and Tell-Seq. Some applications, such as novel-isoform discovery, require very high coverage. Obtaining high coverage using long reads can be difficult, making barcoded RNA-seq data a valuable alternative for this task. However, most annotation pipelines are not able to work with a set of short reads instead of a single transcript, nor can they handle coverage gaps within a molecule. In order to overcome this challenge, we present an RNA-seq assembler that allows the determination of the expressed isoform per barcode. Results: In this paper, we present cloudrnaSPAdes, a tool for assembling full-length isoforms from barcoded RNA-seq linked-read data in a reference-free fashion. Evaluating it on simulated and real human data, we found that cloudrnaSPAdes accurately assembles isoforms, even for genes with high isoform diversity. Availability: cloudrnaSPAdes is a feature release of the SPAdes assembler, and the version used for this paper is available at https://cab.spbu.ru/software/cloudrnaspades/ and https://github.com/ablab/spades/releases/tag/rnacloudSPAdes.
42

Karaoglanoglu, Fatih, Cedric Chauve and Faraz Hach. "Genion, an accurate tool to detect gene fusion from long transcriptomics reads". BMC Genomics 23, no. 1 (February 14, 2022). http://dx.doi.org/10.1186/s12864-022-08339-5.

Abstract
Background: The advent of next-generation sequencing technologies empowered a wide variety of transcriptomics studies. A widely studied topic is gene fusion, which is observed in many cancer types and suspected of having oncogenic properties. Gene fusions are the result of structural genomic events that bring two genes close together and result in a fused transcript. This is different from fusion transcripts created during or after the transcription process. These chimeric transcripts are also known as read-through and trans-splicing transcripts. Gene fusion discovery with short reads is a well-studied problem, and many methods have been developed. But the sensitivity of these methods is limited by the technology, especially the short read length. Advances in long-read sequencing technologies allow the generation of long transcriptomics reads at a low cost. Transcriptomic long-read sequencing presents unique opportunities to overcome the shortcomings of short-read technologies for gene fusion detection while introducing new challenges. Results: We present Genion, a sensitive and fast gene fusion detection method that can also detect read-through events. We compare Genion against a recently introduced long-read gene fusion discovery method, LongGF, both on simulated and real datasets. On simulated data, Genion accurately identifies the gene fusions and its clustering accuracy for detecting fusion reads is better than that of LongGF. Furthermore, our results on the breast cancer cell line MCF-7 show that Genion correctly identifies all the experimentally validated gene fusions. Conclusions: Genion is an accurate gene fusion caller. Genion is implemented in C++ and is available at https://github.com/vpc-ccg/genion.
43

Commichaux, Seth, Kiran Javkar, Padmini Ramachandran, Niranjan Nagarajan, Denis Bertrand, Yi Chen, Elizabeth Reed et al. "Evaluating the accuracy of Listeria monocytogenes assemblies from quasimetagenomic samples using long and short reads". BMC Genomics 22, no. 1 (May 26, 2021). http://dx.doi.org/10.1186/s12864-021-07702-2.

Abstract
Background: Whole genome sequencing of cultured pathogens is the state-of-the-art public health response for the bioinformatic source tracking of illness outbreaks. Quasimetagenomics can substantially reduce the amount of culturing needed before a high-quality genome can be recovered. Highly accurate short-read data is analyzed for single nucleotide polymorphisms and multi-locus sequence types to differentiate strains but cannot span many genomic repeats, resulting in highly fragmented assemblies. Long reads can span repeats, resulting in much more contiguous assemblies, but have lower accuracy than short reads. Results: We evaluated the accuracy of Listeria monocytogenes assemblies from enrichments (quasimetagenomes) of naturally contaminated ice cream using long-read (Oxford Nanopore) and short-read (Illumina) sequencing data. Accuracy of ten assembly approaches, over a range of sequencing depths, was evaluated by comparing sequence similarity of genes in assemblies to a complete reference genome. Long-read assemblies reconstructed a circularized genome as well as a 71 kbp plasmid after 24 h of enrichment; however, high error rates prevented high-fidelity gene assembly, even at 150X depth of coverage. Short-read assemblies accurately reconstructed the core genes after 28 h of enrichment but produced highly fragmented genomes. Hybrid approaches demonstrated promising results but had biases based upon the initial assembly strategy. Short-read assemblies scaffolded with long reads accurately assembled the core genes after just 24 h of enrichment, but were highly fragmented. Long-read assemblies polished with short reads reconstructed a circularized genome and plasmid and assembled all the genes after 24 h of enrichment but with less fidelity for the core genes than the short-read assemblies. Conclusion: The integration of long- and short-read sequencing of quasimetagenomes expedited the reconstruction of a high-quality pathogen genome compared to either platform alone. A new and more complete level of information about genome structure, gene order and mobile elements can be added to the public health response by incorporating long-read analyses with the standard short-read WGS outbreak response.
44

Liu, Silvia, Caroline Obert, Yan-Ping Yu, Junhua Zhao, Bao-Guo Ren, Jia-Jun Liu, Kelly Wiseman et al. "Utility analyses of AVITI sequencing chemistry". BMC Genomics 25, no. 1 (August 10, 2024). http://dx.doi.org/10.1186/s12864-024-10686-4.

Abstract
Background: DNA sequencing is a critical tool in modern biology. Over the last two decades, it has been revolutionized by the advent of massively parallel sequencing, leading to significant advances in the genome and transcriptome sequencing of various organisms. Nevertheless, challenges with accuracy, lack of competitive options and prohibitive costs associated with high-throughput parallel short-read sequencing persist. Results: Here, we conduct a comparative analysis using matched DNA and RNA short-read assays between Element Biosciences’ AVITI and Illumina’s NextSeq 550 chemistries. Similar comparisons were evaluated for synthetic long-read sequencing for RNA and targeted single-cell transcripts between the AVITI and Illumina’s NovaSeq 6000. For both DNA and RNA short-read applications, the study found that the AVITI produced significantly higher per-sequence quality scores. For PCR-free DNA libraries, we observed an average 89.7% lower experimentally determined error rate when using the AVITI chemistry, compared to the NextSeq 550. For short-read RNA quantification, the AVITI platform had an average of 32.5% lower error rate than the NextSeq 550. With regard to synthetic long-read mRNA and targeted synthetic long-read single-cell mRNA sequencing, both platforms’ respective chemistries performed comparably in quantification of genes and isoforms. The AVITI displayed a marginally lower error rate for long reads, with fewer chemistry-specific errors and a higher mutation detection rate. Conclusion: These results point to the potential of the AVITI platform as a competitive candidate in high-throughput short-read sequencing analyses when juxtaposed with the Illumina NextSeq 550.
45

De Coster, Wouter, Mojca Strazisar and Peter De Rijk. "Critical length in long-read resequencing". NAR Genomics and Bioinformatics 2, no. 1 (January 13, 2020). http://dx.doi.org/10.1093/nargab/lqz027.

Abstract
Long-read sequencing has substantial advantages for structural variant discovery and phasing of variants compared to short-read technologies, but the required and optimal read length has not been assessed. In this work, we used long reads simulated from human genomes and evaluated structural variant discovery and variant phasing using current best-practice bioinformatics methods. We determined that optimal discovery of structural variants from human genomes can be obtained with reads of at least 20 kb. Haplotyping variants across genes only reaches its optimum with reads of 100 kb. These findings are important for the design of future long-read sequencing projects.
46

Neubert, Kerstin, Eric Zuchantke, Robert Maximilian Leidenfrost, Röbbe Wünschiers, Josephine Grützke, Burkhard Malorny, Holger Brendebach et al. "Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures". BMC Genomics 22, no. 1 (November 14, 2021). http://dx.doi.org/10.1186/s12864-021-08115-x.

Abstract
Background: We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods. Results: We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of the eight short-read assembly methods tested, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach. Conclusions: Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conservative F. tularensis.
47

Gehrig, Jeanette L., Daniel M. Portik, Mark D. Driscoll, Eric Jackson, Shreyasee Chakraborty, Dawn Gratalo, Meredith Ashby and Ricardo Valladares. "Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data". Microbial Genomics 8, no. 3 (March 18, 2022). http://dx.doi.org/10.1099/mgen.0.000794.

Abstract
A long-standing challenge in human microbiome research is achieving the taxonomic and functional resolution needed to generate testable hypotheses about the gut microbiota’s impact on health and disease. With a growing number of live microbial interventions in clinical development, this challenge is renewed by a need to understand the pharmacokinetics and pharmacodynamics of therapeutic candidates. While short-read sequencing of the bacterial 16S rRNA gene has been the standard for microbiota profiling, recent improvements in the fidelity of long-read sequencing underscore the need for a re-evaluation of the value of distinct microbiome-sequencing approaches. We leveraged samples from participants enrolled in a phase 1b clinical trial of a novel live biotherapeutic product to perform a comparative analysis of short-read and long-read amplicon and metagenomic sequencing approaches to assess their utility for generating clinical microbiome data. Across all methods, overall community taxonomic profiles were comparable and relationships between samples were conserved. Comparison of ubiquitous short-read 16S rRNA amplicon profiling to long-read profiling of the 16S-ITS-23S rRNA amplicon showed that only the latter provided strain-level community resolution and insight into novel taxa. All methods identified an active ingredient strain in treated study participants, though detection confidence was higher for long-read methods. Read coverage from both metagenomic methods provided evidence of active-ingredient strain replication in some treated participants. Compared to short-read metagenomics, approximately twice the proportion of long reads were assigned functional annotations. Finally, compositionally similar bacterial metagenome-assembled genomes (MAGs) were recovered from short-read and long-read metagenomic methods, although a greater number and more complete MAGs were recovered from long reads. Despite higher costs, both amplicon and metagenomic long-read approaches yielded added microbiome data value in the form of higher-confidence taxonomic and functional resolution and improved recovery of microbial genomes compared to traditional short-read methodologies.
48

Fang, Li, Charlly Kao, Michael V. Gonzalez, Fernanda A. Mafra, Renata Pellegrino da Silva, Mingyao Li, Sören-Sebastian Wenzel, Katharina Wimmer, Hakon Hakonarson and Kai Wang. "LinkedSV for detection of mosaic structural variants from linked-read exome and genome sequencing data". Nature Communications 10, no. 1 (December 2019). http://dx.doi.org/10.1038/s41467-019-13397-7.

Abstract
Linked-read sequencing provides long-range information on short-read sequencing data by barcoding reads originating from the same DNA molecule, and can improve detection and breakpoint identification for structural variants (SVs). Here we present LinkedSV for SV detection on linked-read sequencing data. LinkedSV considers barcode overlapping and enriched fragment endpoints as signals to detect large SVs, while it leverages read depth, paired-end signals and local assembly to detect small SVs. Benchmarking studies demonstrate that LinkedSV outperforms existing tools, especially on exome data and on somatic SVs with low variant allele frequencies. We demonstrate clinical cases where LinkedSV identifies disease-causal SVs from linked-read exome sequencing data missed by conventional exome sequencing, and show examples where LinkedSV identifies SVs missed by high-coverage long-read sequencing. In summary, LinkedSV can detect SVs missed by conventional short-read and long-read sequencing approaches, and may resolve negative cases from clinical genome/exome sequencing studies.
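The "barcode overlapping" signal mentioned in this abstract can be illustrated generically: two genomic windows that are far apart on the reference but share many barcodes may have been adjacent on the sequenced molecules. The sketch below is not LinkedSV's scoring model; it only computes a Jaccard index between the barcode sets of two windows. The function name `barcode_jaccard` and the barcode labels are hypothetical.

```python
def barcode_jaccard(barcodes_a, barcodes_b):
    """Fraction of barcodes shared between two windows (Jaccard index)."""
    set_a, set_b = set(barcodes_a), set(barcodes_b)
    union = set_a | set_b
    if not union:
        return 0.0  # no barcodes observed in either window
    return len(set_a & set_b) / len(union)

# Made-up barcodes observed on reads mapped to two distant windows.
window_1 = ["BX1", "BX2", "BX3", "BX4"]
window_2 = ["BX3", "BX4", "BX5"]
sharing = barcode_jaccard(window_1, window_2)
```

A practical caller would compare such a value against the sharing expected by chance for windows at that genomic distance, which this sketch does not attempt.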
49

Mak, Lauren, Dmitry Meleshko, David C. Danko, Waris N. Barakzai, Salil Maharjan, Natan Belchikov and Iman Hajirasouliha. "Ariadne: synthetic long read deconvolution using assembly graphs". Genome Biology 24, no. 1 (August 28, 2023). http://dx.doi.org/10.1186/s13059-023-03033-5.

Abstract
Synthetic long read sequencing techniques such as UST’s TELL-Seq and Loop Genomics’ LoopSeq combine 3′ barcoding with standard short-read sequencing to expand the range of linkage resolution from hundreds to tens of thousands of base pairs. However, the lack of a 1:1 correspondence between a long fragment and a 3′ unique molecular identifier confounds the assignment of linkage between short reads. We introduce Ariadne, a novel assembly graph-based synthetic long read deconvolution algorithm that can be used to extract single-species read-clouds from synthetic long read datasets to improve the taxonomic classification and de novo assembly of complex populations, such as metagenomes.
50

Portik, Daniel M., C. Titus Brown and N. Tessa Pierce-Ward. "Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets". BMC Bioinformatics 23, no. 1 (December 13, 2022). http://dx.doi.org/10.1186/s12859-022-05103-0.

Abstract
Background: Long-read shotgun metagenomic sequencing is gaining in popularity and offers many advantages over short-read sequencing. The higher information content in long reads is useful for a variety of metagenomics analyses, including taxonomic classification and profiling. The development of long-read specific tools for taxonomic classification is accelerating, yet there is a lack of information regarding their relative performance. Here, we perform a critical benchmarking study using 11 methods, including five methods designed specifically for long reads. We applied these tools to several mock community datasets generated using Pacific Biosciences (PacBio) HiFi or Oxford Nanopore Technologies sequencing, and evaluated their performance based on read utilization, detection metrics, and relative abundance estimates. Results: Our results show that long-read classifiers generally performed best. Several short-read classification and profiling methods produced many false positives (particularly at lower abundances), required heavy filtering to achieve acceptable precision (at the cost of reduced recall), and produced inaccurate abundance estimates. By contrast, two long-read methods (BugSeq, MEGAN-LR & DIAMOND) and one generalized method (sourmash) displayed high precision and recall without any filtering required. Furthermore, in the PacBio HiFi datasets these methods detected all species down to the 0.1% abundance level with high precision. Some long-read methods, such as MetaMaps and MMseqs2, required moderate filtering to reduce false positives to resemble the precision and recall of the top-performing methods. We found read quality affected performance for methods relying on protein prediction or exact k-mer matching, and these methods performed better with PacBio HiFi datasets. We also found that long-read datasets with a large proportion of shorter reads (< 2 kb length) resulted in lower precision and worse abundance estimates, relative to length-filtered datasets. Finally, for classification methods, we found that the long-read datasets produced significantly better results than short-read datasets, demonstrating clear advantages for long-read metagenomic sequencing. Conclusions: Our critical assessment of available methods provides best-practice recommendations for current research using long reads and establishes a baseline for future benchmarking studies.