


Dissertations / Theses on the topic 'Third Generation Sequencing (TGS)'

Author: Grafiati

Published: 25 January 2025


Consult the top 15 dissertations / theses for your research on the topic 'Third Generation Sequencing (TGS).'


1

Faure, Roland. "Haplotype assembly from long reads." Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS052.


Abstract:

Cette thèse propose des solutions pour améliorer l'assemblage des génomes à partir de lectures de séquençage de troisième génération (lectures longues). Plus précisément, elle se concentre sur l'amélioration de l'assemblage des (méta)génomes contenant plusieurs haplotypes, comme des génomes polyploïdes ou des souches bactériennes proches. Les assembleurs actuels ont du mal à séparer les haplotypes très similaires, et fusionnent généralement des (parties d')haplotypes, ce qui entraîne la perte de polymorphismes et d'hétérozygotie dans l'assemblage final. Ce travail présente une série de méthodes et de logiciels pour obtenir des assemblages contenant des haplotypes bien séparés. Plus précisément, GenomeTailor et HairSplitter transforment un assemblage obtenu avec des lectures longues erronées en un assemblage phasé, améliorant considérablement l'état de l'art lorsque de nombreuses souches sont présentes. Le logiciel Alice propose une nouvelle méthode, basée sur des nouveaux sketchs ``MSR'', pour assembler efficacement plusieurs haplotypes séquencés avec des lectures de haute fidélité. Enfin, cette thèse propose une nouvelle stratégie de scaffolding Hi-C basée sur le démêlage des graphes d'assemblage qui améliore considérablement les assemblages finaux, en particulier lorsque plusieurs haplotypes sont présents
This thesis presents solutions to improve genome assembly from third-generation sequencing reads, with a specific focus on improving the assembly of (meta)genomes containing multiple haplotypes, such as polyploid genomes or close bacterial strains. Current assemblers struggle to separate highly similar haplotypes, often collapsing all or parts of the haplotypes into one, thereby discarding polymorphisms and heterozygosity. This work introduces a series of methods and software tools to achieve haplotype-separated assemblies. Specifically, GenomeTailor and HairSplitter transform a collapsed assembly obtained with erroneous long reads into a phased assembly, significantly improving on the state of the art when numerous strains are present. The software Alice introduces a new method based on the new "MSR" sketching technique for efficiently assembling multiple haplotypes sequenced with high-fidelity reads. Additionally, this thesis proposes a new Hi-C scaffolding strategy that involves untangling assembly graphs, which significantly improves final assemblies, particularly when several haplotypes are present.


2

Heller, David. "Structural variant calling using third-generation sequencing data." Berlin: Freie Universität Berlin, 2021. http://d-nb.info/122534946X/34.



3

Mayo, Thomas Richard. "Machine learning for epigenetics : algorithms for next generation sequencing data." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/33055.


Abstract:

The advent of Next Generation Sequencing (NGS), a little over a decade ago, has led to a vast and rapid increase in the generation of genomic data. The drastically reduced cost has in turn enabled powerful modifications that can be used to investigate not just genetic, but epigenetic, phenomena. Epigenetics refers to the study of mechanisms affecting gene expression other than the genetic code itself and thus, at the transcription level, incorporates DNA methylation, transcription factor binding and histone modifications amongst others. This thesis outlines and tackles two major challenges in the computational analysis of such data using techniques from machine learning. Firstly, I address the problem of testing for differential methylation between groups of bisulfite sequencing data sets. DNA methylation plays an important role in genomic imprinting, X-chromosome inactivation and the repression of repetitive elements, as well as being implicated in numerous diseases, such as cancer. Bisulfite sequencing provides single nucleotide resolution methylation data at the whole genome scale, but a sensitive analysis of such data is difficult. I propose a solution that uses a powerful kernel-based machine learning technique, the Maximum Mean Discrepancy, to leverage well-characterised spatial correlations in DNA methylation, and adapt the method for this particular use. I use this tailored method to analyse a novel data set from a study of ageing in three different tissues in the mouse. This study motivates further modifications to the method and highlights the utility of the underlying measure as an exploratory tool for methylation analysis. Secondly, I address the problem of predictive and explanatory modelling of chromatin immunoprecipitation sequencing data (ChIP-Seq). ChIP-Seq is typically used to assay the binding of a protein of interest, such as a transcription factor or histone, to the DNA, and as such is one of the most widely used sequencing assays.
While peak callers are a powerful tool for identifying binding sites in sparse and clean ChIP-Seq profiles, broader signals defy analysis in this framework. Instead, generative models that explain the data in terms of the underlying sequence can help uncover mechanisms that predict binding or the lack thereof. I explore current problems with ChIP-Seq analysis, such as zero-inflation and the use of the control experiment, known as the input. I then devise a method for representing k-mers that enables the use of longer DNA sub-sequences within a flexible model development framework, such as generalised linear models, without heavy programming requirements. Finally, I use these insights to develop an appropriate Bayesian generative model that predicts ChIP-Seq count data in terms of the underlying DNA sequence, incorporating DNA methylation information where available, fitting the model with the Expectation-Maximization algorithm. The model is tested on simulated data and real data pertaining to the histone mark H3K27me3. This thesis therefore straddles the fields of bioinformatics and machine learning. Bioinformatics is both plagued and blessed by the plethora of different techniques available for gathering data and their continual innovations. Each technique presents a unique challenge, and hence out-of-the-box machine learning techniques have had little success in solving biological problems. While I have focused on NGS data, the methods developed in this thesis are likely to be applicable to future technologies, such as Third Generation Sequencing methods, and the lessons learned in their adaptation will be informative for the next wave of computational challenges.
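
The Maximum Mean Discrepancy named in this abstract compares two samples through a kernel. As a rough illustration only, the sketch below computes the standard biased estimate of the squared MMD between two sets of scalar values (for instance, methylation levels in two groups) with a Gaussian kernel. The function names, the choice of kernel and the bandwidth values are assumptions made for this example; the thesis's tailored method, which exploits spatial correlations, is not reproduced here.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar values."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y),
    where each mean runs over all pairs, including identical ones.
    """
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical methylation levels for two groups of samples.
group_a = [0.10, 0.20, 0.15]
group_b = [0.80, 0.90, 0.85]
print(mmd_squared(group_a, group_b, bandwidth=0.2))
```

A value near zero suggests the two samples could come from the same distribution; turning the statistic into a test would additionally require a null distribution, for example from permutations, which this sketch omits.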


4

Lebó, Marko. "Přímá klasifikace metagenomických signálů ze sekvenace nanopórem." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400964.


Abstract:

This diploma thesis deals with taxonomy-independent methods for the classification of metagenomic signals acquired by a MinION sequencer. It describes the formation and character of metagenomic data, existing methods of metagenomic data classification, and their development. The thesis also evaluates the impact of third-generation sequencing techniques on metagenomics and focuses on the operation of the Oxford Nanopore MinION sequencing device. Lastly, a custom method for metagenomic data classification, based on data obtained from a MinION sequencer, is proposed and compared with an existing classification method.


5

Formenti, Giulio Paolo. "Third-generation sequencing and assembly of the barn swallow genome and a study on the evolution of the huntingtin gene." Doctoral thesis, Università degli Studi di Milano, 2019. http://hdl.handle.net/2434/611650.


Abstract:

The present thesis is divided into two sections. The first section outlines the scientific work that I have accomplished during the last year of my graduate studies. The goal was to generate a reference genome for the European barn swallow (Hirundo rustica rustica). The barn swallow (Hirundo rustica) is a migratory bird that has been the focus of a large number of ecological, behavioural and genetic studies. To facilitate further population genetics and genomic studies, I have generated a high-quality genome for the European subspecies (Hirundo rustica rustica) using third-generation Single Molecule Real-Time (SMRT) DNA sequencing from Pacific Biosciences (Menlo Park, California, USA) and optical mapping from Bionano Genomics (San Diego, California, USA). For optical mapping, DNA molecules were labelled both with one of the original Nick, Label, Repair and Stain (NLRS) nickases (enzyme Nb.BssSI) and with the new Direct Label and Stain (DLS) approach (enzyme DLE-1). This allowed me to compare and integrate optical maps derived from both NLRS and DLS technologies. The latter was officially released in February 2018 and avoids nicking and subsequent cleavage of DNA molecules upon staining. To my knowledge, this is the first genome assembly to incorporate DLS data, and this approach more than doubled the assembly N50 with respect to the nickase system. Furthermore, the dual-enzyme hybrid scaffold led to a marginal increase in scaffold N50 and an overall increase in confidence in the scaffolds. After removal of haplotigs, the final assembly is approximately 1.21 Gbp in size, with an N50 value of over 25.95 Mbp. The high genome contiguity achieved represents an improvement of over 650-fold with respect to a previously reported assembly based on paired-end short-read data, and it is well in excess of that specified for “Platinum genomes” by the Vertebrate Genomes Project.
It can therefore constitute a valuable resource for studies concerning the evolution of avian genomes in general as well as for population genetics and genomics in the barn swallow, with the potential for boosting research on barn swallow biology and ecology at unprecedented speed. This scientific endeavour culminated in a publication that I authored, entitled “SMRT long-read sequencing and Direct Label and Stain optical maps allow the generation of a high-quality genome assembly for the European barn swallow (Hirundo rustica rustica)”, published in the peer-reviewed journal GigaScience (IF 7.5, 2016). The second section of this thesis presents the methodological work and the conclusions drawn from my work, and that of collaborators, on the evolutionary origins of Huntington’s Disease, a genetic neurodegenerative disorder. The study was conducted in the Laboratory of Stem Cell Biology and Pharmacology of Neurodegenerative Diseases directed by Prof. Elena Cattaneo at the University of Milan, where I worked for the first two years of my PhD (and also during my Master's thesis work) and whose research focuses on the phylogenetic and biological investigation of the HD causative gene. The goal that I wished to achieve with this study, as part of an ongoing effort in the host laboratory aimed at tracing the Huntington’s Disease-causing gene throughout evolution, was to reconstruct and understand the evolutionary origins of the CAG repeat embedded in exon 1 of the Htt gene. This goal could be achieved by collecting DNA sequences from orthologous genes in order to allow a comparative analysis of the differences and similarities between the human sequence and that of other animal species. More specifically, existing sequences could be retrieved from public databases and/or obtained directly by sequencing biological samples. These samples could be made available through existing or newly established collaborations.
Htt exon 1 sequences could then be aligned to each other in a multiple alignment, resulting in a detailed picture of Htt exon 1 CAG repeats along the tree of life. The multiple alignment, when subjected to a bioinformatics analysis of the selective pressures, could be used to elucidate the evolutionary features of this simple repeat. The study was also made possible by a collaboration between Prof. Cattaneo and my Ph.D. thesis supervisor Prof. Nicola Saino. At the time of writing, a manuscript is in preparation reporting part of the data from this work together with other data obtained in the Cattaneo laboratory.
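
The abstract above reports contiguity as N50. For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are illustrative assumptions, not data from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly: total size 54, so half is reached at 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing by itself about assembly correctness.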


6

Takeda, Haruhiko. "Evolution of multi-drug resistant HCV clones from pre-existing resistant-associated variants during direct-acting antiviral therapy determined by third-generation sequencing." Kyoto University, 2018. http://hdl.handle.net/2433/232107.



7

Alic, Andrei Stefan. "Improved Error Correction of NGS Data." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/67630.


Abstract:

[EN] The work done for this doctoral thesis focuses on error correction of Next Generation Sequencing (NGS) data in the context of High Performance Computing (HPC). Due to the reduction in sequencing cost, the increasing output of the sequencers and the advancements in the biological and medical sciences, the amount of NGS data has increased tremendously. Humans alone are not able to keep pace with this explosion of information, so computers must help them handle the deluge of information generated by the sequencing machines. Since NGS is no longer just a research topic (it is used in clinical routine to detect cancer mutations, for instance), requirements in performance and accuracy are more stringent. For sequencing to be useful outside research, the analysis software must work accurately and fast. This is where HPC comes into play. NGS processing tools should leverage the full potential of multi-core and even distributed computing, as those platforms are widely available. Moreover, as the performance of the individual core has hit a barrier, current computing trends focus on adding more cores and explicitly splitting the computation to take advantage of them. This thesis starts with a deep analysis of all these problems in a general and comprehensive way (to reach out to a very wide audience), in the form of an exhaustive and objective review of the NGS error correction field. We dedicate a chapter to this topic to introduce the reader gradually and gently into the world of sequencing. It presents real problems and applications of NGS that demonstrate the impact this technology has on science. The review results in the following conclusions: the need to understand the specificities of NGS data samples (given the high variety of technologies and features) and the need for flexible, efficient and accurate tools for error correction as a preliminary step of any NGS postprocessing.
In response to the explosion of NGS data, we introduce MuffinInfo. It is a piece of software capable of extracting information from the raw data produced by the sequencer to help the user understand the data. MuffinInfo uses HTML5, therefore it runs in almost any software and hardware environment. It supports custom statistics to mould itself to specific requirements. MuffinInfo can reload the results of a run, which are stored in JSON format for easier integration with third-party applications. Finally, our application uses threads to perform the calculations, to load the data from the disk and to handle the UI. Continuing our research, and in response to the single-core performance limitation, we leverage the power of multi-core computers to develop a new error correction tool. Error correction is normally the first step of any analysis targeting NGS data. As we conclude from the review performed within the frame of this thesis, many projects in different real-life applications have opted for this step before further analysis. In this sense, we propose MuffinEC, a corrector that supports multiple technologies (Illumina, Roche 454, Ion Torrent and, experimentally, PacBio) and handles any type of error (mismatches, deletions, insertions and unknown values). It surpasses other similar software by providing higher accuracy (demonstrated by three types of tests) and using fewer computational resources. It follows a multi-step approach that starts by grouping all the reads using a k-mer-based metric. Next, it employs the powerful Smith-Waterman algorithm to refine the groups and generate Multiple Sequence Alignments (MSAs). These MSAs are corrected by taking each column and looking for the correct base, determined by a user-adjustable percentage. This manuscript is structured in chapters based on material that has been previously published in prestigious journals indexed by the Journal Citation Reports (in outstanding positions) and relevant congresses.
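
The last correction step described above, picking the base for each MSA column according to a user-adjustable percentage, can be sketched as a simple majority vote. This is only an illustration under stated assumptions: the function name, the `min_fraction` parameter, and the treatment of the gap symbol `-` as an ordinary character are hypothetical choices, and MuffinEC's actual rules are not given in the abstract.

```python
from collections import Counter


def correct_msa_columns(rows, min_fraction=0.6):
    """Column-wise majority correction of a multiple sequence alignment.

    rows: equal-length aligned reads, with '-' marking a gap.
    min_fraction: share of rows that must agree before a column is rewritten
                  (stands in for the user-adjustable percentage).
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all aligned rows must have the same length")

    corrected = [list(row) for row in rows]
    for col in range(width):
        counts = Counter(row[col] for row in rows)
        symbol, votes = counts.most_common(1)[0]
        # Rewrite the column only when the majority is strong enough.
        if votes / len(rows) >= min_fraction:
            for row in corrected:
                row[col] = symbol
    return ["".join(row) for row in corrected]


aligned_reads = ["ACGT", "ACGT", "ACTT", "AC-T"]
print(correct_msa_columns(aligned_reads, min_fraction=0.5))
```

Raising `min_fraction` makes the correction more conservative: columns without a clear majority are left untouched, which protects genuine variants at the cost of leaving some errors in place.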
[ES] El trabajo realizado en el marco de esta tesis doctoral se centra en la corrección de errores en datos provenientes de técnicas NGS utilizando técnicas de computación intensiva. Debido a la reducción de costes y el incremento en las prestaciones de los secuenciadores, la cantidad de datos disponibles en NGS se ha incrementado notablemente. La utilización de computadores en el análisis de estas muestras se hace imprescindible para poder dar respuesta a la avalancha de información generada por estas técnicas. El uso de NGS transciende la investigación con numerosos ejemplos de uso clínico y agronómico, por lo que aparecen nuevas necesidades en cuanto al tiempo de proceso y la fiabilidad de los resultados. Para maximizar su aplicabilidad clínica, las técnicas de proceso de datos de NGS deben acelerarse y producir datos más precisos. En este contexto es en el que las técnicas de comptuación intensiva juegan un papel relevante. En la actualidad, es común disponer de computadores con varios núcleos de proceso e incluso utilizar múltiples computadores mediante técnicas de computación paralela distribuida. Las tendencias actuales hacia arquitecturas con un mayor número de núcleos ponen de manifiesto que es ésta una aproximación relevante. Esta tesis comienza con un análisis de los problemas fundamentales del proceso de datos en NGS de forma general y adaptado para su comprensión por una amplia audiencia, a través de una exhaustiva revisión del estado del arte en la corrección de datos de NGS. Esta revisión introduce gradualmente al lector en las técnicas de secuenciación masiva, presentando problemas y aplicaciones reales de las técnicas de NGS, destacando el impacto de esta tecnología en ciencia. 
De este estudio se concluyen dos ideas principales: La necesidad de analizar de forma adecuada las características de los datos de NGS, atendiendo a la enorme variedad intrínseca que tienen las diferentes técnicas de NGS; y la necesidad de disponer de una herramienta versátil, eficiente y precisa para la corrección de errores. En el contexto del análisis de datos, la tesis presenta MuffinInfo. La herramienta MuffinInfo es una aplicación software implementada mediante HTML5. MuffinInfo obtiene información relevante de datos crudos de NGS para favorecer el entendimiento de sus características y la aplicación de técnicas de corrección de errores, soportando además la extensión mediante funciones que implementen estadísticos definidos por el usuario. MuffinInfo almacena los resultados del proceso en ficheros JSON. Al usar HTML5, MuffinInfo puede funcionar en casi cualquier entorno hardware y software. La herramienta está implementada aprovechando múltiples hilos de ejecución por la gestión del interfaz. La segunda conclusión del análisis del estado del arte nos lleva a la oportunidad de aplicar de forma extensiva técnicas de computación de altas prestaciones en la corrección de errores para desarrollar una herramienta que soporte múltiples tecnologías (Illumina, Roche 454, Ion Torrent y experimentalmente PacBio). La herramienta propuesta (MuffinEC), soporta diferentes tipos de errores (sustituciones, indels y valores desconocidos). MuffinEC supera los resultados obtenidos por las herramientas existentes en este ámbito. Ofrece una mejor tasa de corrección, en un tiempo muy inferior y utilizando menos recursos, lo que facilita además su aplicación en muestras de mayor tamaño en computadores convencionales. MuffinEC utiliza una aproximación basada en etapas multiples. Primero agrupa todas las secuencias utilizando la métrica de los k-mers. En segundo lugar realiza un refinamiento de los grupos mediante el alineamiento con Smith-Waterman, generando contigs. 
Estos contigs resultan de la corrección por columnas de atendiendo a la frecuencia individual de cada base. La tesis se estructura por capítulos cuya base ha sido previamente publicada en revistas indexadas en posiciones dest
[CAT] El treball realitzat en el marc d'aquesta tesi doctoral se centra en la correcció d'errors en dades provinents de tècniques de NGS utilitzant tècniques de computació intensiva. A causa de la reducció de costos i l'increment en les prestacions dels seqüenciadors, la quantitat de dades disponibles a NGS s'ha incrementat notablement. La utilització de computadors en l'anàlisi d'aquestes mostres es fa imprescindible per poder donar resposta a l'allau d'informació generada per aquestes tècniques. L'ús de NGS transcendeix la investigació amb nombrosos exemples d'ús clínic i agronòmic, per la qual cosa apareixen noves necessitats quant al temps de procés i la fiabilitat dels resultats. Per a maximitzar la seua aplicabilitat clínica, les tècniques de procés de dades de NGS han d'accelerar-se i produir dades més precises. En este context és en el que les tècniques de comptuación intensiva juguen un paper rellevant. En l'actualitat, és comú disposar de computadors amb diversos nuclis de procés i inclús utilitzar múltiples computadors per mitjà de tècniques de computació paral·lela distribuïda. Les tendències actuals cap a arquitectures amb un nombre més gran de nuclis posen de manifest que és esta una aproximació rellevant. Aquesta tesi comença amb una anàlisi dels problemes fonamentals del procés de dades en NGS de forma general i adaptat per a la seua comprensió per una àmplia audiència, a través d'una exhaustiva revisió de l'estat de l'art en la correcció de dades de NGS. Esta revisió introduïx gradualment al lector en les tècniques de seqüenciació massiva, presentant problemes i aplicacions reals de les tècniques de NGS, destacant l'impacte d'esta tecnologia en ciència. 
D'este estudi es conclouen dos idees principals: La necessitat d'analitzar de forma adequada les característiques de les dades de NGS, atenent a l'enorme varietat intrínseca que tenen les diferents tècniques de NGS; i la necessitat de disposar d'una ferramenta versàtil, eficient i precisa per a la correcció d'errors. En el context de l'anàlisi de dades, la tesi presenta MuffinInfo. La ferramenta MuffinInfo és una aplicació programari implementada per mitjà de HTML5. MuffinInfo obté informació rellevant de dades crues de NGS per a afavorir l'enteniment de les seues característiques i l'aplicació de tècniques de correcció d'errors, suportant a més l'extensió per mitjà de funcions que implementen estadístics definits per l'usuari. MuffinInfo emmagatzema els resultats del procés en fitxers JSON. A l'usar HTML5, MuffinInfo pot funcionar en gairebé qualsevol entorn maquinari i programari. La ferramenta està implementada aprofitant múltiples fils d'execució per la gestió de l'interfície. La segona conclusió de l'anàlisi de l'estat de l'art ens porta a l'oportunitat d'aplicar de forma extensiva tècniques de computació d'altes prestacions en la correcció d'errors per a desenrotllar una ferramenta que suport múltiples tecnologies (Illumina, Roche 454, Ió Torrent i experimentalment PacBio). La ferramenta proposada (MuffinEC), suporta diferents tipus d'errors (substitucions, indels i valors desconeguts). MuffinEC supera els resultats obtinguts per les ferramentes existents en este àmbit. Oferix una millor taxa de correcció, en un temps molt inferior i utilitzant menys recursos, la qual cosa facilita a més la seua aplicació en mostres més gran en computadors convencionals. MuffinEC utilitza una aproximació basada en etapes multiples. Primer agrupa totes les seqüències utilitzant la mètrica dels k-mers. En segon lloc realitza un refinament dels grups per mitjà de l'alineament amb Smith-Waterman, generant contigs. 
Estos contigs resulten de la correcció per columnes d'atenent a la freqüència individual de cada base. La tesi s'estructura per capítols la base de la qual ha sigut prèviament publicada en revistes indexades en posicions destacades de l'índex del Journal of Citation Repor
Alic, AS. (2016). Improved Error Correction of NGS Data [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/67630


8

Broseus, Lucile. "Méthodes d'étude de la rétention d'intron à partir de données de séquençage de seconde et de troisième générations." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTT027.


Abstract:

On reconnaît maintenant aux transcrits des implications multiples dans le fonctionnement des cellules eucaryotes. En plus de leur rôle originel de messagers assurant la liaison entre l'ADN et la synthèse protéique, l’usage de transcrits alternatifs apparaît comme un mode de contrôle post-transcriptionnel de l'expression génique. Exemplairement, plusieurs mécanismes distincts de régulation impliquant la production de transcrits matures retenant des introns (IRTs) ont été récemment décrits. Ces observations sont largement tributaires du développement de la seconde génération de séquençage haut-débit de l'ARN (RNA-seq). Cependant, ces données ne permettent pas d’identifier la structure complète des IRTs , dont le répertoire est encore très parcellaire. L’émergence d’une troisième génération de séquençage, à même de lire les transcrits dans leur intégralité, pourrait permettre d’y remédier. Bien que chaque technologie présente des inconvénients propres qui n'autorisent qu'une vision partielle et partiale du transcriptome, elles se complètent sur plusieurs points. Leur association, au moyen de méthodes dites hybrides, offre donc des perspectives intéressantes pour aborder l'étude des isoformes. L'objet de cette thèse est d'examiner ce que ces deux types de données peuvent, seuls ou combinés, apporter plus spécifiquement à l'étude des événements de rétention d'intron (IR). Un nombre croissant de travaux exploitent la profondeur et la largeur de couverture des données de seconde génération pour déceler et quantifier l'IR. Toutefois, il existe encore peu de méthodes informatiques dédiées à leur analyse et l’on fait souvent appel à des méthodes conçues pour d'autres usages comme l'étude de l'expression des gènes ou des exons. En tous les cas, leur capacité à apprécier correctement l'IR ne sont pas garanties. C'est la raison pour laquelle nous mettons en place un plan d'évaluation des méthodes de mesure des niveaux d’IR. 
Cette analyse révèle un certain nombre de biais, susceptibles de nuire à l'interprétation des résultats et nous amène à proposer une nouvelle méthode d’estimation. Au-delà de la vision centrée sur les variants, les données de longs reads Oxford Nanopore ont le potentiel de révéler la structure complète des IRTs, et ainsi, d’inférer un certain nombre de leurs caractéristiques. Cependant, leur taux d’erreur élevé et la troncation des séquences sont des obstacles incontournables. A large échelle, le traitement informatique de ces données nécessite l’introduction d’heuristiques, qui privilégient certaines formes de transcrits et, en général, occultent les formes rares ou inattendues. Il en résulte une perte importante d’information et de qualité d’interprétation. Pour la réduire, nous développons une méthode hybride de correction des séquences et proposons des stratégies ciblées pour reconstituer et caractériser les IRTs
In eukaryotic cells, the roles of RNA transcripts are known to be varied. Besides their role as messengers, transferring information from DNA to protein synthesis, the usage of alternative transcripts appears as a means to control gene expression in a post-transcriptional manner. For example, the production of mature transcripts retaining introns (IRTs) was recently shown to take part in several distinct regulatory mechanisms. These observations benefited greatly from the development of the second generation of RNA sequencing (RNA-seq). However, these data do not allow the entire structure of IRTs, whose catalog is still fragmented, to be identified. The emerging third generation of RNA-seq, able to read RNA sequences in their full extent, could help achieve this goal. Despite their respective drawbacks and biases, both technologies are, to some extent, complementary. It is therefore appealing to try to combine them through so-called hybrid methods, so as to perform analyses at the isoform level. In the present thesis, we aim to investigate the potential of these two types of data, alone or in combination, for studying intron retention (IR) events specifically. A growing number of studies harness the high coverage depths provided by second-generation data to detect and quantify IR. However, few dedicated computational methods exist, and many studies rely on methods designed for other purposes, such as gene or exon expression analysis. In any case, their ability to accurately measure IR has not been established. For this reason, we set up a benchmark of the various IR quantification methods. Our study reveals several biases that are liable to distort the interpretation of results, and it prompted us to suggest a novel method to estimate IR levels. Beyond event-centered analyses, Oxford Nanopore long-read data have the capability to reveal the full-length structure of IRTs, and thereby to allow some of their features to be inferred.
However, their high error rate and truncation events constitute unavoidable obstacles. Transcriptome-wide, the computational treatment of these data necessitates heuristics that favor specific transcript forms and generally overlook rare or unexpected ones. This results in a considerable loss of information and precludes meaningful interpretations. To address these issues, we develop a hybrid correction method and suggest specific strategies to recover and characterize IRTs.


9

Chuang, Wei-Yao (莊為堯). "Acceleration of Alignment-based Error Correction for Third-generation Sequencing." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/s83638.


Abstract:

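Master's thesis
National Chung Cheng University (國立中正大學)
Graduate Institute of Computer Science and Information Engineering
Academic year 106 (ROC calendar)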
Third-generation sequencing can produce long reads with a fast turnaround time, yet also with a high error rate. Consequently, errors in the sequencing reads are usually corrected before genome assembly. One error correction strategy is the alignment-based method, which requires time-consuming alignment among reads based on dynamic programming (DP). In this thesis, we implement a bit-parallelism algorithm to accelerate DP and compare it with the speedup of traditional banded DP. In addition, the bit-parallelism algorithm is fine-tuned for correcting errors specific to third-generation sequencing. The results showed that, though bit-parallelism DP is faster than banded DP, the accuracy is unexpectedly decreased. Further investigation indicated that bit-parallelism DP performs worse in tandem repeat regions, which require specific algorithms for better accuracy.
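
To make the idea of bit-parallel DP concrete, the sketch below computes the unit-cost edit distance with the well-known Myers bit-vector recurrence, in the global-distance form described by Hyyrö, and checks it against a plain DP table. The abstract does not say which bit-parallel algorithm the thesis implements or how it was tuned, so this choice, and all function names, are assumptions for illustration.

```python
def bit_parallel_edit_distance(pattern, text):
    """Edit distance via the Myers bit-vector recurrence (global variant).

    One DP column is encoded in two bit-vectors: pv marks rows where the
    vertical difference is +1 and mv marks rows where it is -1.
    """
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1          # keeps Python's unbounded ints to m bits
    high = 1 << (m - 1)          # bit of the last pattern row
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv, score = mask, 0, m
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask
        mh = pv & xh
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        ph = ((ph << 1) | 1) & mask   # the "| 1" makes row 0 grow by 1 per column
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score


def dp_edit_distance(a, b):
    """Reference O(len(a) * len(b)) dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


read, reference = "ACGTTGCA", "ACGTGCAA"
print(bit_parallel_edit_distance(read, reference), dp_edit_distance(read, reference))
```

The bit-vector version processes a whole DP column with a handful of word operations, which is where the speedup comes from. Note that it yields only a distance under unit costs; recovering an alignment, or using other scoring schemes, needs additional machinery.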

10

Chen, Jia-Min, and 陳珈民. "Error correction by adaptive FM-index extension for third-generation sequencing." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/c8a538.

Abstract:

Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
106
Third-generation sequencing technologies can generate longer reads within a shorter turnaround time, but at the cost of higher sequencing error rates. Therefore, error correction is required prior to genome assembly to reduce the errors present in the sequencing reads. The error correction and assembly software that we developed (called FILEC) has improved on the speed and contiguity of a leading genome assembler called Canu; however, the assembly accuracy of FILEC is lower than that of Canu. In this thesis, we first investigated the regions that FILEC tends to correct wrongly and observed that they contain low-coverage repeats and tandem repeats. Subsequently, we developed new methods for identifying these regions and for improving the correction algorithms specifically for them. The experimental results indicated that the accuracy can be slightly improved by improving the original alignment-free correction algorithm. Surprisingly, however, the accuracy can be greatly improved by the slower alignment-based correction using dynamic programming. Our results imply that a good balance of alignment-free and alignment-based correction algorithms is crucial for improving both assembly speed and accuracy.

11

Chen, Ping-Yeh, and 陳秉燁. "A Hybrid Error Correction Algorithm for Third-Generation Sequencing Using FM-Index." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/sp4ska.

Abstract:

Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
103
The advent of next-generation and third-generation sequencing technologies offers higher throughput and lower cost for sequencing and assembling a genome. Third-generation sequencing technology can generate much longer reads, but its advantages are reduced by its high error rates. Self-error correction using high-coverage third-generation data is required prior to assembly, but its efficiency and cost are still unsatisfactory. In this thesis, we propose a new hybrid correction algorithm that corrects third-generation reads using an FM-index constructed from short, high-quality reads. We replace the erroneous regions in third-generation reads by searching for an alternative-path sequence implied by the short, high-quality reads. The accuracy of the corrected reads is higher than that of existing methods. In addition, the assembly contiguity is also improved. Our program requires little memory, has reasonable running times, and is flexible enough for hybrid correction across various sequencing technologies.
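
For readers unfamiliar with the data structure, the following minimal sketch shows how an FM-index answers "how often does this substring occur?" by backward search, which is the basic query behind such path searches. It is an assumption-laden toy, not the thesis's code: the function names are hypothetical, the suffix array is built by naive sorting, the occurrence table is stored uncompressed, and the text is assumed not to contain the sentinel `$`.

```python
def build_fm_index(text: str):
    """Build a toy FM-index: first-column offsets and occurrence counts of the BWT."""
    text += "$"  # sentinel, assumed absent from the input and smaller than any base
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c] = number of characters in the text that sort before c.
    first, total = {}, 0
    for ch in alphabet:
        first[ch] = total
        total += text.count(ch)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, ch in enumerate(bwt):
        for a in alphabet:
            occ[a][i + 1] = occ[a][i] + (1 if a == ch else 0)
    return first, occ, len(bwt)


def count_occurrences(index, pattern: str) -> int:
    """Count occurrences of pattern by backward search over the FM-index."""
    first, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

In a hybrid setting, an index like this would be built over the short reads, and a candidate extension of a long-read segment would be kept only if the extended k-mer still has a non-zero count. That filtering step is described here as an assumption about the general approach and is not implemented above.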

12

TSAI, CHENG-WEI, and 蔡政威. "A self-error correction algorithm for third-generation sequencing using FM-index." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/btedu2.

Abstract:

Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
105
Third-generation sequencing technologies are becoming a popular choice in de novo assembly projects because of their long reads, lower sequencing bias, and more uniform coverage. However, this comes at the cost of much higher error rates, so error correction is often performed prior to assembly. Currently, error correction methods can be divided into alignment-based and alignment-free approaches. Alignment-based methods are more time-consuming but can correct reads in repetitive and low-coverage regions. Alignment-free methods, on the other hand, are much faster but less sensitive. In this thesis, we develop a novel alignment-free algorithm that reduces the correction problem to a path-searching problem via FM-index extension. In order to correct reads in low-coverage and repetitive regions, an adaptive seeding algorithm using multiple k-mer sizes is developed. The experimental results indicated that our method is faster than existing alignment-based and alignment-free methods on E. coli and S. cerevisiae datasets. For large genome datasets, our method is slower than alignment-based methods but still faster than existing alignment-free methods.
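
One simple way to read "adaptive seeding using multiple k-mer sizes" is sketched below: try the largest k first and fall back to a smaller k when too few trusted seeds are found. Everything in it is an assumption made for illustration, including the function names, the thresholds `min_count` and `min_seeds`, and the use of a plain k-mer counter in place of the FM-index; the abstract does not describe the actual selection rule.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (a stand-in for FM-index frequency queries)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def adaptive_seeds(read, reads, k_sizes=(6, 4), min_count=2, min_seeds=2):
    """Pick seeds for one read, preferring large k and falling back to smaller k.

    A k-mer counts as a seed when it occurs at least min_count times in the
    read set. The first k (largest first) that yields at least min_seeds
    seeds is used. Returns (k, [(position, kmer), ...]) or (None, []).
    """
    for k in sorted(k_sizes, reverse=True):
        counts = count_kmers(reads, k)
        seeds = [(i, read[i:i + k])
                 for i in range(len(read) - k + 1)
                 if counts[read[i:i + k]] >= min_count]
        if len(seeds) >= min_seeds:
            return k, seeds
    return None, []


# Hypothetical toy data: the third read carries one substitution error.
reads = ["AAAACCCCGGGG", "AAAACCCCGGGG", "AAAACCTCGGGG"]
k_used, seeds = adaptive_seeds(reads[2], reads)
```

Large k-mers are more specific in repeats but are easily destroyed by errors, while small k-mers survive errors but are ambiguous; switching between sizes is a common way to trade the two off.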

13

LEE, KUAN-WEI, and 李冠緯. "Alignment-Free Error Correction for Third-Generation Sequencing by Adaptive Seed Identification." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ze4j89.

Abstract:

Master's thesis
National Chung Cheng University
Graduate Institute of Computer Science and Information Engineering
106
Third-generation sequencing technology is now commonly used for de novo assembly projects because of its longer reads, lower sequencing bias, and more uniform coverage. However, this comes at the cost of a higher error rate, which requires error correction prior to assembly. Correction algorithms are divided into alignment-based methods, such as Canu, and alignment-free methods, and they face a tradeoff between accuracy and speed. We previously developed an alignment-free algorithm based on the FM-index, named FILEC, but its assembly contiguity is unsatisfactory for moderate and large genomes. In this thesis, we propose a new method to improve the seeding accuracy of FILEC in repeat regions. The proposed method distinguishes unique from repetitive regions and adaptively uses different seeding strategies. The remaining erroneous seeds are trimmed until the errors are removed. The experimental results showed that our method runs much faster than Canu while maintaining the contiguity and concordance of the assembly. For large genome datasets, although the assembly becomes fragmented, our method is still faster than Canu.

14

Cosentino, Emanuela. "Optimization of High and Ultra High Molecular Weight DNA purification for Third Generation Sequencing and Optical Mapping in algae." Doctoral thesis, 2020. http://hdl.handle.net/11562/1018425.

Abstract:

The analysis of long DNA molecules by novel genomic technologies, such as Bionano optical mapping and Third Generation Sequencing (including PacBio Single Molecule Real Time sequencing and Oxford Nanopore sequencing), provides the opportunity for complete genome characterization and reconstruction. These technologies make it possible to identify large (balanced) structural variants, to determine variant phasing and haplotypes, to sequence full-length repeated regions, and to assemble and scaffold genomes de novo. Implementing them requires DNA that is both highly pure and of High Molecular Weight (HMW): >10^5 bp in length for Bionano optical mapping or >10^4 bp for Third Generation Sequencing. However, standardized and suitable extraction methods for obtaining highly pure HMW DNA are still missing for many organisms and tissues. In particular, plants and algae store large amounts of phenolic compounds and polysaccharides and carry a high copy number of chloroplast and mitochondrial DNA, which makes the extraction of genomic DNA that is both pure and HMW challenging. The aim of this work was to optimize methods for the purification of highly pure (Ultra)HMW DNA, suitable for Third Generation Sequencing and Bionano optical mapping, from a microalga selected as a case study, Haematococcus pluvialis (H. pluvialis). Although H. pluvialis is a unicellular green microalga extensively studied for industrial applications, a high-quality genome for its biotechnological application is still missing. Therefore, an extensive benchmarking of DNA and nuclei isolation methods was conducted to produce high-quality HMW DNA suitable for generating the Third Generation Sequencing and Bionano optical mapping data needed to reconstruct its genome de novo. Four (U)HMW DNA extraction methods and eight nuclei isolation methods were evaluated independently or in combination. To further improve DNA purity and optimize the production of high-quality sequencing data, four post-extraction DNA purification methods were also tested. The methods were compared in terms of the yield, length, and purity of the extracted DNA and its analysis by Third Generation Sequencing and optical mapping. Only three specific combinations of these protocols yielded DNA suitable for generating successful results: CTAB buffer + AMPure XP bead purification for PacBio, MEB buffer + G-tip-based DNA extraction for Oxford Nanopore, and MEB buffer + plug-based DNA extraction for Bionano. The data produced herein can be used to obtain a highly contiguous genome for H. pluvialis, with efficient reconstruction of the repetitive genomic portions (which are abundant in the H. pluvialis genome), by eliminating ambiguity in the position or size of genomic elements.

15

Huang, Po-Hao, and 黃柏豪. "Comparison of hsp60 gene sequencing and MALDI-TOF MS for species identification of Enterobacter cloacae complex and the characteristics of resistance mechanisms and clinical features for the third-generation cephalosporin resistant Enterobacter cloacae complex." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/52692600509999858675.

Abstract:

Master's thesis
Kaohsiung Medical University
Graduate Institute of Medicine, Master's Program
104
The Enterobacter cloacae complex is ubiquitous in different environments, and its members have been increasingly isolated as nosocomial pathogens. Different species of the E. cloacae complex may possess distinct infectious potential and different pathogenicity towards humans, resulting in different clinical outcomes. However, conventional phenotypic identification of the E. cloacae complex is difficult and unreliable, so relevant studies are lacking. This study aimed to identify species of the E. cloacae complex by hsp60 sequencing and MALDI-TOF MS and to compare the two methods for species identification. Furthermore, the prevalence of multidrug resistance among isolates of the E. cloacae complex in human infections is rising. This study investigated the prevalence of resistance to antimicrobial agents in the E. cloacae complex. The β-lactamase genes, class 1 integrons, and gene cassettes were characterized by PCR and sequencing. Clinical features of E. cloacae complex infection were also elucidated. One hundred and eighty-four isolates of the E. cloacae complex were collected consecutively from December 2013 to June 2014 at Kaohsiung Medical University Hospital. hsp60 gene sequencing was performed by amplifying and sequencing a fragment of the hsp60 gene. Of the isolates, 95.7% (176/184) were assigned to their respective species, subspecies, or genetic clusters by hsp60 gene sequencing. The four most frequently identified species and subspecies were E. hormaechei subsp. steigerwaltii (55/184, 29.9%), E. hormaechei subsp. oharae (37/184, 20.1%), E. cloacae subsp. cloacae (22/184, 12%), and E. kobei (19/184, 10.3%). MALDI-TOF identified the majority of the isolates as E. cloacae (110/184, 59.8%), followed by E. asburiae (44/184, 23.9%), E. cloacae subsp. cloacae (22/184, 12%), E. kobei (6/184, 3.3%), and E. cloacae subsp. dissolvens (2/184, 1.1%). The two methods gave the same identification for E. cloacae subsp. cloacae and E. cloacae subsp. dissolvens. Taking hsp60 sequencing as the standard, the MALDI-TOF results showed 22.8% concordance with those obtained by hsp60 sequencing. Sixty-four (34.8%) of the 184 isolates were nonsusceptible to at least one of the third-generation cephalosporins: 51 isolates (27.7%) to ceftazidime and 63 isolates (34.2%) to ceftriaxone. Seven kinds of β-lactamase genes were detected: SHV-12, CTX-M-15, DHA-1, ACT-1/MIR-1, TEM-1, OXA-1, and IMP-8. Among the 64 third-generation cephalosporin-nonsusceptible isolates, 32 (32/64, 50%) had at least one kind of ESBL, AmpC β-lactamase, or carbapenemase gene. Forty-five isolates (45/184, 24.5%) carried the class 1 integron gene intI1. The antibiotic resistance gene cassettes included those encoding resistance to trimethoprim (dfrA7, 12, 15, 27), gentamicin (aadB), streptomycin (aadA1, 2), erythromycin (ereA2), and rifampin (arr3), as well as the aminoglycoside-3'-N-acetyltransferase gene aac3 and the aminoglycoside-6'-N-acetyltransferase genes aac(6')-Ib-cr and aac(6')-IIc. Nonsusceptibility to seven antibiotics was significantly associated with the presence of class 1 integrons (p < 0.001). These antibiotics were ceftazidime, ceftriaxone, gentamicin, levofloxacin, trimethoprim-sulfamethoxazole, tigecycline, and piperacillin/tazobactam. PFGE analysis revealed that 34 isolates belonged to 13 pulsotypes (A to M). Isolates in the same pulsotype belonged to the same hsp60-based genetic cluster, except for pulsotypes B, H, and L. The clinical features of E. cloacae complex infection showed that third-generation cephalosporin-nonsusceptible isolates were significantly more frequent in dialysis patients and were associated with higher 30-day and 100-day mortality. The factors relevant to infection with third-generation cephalosporin-nonsusceptible isolates included age > 65 years (p = 0.031), inpatient status (p = 0.006), renal disease (p = 0.002), dialysis (p = 0.024), catheter usage (p = 0.020), and stay in the ICU (p = 0.001). The 30-day and 100-day mortality rates were significantly higher in patients infected with third-generation cephalosporin-nonsusceptible isolates than in patients infected with third-generation cephalosporin-susceptible isolates. Cluster XI caused community infections more often than clusters VI and VIII (cluster XI vs. VI, p = 0.006; cluster XI vs. VIII, p = 0.034). The 30-day and 100-day mortality of patients infected with cluster XI was significantly higher than that of patients infected with clusters II and VIII. Moreover, multivariate analysis showed that stay in the ICU was significantly associated with higher patient mortality. In conclusion, hsp60 sequencing can efficiently identify species, subspecies, and genetic clusters of the E. cloacae complex. MALDI-TOF MS cannot identify E. hormaechei and its subspecies. Therefore, hsp60 sequencing is superior to MALDI-TOF MS for species identification of the E. cloacae complex. Cluster XI infections usually occur in the community. Stay in the ICU was significantly associated with mortality in patients with E. cloacae complex infections.
