Dissertations / Theses on the topic 'Very high throughput sequencing'

Consult the top 50 dissertations / theses for your research on the topic 'Very high throughput sequencing.'

1

Koreki, Axelle. "Recherche de déterminants génétiques de la résistance aux herbicides auxiniques chez le Coquelicot (Papaver rhoeas L.) dans un but de diagnostic." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCK005.

Abstract:
Corn poppy (Papaver rhoeas) is a cosmopolitan weed that is widespread in winter cereal crops in Europe and has a high potential for invasion and spread. It is mainly controlled by ALS-inhibitor herbicides and auxin herbicides. The intensive use of these two modes of action has led to the evolution of resistance in many poppy populations across Europe. Herbicide resistance involves two categories of mechanisms: target-site-based resistance (TSR) and non-target-site-based resistance (NTSR). In poppy, only NTSR mechanisms have been identified, but the specific genes remain unknown. This work therefore has several goals: (i) to identify and potentially validate the genetic determinants of resistance to auxin herbicides in corn poppy and (ii) to evaluate the resistance status to auxin herbicides in French poppy populations.

In the first part, we phenotypically characterized the available plant material using herbicide sensitivity bioassays (Chapter 1) to assess the resistance status of poppies to auxin herbicides in France. We showed that resistance to 2,4-D in France is widespread, and even very well established in certain areas. We also identified two fields, in Italy and Greece, where plants resistant to halauxifen-methyl were detected, suggesting the beginning of the evolution of resistance to this new synthetic herbicide. Populations with a balanced ratio of resistant and sensitive individuals were used to produce plant material for the molecular biology approaches of the second part.

In the second part, we studied constitutive resistance to 2,4-D and halauxifen-methyl among 14 populations via RNA sequencing (RNAseq) (Chapter 2). We showed that the expression profiles of sensitive and resistant plants were specific to each population. Among the genes differentially expressed in resistant plants, we identified gene families potentially involved in herbicide metabolism (CYP450, GST, ABC transporters, etc.) or in regulatory cascades (transcription factors, protein kinases). Based on these results, the expression level of these genes was validated by RT-qPCR on a larger sample of plants. Taken together, the results indicate that there is potentially a wide variety of inter- and intra-population resistance mechanisms.

The second RNAseq experiment (Chapter 3) aimed to study the transcriptomic response of resistant and sensitive plants between 4 h and 48 h after the application of 2,4-D in two populations. We identified a large diversity of genes and gene families specifically induced in resistant plants from both populations, but their role in resistance could not be verified. As in constitutive resistance, these may be detoxification enzymes, transporters, or even potential auxin target genes or genes associated with the general stress response. In addition, 2,4-D induces a rapid response that is detectable within 4 hours of treatment regardless of phenotype and population.

Finally, the comparison of constitutively differentially expressed genes between the two RNAseq approaches shows that the absence of common genes is potentially due to a high diversity of intra- and inter-population resistance mechanisms, or to the fact that the mechanisms that contribute most to resistance are due to structural mutations.
2

Roguski, Łukasz 1987. "High-throughput sequencing data compression." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/565775.

Abstract:
Thanks to advances in sequencing technologies, biomedical research has experienced a revolution over recent years, resulting in an explosion in the amount of genomic data being generated worldwide. The typical space requirement for storing sequencing data produced by a medium-scale experiment lies in the range of tens to hundreds of gigabytes, with multiple files in different formats being produced by each experiment. The current de facto standard file formats used to represent genomic data are text-based. For practical reasons, these are stored in compressed form. In most cases, such storage methods rely on general-purpose text compressors, such as gzip. However, these methods are unable to exploit the information models specific to sequencing data, and as a result they usually provide limited functionality and insufficient savings in storage space. This explains why relatively basic operations such as processing, storage, and transfer of genomic data have become a typical bottleneck of current analysis setups. Therefore, this thesis focuses on methods to efficiently store and compress the data generated from sequencing experiments. First, we propose a novel general-purpose FASTQ file compressor. Compared to gzip, it achieves a significant reduction in the size of the resulting archive, while also offering high data processing speed. Next, we present compression methods that exploit the high sequence redundancy present in sequencing data. These methods achieve the best compression ratio among current state-of-the-art FASTQ compressors, without using any external reference sequence. We also demonstrate different lossy compression approaches to store auxiliary sequencing data, which allow for further reductions in size. Finally, we propose a flexible framework and data format, which allows one to semi-automatically generate compression solutions that are not tied to any specific genomic file format.
To facilitate data management needed by complex pipelines, multiple genomic datasets having heterogeneous formats can be stored together in configurable containers, with an option to perform custom queries over the stored data. Moreover, we show that simple solutions based on our framework can achieve results comparable to those of state-of-the-art format-specific compressors. Overall, the solutions developed and described in this thesis can easily be incorporated into current pipelines for the analysis of genomic data. Taken together, they provide grounds for the development of integrated approaches towards efficient storage and management of such data.
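The abstract's point that general-purpose compressors cannot exploit sequencing-specific information can be illustrated with a minimal sketch. This is not the compressor developed in the thesis; it only assumes a read made purely of the four symbols A, C, G and T, and the helper names `pack_2bit` and `unpack_2bit` are hypothetical. Knowing the alphabet in advance lets each base be stored in 2 bits, whereas gzip has to discover that structure from the text.

```python
import gzip
import random

# Assumption for this sketch: reads contain only A, C, G, T (no N, no quality scores).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so every byte is decoded the same way.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is the original number of bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


# Compare against gzip on a synthetic read set (random bases, fixed seed).
rng = random.Random(0)
sequence = "".join(rng.choice(BASES) for _ in range(10_000))
packed = pack_2bit(sequence)
gzipped = gzip.compress(sequence.encode("ascii"))
print(len(sequence), len(packed), len(gzipped))
```

Real reads are far more redundant than random bases, which is the redundancy the thesis reports exploiting; the sketch only shows the gain available from knowing the alphabet.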
3

Mozere, M. "High-throughput sequencing analysis pipeline." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1528797/.

Abstract:
High-throughput sequencing methods were developed to increase the productivity of processing data from genomic DNA. Sequencing platforms generate massive amounts of genetic variation data, which makes it difficult to pinpoint the small subset of functionally important variants. The focus has now shifted from generating sequences to searching for the critical differences that separate normal variants from disease-causing ones. Our High-throughput Sequencing Analysis Pipeline (HSAP) is a multistep analysis software package designed to annotate and filter variants from Variant Call Format (VCF) files in a top-down fashion in order to find disease-causing variants in patients. It is designed for the Linux environment and is composed of a collection of interacting task-specific modules written in different programming languages (such as Python and C++) and shell scripts. Each module performs a specific task, such as annotating variants with their functional characterisation, zygosity status, and allele frequencies within a population, or filtering variants according to the inherited disease model, read depth, call quality, physical location, and other criteria. The output is written to a VCF file, which contains the annotated and filtered genomic variants. The pipeline was verified by identifying and confirming a specific disease-causing mutation for a single-gene disorder. HSAP is designed as open-source, locally self-contained, bootable software that uses only information from publicly available databases. It has a user-friendly offline web interface that allows users to select different modules and chain them together to create unique filtering arrangements, adapting the pipeline as needed.
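The idea of chaining task-specific filter modules can be sketched as follows. This is an illustration only and not HSAP code: the `Variant` record, the filter factories, and the thresholds are all hypothetical, and real VCF parsing is left out so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical, already-annotated variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float      # call quality
    depth: int       # read depth
    pop_freq: float  # allele frequency in a reference population
    genotype: str    # "0/1" heterozygous, "1/1" homozygous alternate


VariantFilter = Callable[[Variant], bool]


def min_quality(threshold: float) -> VariantFilter:
    return lambda v: v.qual >= threshold


def min_depth(threshold: int) -> VariantFilter:
    return lambda v: v.depth >= threshold


def max_population_frequency(threshold: float) -> VariantFilter:
    return lambda v: v.pop_freq <= threshold


def recessive_model() -> VariantFilter:
    # Simplified recessive model: keep only homozygous-alternate calls.
    return lambda v: v.genotype == "1/1"


def run_pipeline(variants: Iterable[Variant],
                 filters: List[VariantFilter]) -> List[Variant]:
    """Keep the variants that pass every filter in the chain."""
    return [v for v in variants if all(f(v) for f in filters)]


variants = [
    Variant("1", 1000, "A", "G", 60.0, 35, 0.0005, "1/1"),
    Variant("1", 2000, "C", "T", 55.0, 40, 0.2500, "1/1"),  # too common
    Variant("2", 3000, "G", "A", 10.0, 5, 0.0001, "1/1"),   # low quality and depth
    Variant("3", 4000, "T", "C", 70.0, 50, 0.0002, "0/1"),  # heterozygous
]
pipeline = [min_quality(30.0), min_depth(10),
            max_population_frequency(0.01), recessive_model()]
kept = run_pipeline(variants, pipeline)
print([(v.chrom, v.pos) for v in kept])
```

Because each filter is an independent predicate, a different arrangement (for example, dropping `recessive_model()` for a dominant disorder) is just a different list.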
4

Durif, Ghislain. "Multivariate analysis of high-throughput sequencing data." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1334/document.

Abstract:
The statistical analysis of next-generation sequencing (NGS) data raises many computational challenges regarding modeling and inference, especially because of the high dimensionality of genomic data. The research work in this manuscript concerns hybrid dimension reduction methods that rely on both compression (representation of the data in a lower-dimensional space) and variable selection. Developments are made concerning the sparse Partial Least Squares (PLS) regression framework for supervised classification and the sparse matrix factorization framework for unsupervised exploration. In both situations, our main focus is the reconstruction and visualization of the data. First, we present a new sparse PLS approach, based on an adaptive sparsity-inducing penalty, that is suitable for logistic regression to predict the label of a discrete outcome. For instance, such a method is used for prediction (fate of patients or specific type of unidentified single cells) based on gene expression profiles. The main issue in such a framework is to account for the response in order to discard irrelevant variables. We highlight the direct link between the derivation of the algorithms and the reliability of the results. Then, motivated by questions regarding single-cell data analysis, we propose a flexible model-based approach for the factorization of count matrices that accounts for over-dispersion as well as zero-inflation (both characteristic of single-cell data), for which we derive an estimation procedure based on variational inference. In this scheme, we consider probabilistic variable selection based on a spike-and-slab model suitable for count data. The value of our procedure for data reconstruction, visualization and clustering is illustrated by simulation experiments and by preliminary results on single-cell data analysis. All proposed methods are implemented in two R packages, "plsgenomics" and "CMF", based on high-performance computing.
5

Langenberger, David. "High-throughput sequencing and small non-coding RNAs." Doctoral thesis, Universitätsbibliothek Leipzig, 2013. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-112876.

Abstract:
In this thesis the processing mechanisms of short non-coding RNAs (ncRNAs) are investigated using data generated by the current method of high-throughput sequencing (HTS). The recently adapted short RNA-seq protocol allows the sequencing of RNA fragments of microRNA-like length (∼18-28nt). Thus, after mapping the data back to a reference genome, it is possible not only to measure but also to visualize the expression of all ncRNAs that are processed to fragments of this specific length. Short RNA-seq data was used to show that a highly abundant class of small RNAs, called microRNA-offset-RNAs (moRNAs), which was formerly detected in a basal chordate, is also produced from human microRNA precursors. To simplify the search, the blockbuster tool, which automatically recognizes blocks of reads to detect specific expression patterns, was developed. Using blockbuster, blocks from moRNAs were detected directly next to the miR or miR* blocks and could thus easily be registered in an automated way. Further investigation of the short RNA-seq data showed that not only microRNAs give rise to short ∼22nt long RNA pieces, but also almost all other classes of ncRNAs, such as tRNAs, snoRNAs, snRNAs, rRNAs, Y-RNAs, or vault RNAs. The read patterns that arise after mapping these RNAs back to a reference genome seem to reflect the processing of each class and are thus specific for the RNA transcripts from which they are derived. The potential of these patterns for the classification and identification of non-coding RNAs was explored. Using a random forest classifier trained on a set of characteristic features of the individual ncRNA classes, it was possible to distinguish three types of ncRNAs, namely microRNAs, tRNAs, and snoRNAs. To make the classification available to the research community, the free web service 'DARIO', which allows users to study short read data from small RNA-seq experiments, was developed.
The classification has shown that read patterns are specific for different classes of ncRNAs. To make use of this feature, the tool deepBlockAlign was developed. deepBlockAlign introduces a two-step approach to align read patterns with the aim of quickly identifying RNAs that share similar processing footprints. In order to find possible exceptions to the well-known microRNA maturation by Dicer and to identify additional substrates for Dicer processing, the small RNA sequencing data of a Dicer knockdown experiment in MCF-7 cells was re-evaluated. Several Dicer-independent microRNAs were found, among them the important tumor suppressor mir-663a. It is known that many aspects of RNA maturation leave traces in RNA sequencing data in the form of mismatches from the reference genome. It is possible to recover many well-known modified sites in tRNAs, providing evidence that modified nucleotides are a pervasive phenomenon in these data sets.
6

Zhang, Xuekui. "Mixture models for analysing high throughput sequencing data." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/35982.

Abstract:
The goal of my thesis is to develop methods and software for analysing high-throughput sequencing data, with an emphasis on sonicated ChIP-seq. To this end, we developed several variants of mixture models for genome-wide profiling of transcription factor binding sites and nucleosome positions. Our methods have been implemented in Bioconductor packages, which are freely available to other researchers. For profiling transcription factor binding sites, we developed a method, PICS, and implemented it in a Bioconductor package. We used a simulation study to confirm that PICS compares favourably to rival methods, such as MACS, QuEST, CisGenome, and USeq. Using published GABP and FOXA1 data from human cell lines, we then show that PICS-predicted binding sites were more consistent with computationally predicted binding motifs than those of the alternative methods. For motif discovery using transcription factor binding sites, we combined PICS with two other existing packages to create the first complete set of Bioconductor tools for peak-calling and binding motif analysis of ChIP-Seq and ChIP-chip data. We demonstrate the effectiveness of our pipeline on published human ChIP-Seq datasets for FOXA1, ER, CTCF and STAT1, detecting co-occurring motifs that were consistent with the literature but not detected by other methods. For nucleosome positioning, we modified PICS into a method called PING. PING can handle MNase-Seq and MNase- or sonicated-ChIP-Seq data. It compares favourably to NPS and TemplateFilter in scalability, accuracy and robustness to low read density. To demonstrate that PING predictions from sonicated data can have sufficient spatial resolution to be biologically meaningful, we use H3K4me1 data to detect nucleosome shifts, discriminate functional and non-functional transcription factor binding sites, and confirm that Foxa2 associates with the accessible major groove of nucleosomal DNA. All of the above use single-end sequencing data. At the end of the thesis, we briefly discuss the issue of processing paired-end data, which we are currently investigating.
7

Roberts, Adam. "Ambiguous fragment assignment for high-throughput sequencing experiments." Thesis, University of California, Berkeley, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=3616509.

Abstract:

As the cost of short-read, high-throughput DNA sequencing continues to fall rapidly, new uses for the technology have been developed aside from its original purpose in determining the genome of various species. Many of these new experiments use the sequencer as a digital counter for measuring biological activities such as gene expression (RNA-Seq) or protein binding (ChIP-Seq).

A common problem faced in the analysis of these data is that of sequenced fragments that are "ambiguous", meaning they resemble multiple loci in a reference genome or other sequence. In early analyses, such ambiguous fragments were ignored or were assigned to loci using simple heuristics. However, statistical approaches using maximum likelihood estimation have been shown to greatly improve the accuracy of downstream analyses and have become widely adopted. Optimization procedures based on the expectation-maximization (EM) algorithm are often employed by these methods to find the optimal sets of alignments, with frequent enhancements to the model. Nevertheless, these improvements increase complexity, which, along with an exponential growth in the size of sequencing datasets, has led to new computational challenges.

Herein, we present our model for ambiguous fragment assignment for RNA-Seq, which includes the most comprehensive set of parameters of any model introduced to date, as well as various methods we have explored for scaling our optimization procedure. These methods include the use of an online EM algorithm and a distributed EM solution implemented on the Spark cluster computing system. Our advances have resulted in the first efficient solution to the problem of fragment assignment in sequencing.

Furthermore, we are the first to create a fully generalized model for ambiguous fragment assignment and present details on how our method can provide solutions for additional high-throughput sequencing assays including ChIP-Seq, Allele-Specific Expression (ASE), and the detection of RNA-DNA Differences (RDDs) in RNA-Seq.
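To make the EM idea behind ambiguous fragment assignment concrete, here is a deliberately simplified sketch. It is not the model from the thesis, which is described as having a far richer parameter set: it assumes that each fragment is simply compatible with one or more targets, ignores target lengths, biases, and sequencing errors, and estimates only relative abundances. The function name `em_abundances` is hypothetical.

```python
from typing import List, Sequence


def em_abundances(compat: Sequence[Sequence[int]],
                  n_targets: int,
                  n_iter: int = 100) -> List[float]:
    """Estimate relative target abundances from fragment compatibility lists.

    compat[f] lists the targets that fragment f aligns to; each list is
    assumed to be non-empty. Target lengths and biases are ignored.
    """
    theta = [1.0 / n_targets] * n_targets
    n_fragments = len(compat)
    for _ in range(n_iter):
        expected = [0.0] * n_targets
        # E-step: split each fragment among its compatible targets in
        # proportion to the current abundance estimates.
        for targets in compat:
            total = sum(theta[t] for t in targets)
            for t in targets:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected counts.
        theta = [count / n_fragments for count in expected]
    return theta


# 6 fragments unique to target 0, 2 unique to target 1, 4 ambiguous.
fragments = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta = em_abundances(fragments, n_targets=2)
print(theta)
```

In this toy case the ambiguous fragments end up split 3:1, matching the ratio of uniquely assigned fragments, rather than being discarded or split evenly as a simple heuristic would do.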

8

Hoffmann, Steve. "Genome Informatics for High-Throughput Sequencing Data Analysis." Doctoral thesis, Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-152643.

Abstract:
This thesis introduces three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays to map short sequences to larger reference genomes. The algorithm builds on the idea of an error-tolerant traversal of the suffix array for the reference genome in conjunction with the concept of matching statistics introduced by Chang and a bitvector-based alignment algorithm proposed by Myers. The algorithm supports paired-end and mate-pair alignments, and the implementation offers methods for primer detection and for primer and poly-A trimming. In our own benchmarks as well as independent benchmarks, this tool outcompetes other currently available tools with respect to sensitivity and specificity in simulated and real data sets for a large number of sequencing protocols. Second, we introduce a novel dynamic programming algorithm for the spliced alignment problem. The advantage of this algorithm is its capability to detect not only collinear splice events, i.e. local splice events on the same genomic strand, but also circular and other non-collinear splice events. This succinct and simple algorithm handles all these cases at the same time with high accuracy. While it is on par with other state-of-the-art methods for collinear splice events, it outcompetes other tools for many non-collinear splice events. The application of this method to publicly available sequencing data led to the identification of a novel isoform of the tumor suppressor gene p53. Since this gene is one of the best-studied genes in the human genome, this finding is quite remarkable and suggests that the application of our algorithm could help to identify a plethora of novel isoforms and genes. Third, we present a data-adaptive method to call single nucleotide variations (SNVs) from aligned high-throughput sequencing reads. We demonstrate that our method, based on empirical log-likelihoods, automatically adjusts to the quality of a sequencing experiment and thus renders a "decision" on when to call an SNV. In our simulations this method is on par with current state-of-the-art tools. Finally, we present biological results that have been obtained using the special features of the presented alignment algorithm.
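As background for the suffix-array-based read mapping mentioned in this abstract, the following sketch shows exact pattern lookup in a plain suffix array. It is a simplification chosen for illustration: the thesis describes enhanced suffix arrays with error-tolerant traversal, whereas this version uses a naive construction, handles exact matches only, and its function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions lexicographically.

    Fine for short examples; real mappers use much faster constructions.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return all start positions of exact matches of `pattern` in `text`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


reference = "GATTACAGATTA"
suffix_array = build_suffix_array(reference)
print(find_occurrences(reference, suffix_array, "ATTA"))
```

All suffixes that share a prefix are adjacent in the sorted order, so two binary searches delimit every occurrence of a read at once; tolerating mismatches requires exploring several such intervals, which is where the heuristics described in the abstract come in.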
This thesis presents three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays that aligns short sequences to large genomes. The method builds on the idea of an error-tolerant traversal of a suffix array for reference genomes, combined with Chang's concept of matching statistics and Myers' bit-vector alignment algorithm. The presented method supports paired-end and mate-pair alignments and offers methods for detecting primer sequences and trimming poly-A signals. In independent benchmarks as well, the method shows high sensitivity and specificity on simulated and real datasets. For a large number of sequencing protocols it achieves better results than other well-known short-read alignment programs. Second, we present a dynamic-programming algorithm for the spliced alignment problem. The advantage of this algorithm is its ability to identify not only collinear splice events, i.e. splice events on the same genomic strand, but also circular and other non-collinear splice events. The method is highly accurate: while it performs comparably to other methods in detecting collinear splice variants, it beats the competitors in sensitivity and specificity when predicting non-collinear splice variants. Applying this algorithm led to the identification of new isoforms. In our publication we report a new isoform of the tumor suppressor gene p53. Since this gene is one of the best-studied genes in the human genome, applying our algorithm could help to identify many further isoforms of less prominent genes.
Third, we present a data-adaptive model for identifying single nucleotide variations (SNVs). We show that our model, which is based on empirical log-likelihoods, automatically adapts to the quality of the sequencing experiments and makes a "decision" about which potential variations to classify as SNVs. In our simulations this method is on par with currently used methods. Finally, we present a selection of biological results related to the special features of the presented alignment methods.
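The abstract above names Myers' bit-vector algorithm as one building block of the aligner. The sketch below illustrates that general technique only, in its textbook approximate-search form; it is not the thesis's implementation, and the function name `myers_search` and its parameters are chosen here for illustration. Python integers stand in for machine words, so the pattern length is not limited to 64.

```python
def myers_search(pattern, text, max_errors):
    """Return (end_index, edit_distance) for every text position where the
    pattern matches a substring ending there with at most max_errors edits."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keeps bit vectors at m bits
    high_bit = 1 << (m - 1)      # bit for the last pattern row

    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0             # vertical +1 / -1 deltas of the DP column
    score = m                    # edit distance in the last row
    hits = []
    for j, ch in enumerate(text):
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high_bit:
            score += 1
        elif mh & high_bit:
            score -= 1
        # Shift without setting bit 0: a match may start anywhere in the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= max_errors:
            hits.append((j, score))
    return hits
```

Calling `myers_search("abc", "xxabcxx", 0)` reports a single exact occurrence ending at index 4. Each text character costs a constant number of word operations, which is why the technique suits short reads.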
9

Duggett, Nicholas A. "High-throughput sequencing of the chicken gut microbiome." Thesis, University of Birmingham, 2016. http://etheses.bham.ac.uk//id/eprint/6678/.

Abstract:
The chicken (Gallus gallus domesticus) is the most abundant and widely distributed livestock animal, with a global population of over 21 billion. A newly hatched broiler chick increases its body weight by 25% overnight and 50-fold over five weeks. The symbiotic, complex and variable community of the microbiome forms an important part of the gastrointestinal tract (gut). It is involved in gut development, biochemistry, immunology, physiology and non-specific resistance to infection. This study investigated the chicken gut microbiota using high-throughput 16S rRNA sequencing and culture-based techniques. There was specific interest in the proventriculus, on which there is currently limited research in the literature, and in the caecum, because it contains the highest density of bacterial cells in the gut at \(10^{11}\) per gram. The results showed no significant difference in the first stages of the gut, which shared a low-diversity microbiota dominated by a few Lactobacillus species. The microbiota becomes more diverse in the latter parts of the small intestine, where Clostridiales and Enterobacteriaceae were present in higher numbers. The caecum was the most diverse organ, with the majority of species belonging to Ruminococcaceae, Lachnospiraceae and Alistipes. A number of novel species were isolated from the chicken gut and six of these were whole-genome sequenced.
10

Chiang, HyoJin Rosaria. "Examination of mammalian microRNAs by high-throughput sequencing." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/65289.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biology, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references.
Small non-coding RNAs play an important role in a wide range of cellular events. MicroRNAs (miRNAs) are an abundant class of small RNAs that post-transcriptionally repress expression of their target genes. Since miRNA targeting is based on sequence, accurate and comprehensive annotation of miRNA genes is fundamental to understanding miRNA gene regulation. Advances in high-throughput sequencing technology have led to the discovery of novel small RNA genes and the identification of their properties. We describe a method for constructing small-RNA libraries for the Illumina sequencing platform that improves upon previous efforts. Sequencing data from small-RNA libraries constructed using this protocol can be used to profile small RNAs from a broad range of samples. In particular, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns. The analysis of the data provides a substantially revised list of confidently identified murine miRNAs, thereby providing a more accurate picture of the general features of mammalian miRNAs and their abundance in the genome. In addition, our results revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan pre-miRNA, cases of consequential 5' heterogeneity, newly identified instances of miRNA editing, and widespread pre-miRNA uridylation reminiscent of Lin28-like miRNA regulation.
by HyoJin Rosaria Chiang.
Ph.D.
11

Stromberg, Michael Peter. "Enabling high-throughput sequencing data analysis with MOSAIK." Thesis, Boston College, 2010. http://hdl.handle.net/2345/1332.

Abstract:
Thesis advisor: Gabor T. Marth
During the last few years, numerous new sequencing technologies have emerged that require tools that can process large amounts of read data quickly and accurately. Regardless of the downstream methods used, reference-guided aligners are at the heart of all next-generation analysis studies. I have developed a general reference-guided aligner, MOSAIK, to support all current sequencing technologies (Roche 454, Illumina, Applied Biosystems SOLiD, Helicos, and Sanger capillary). The calibrated alignment qualities calculated by MOSAIK allow the user to fine-tune the alignment accuracy for a given study. MOSAIK is a highly configurable and easy-to-use suite of alignment tools that is used in hundreds of labs worldwide. MOSAIK is an integral part of our genetic variant discovery pipeline. From SNP and short-INDEL discovery to structural variation discovery, alignment accuracy is an essential requirement and enables our downstream analyses to provide accurate calls. In this thesis, I present three major studies that were formative during the development of MOSAIK and our analysis pipeline. In addition, I present a novel algorithm that identifies mobile element insertions (non-LTR retrotransposons) in the human genome using split-read alignments in MOSAIK. This algorithm has a low false discovery rate (4.4%) and enabled our group to be the first to determine the number of mobile elements that differentially occur between any two individuals.
Thesis (PhD) — Boston College, 2010
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
12

Xing, Zhengrong. "Poisson multiscale methods for high-throughput sequencing data." Thesis, The University of Chicago, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10195268.

Abstract:

In this dissertation, we focus on the problem of analyzing data from high-throughput sequencing experiments. With the emergence of more capable hardware and more efficient software, these sequencing data provide information at an unprecedented resolution. However, statistical methods developed for such data rarely tackle the data at such high resolutions, and often make approximations that only hold under certain conditions.

We propose a model-based approach to dealing with such data, starting from a single sample. By taking into account the inherent structure present in such data, our model can accurately capture important genomic regions. We also present the model in such a way that makes it easily extensible to more complicated and biologically interesting scenarios.

Building upon the single-sample model, we then turn to the statistical question of detecting differences between multiple samples. Such questions often arise in the context of expression data, where much emphasis has been put on the problem of detecting differential expression between two groups. By extending the framework for a single sample to incorporate additional group covariates, our model provides a systematic approach to estimating and testing for such differences. We then apply our method to several empirical datasets, and discuss the potential for further applications to other biological tasks.

We also seek to address a different statistical question, where the goal here is to perform exploratory analysis to uncover hidden structure within the data. We incorporate the single-sample framework into a commonly used clustering scheme, and show that our enhanced clustering approach is superior to the original clustering approach in many ways. We then apply our clustering method to a few empirical datasets and discuss our findings.

Finally, we apply the shrinkage procedure used within the single-sample model to tackle a completely different statistical issue: nonparametric regression with heteroskedastic Gaussian noise. We propose an algorithm that accurately recovers both the mean and variance functions given a single set of observations, and demonstrate its advantages over state-of-the-art methods through extensive simulation studies.
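The abstract does not spell out the model, so the following is only a sketch of one standard construction behind Poisson multiscale methods, assumed here for illustration. If two neighbouring counts are independent Poisson variables, then conditional on their sum the left count is binomial, so a count vector can be re-expressed as a total plus a tree of (left child, parent) pairs. The function names are hypothetical, and the input length is assumed to be a power of two.

```python
def multiscale_decompose(counts):
    """Split a count vector of power-of-two length into its total and,
    for each scale (fine to coarse), a list of (left_child_sum, parent_sum)."""
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = []
    current = list(counts)
    while len(current) > 1:
        parents = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        levels.append([(current[i], parents[i // 2]) for i in range(0, len(current), 2)])
        current = parents
    return current[0], levels


def multiscale_reconstruct(total, levels):
    """Invert multiscale_decompose: rebuild the counts from coarse to fine."""
    current = [total]
    for level in reversed(levels):
        finer = []
        for parent, (left, _) in zip(current, level):
            finer.extend([left, parent - left])
        current = finer
    return current
```

Under the Poisson assumption, each pair `(left, parent)` can be read as `left` successes in `parent` binomial trials, and shrinking those proportions scale by scale is one way a multiscale smoother can be built on top of this decomposition.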

13

de, Lange Katrina Melanie. "Understanding inflammatory bowel disease using high-throughput sequencing." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/265370.

Abstract:
For over two decades, the study of genetics has been making significant progress towards understanding the causes of common disease. Across a wide range of complex disorders there have been hundreds of associated loci identified, largely driven by common genetic variation. Now, with the advent of next-generation sequencing technology, we are able to interrogate rare and low frequency variation in a high-throughput manner for the first time. This provides an exciting opportunity to investigate the role of rarer variation in complex disease risk on a genome-wide scale, potentially offering novel insights into the biological mechanisms underlying disease pathogenesis. In this thesis I will assess the potential of this technology to further our understanding of the genetics of complex disease, using inflammatory bowel disease (IBD) as an example. After first reviewing the history of genetic studies into IBD, I will describe the analytical challenges that can occur when using sequencing to perform case-control association testing at scale, and the methods that can be used to overcome these. I then test for novel IBD associations in a low coverage whole genome sequencing dataset, and uncover a significant burden of rare, damaging missense variation in the gene NOD2, as well as a more general burden of such variation amongst known inflammatory bowel disease risk genes. Through imputation into both new and existing genotyped cohorts, I also describe the discovery of 26 novel IBD-associated loci, including a low frequency missense variant in ADCY7 that approximately doubles the risk of ulcerative colitis. I resolve biological associations underlying several of these novel associations, including a number of signals associated with monocyte-specific changes in integrin gene expression following immune stimulation.
These results reveal important insights into the genetic architecture of inflammatory bowel disease, and suggest that a combination of continued array-based genome-wide association studies, imputed using substantial new reference panels, and large-scale deep sequencing projects will be required in order to fully understand the genetic basis of complex diseases like IBD.
14

Schwartz, Jerrod Joseph. "Technologies for high throughput single molecule DNA sequencing /." May be available electronically:, 2009. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

15

Siragusa, Enrico [Verfasser]. "Approximate string matching for high-throughput sequencing / Enrico Siragusa." Berlin : Freie Universität Berlin, 2015. http://d-nb.info/1074404882/34.

16

Keebler, Jonathan Edward Myers. "Spontaneous Mutation Discovery via High-Throughput Sequencing of Pedigrees." NCSU, 2010. http://www.lib.ncsu.edu/theses/available/etd-03312010-151914/.

Abstract:
Recent technological advances have made high-throughput DNA sequencing a routine laboratory experiment. This progression in technology has been made possible by the parallel production of millions of short fragments of sequence. The responsibility of garnering biological information from these DNA fragments has shifted from the wet-lab to the bioinformatician. As sequencing technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes, a task that is not necessarily trivial using high-throughput sequencing reads. A violation of Mendelian inheritance laws observed amid the resequenced genomes of family members can indicate the presence of a de novo mutation. A method for locating de novo mutations by probabilistically inferring genotypes across a pedigree using high-throughput sequencing is presented and applied to two resequenced nuclear families: one as a collaborative effort within The 1,000 Genomes Project, and the second in an attempt to discover candidate driver and passenger mutations within the genome of an Acute Lymphoblastic Leukemia. The mutation findings within these projects are presented, and the approach is examined in detail, highlighting areas where method improvements may be made. Considering the challenges experienced in these studies within the larger context of the nascent field of Personal Genomics, an honest assessment is presented of developments that must be made before the application of whole-genome sequencing on the scale of an individual human can unequivocally be used to predict, diagnose, or treat human disease.
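The Mendelian-violation idea in the abstract above can be illustrated with a deterministic check on a father-mother-child trio. This sketch assumes error-free, diploid, autosomal genotype calls, whereas the thesis infers genotypes probabilistically from reads; the function names and the site labels in the usage note are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Genotypes are 2-tuples of alleles, e.g. ("A", "G"). The child is
    consistent if it can take one allele from each parent."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


def candidate_de_novo_sites(trio_calls):
    """trio_calls maps a site label to (father, mother, child) genotypes.
    Returns the labels, in sorted order, that violate Mendelian inheritance."""
    return [
        site
        for site, (father, mother, child) in sorted(trio_calls.items())
        if not is_mendelian_consistent(father, mother, child)
    ]
```

For example, a child called `("A", "G")` with both parents `("A", "A")` is flagged as a candidate. With real reads, most such violations come from sequencing or genotyping error, which is the motivation for a probabilistic treatment across the pedigree.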
17

Weese, David [Verfasser]. "Indices and Applications in High-Throughput Sequencing / David Weese." Berlin : Freie Universität Berlin, 2013. http://d-nb.info/1036130150/34.

18

Person, Kerry P. (Kerry Patrick). "Operational streamlining in a high-throughput genome sequencing center." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/37248.

Abstract:
Thesis (M.B.A.)--Massachusetts Institute of Technology, Sloan School of Management; and, (S.M.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering; in conjunction with the Leaders for Manufacturing Program at MIT, 2006.
Includes bibliographical references (p. 83-84).
Advances in medicine rely on accurate data that is rapidly provided. It is therefore critical for the Genome Sequencing platform of the Broad Institute of MIT and Harvard to continually strive to reduce cost, improve throughput, and increase the quality of its data output. In the past, new technology in the form of both chemistry improvements and robotics has allowed the Institute to achieve these goals in a step-wise manner. However, as the rate of technology progression in sequencing has slowed, the Institute has been forced to look to continuous, incremental improvement in order to achieve its goals. The Core Sequencing/Detection group handles the high-throughput sequencing duties at the Broad Institute. Through the use of robotics and cutting edge biology, they are able to process and sequence upwards of 50 billion bases of DNA per year. The work that this thesis was based on took place primarily in this automated production area. This thesis utilizes a number of lean concepts, including the 7 Wastes and pull production control.
Kanban systems, workflow changes, and a 5S implementation were used to bring these concepts to life at the Broad Institute. In order to correctly size the kanban system, process buildup diagrams and discrete event simulation were used. Each of these tools helped to drive the process towards the Institute's goals of reducing cost and improving quality and throughput.
by Kerry P. Person.
S.M.
M.B.A.
19

Fritz, Markus Hsi-Yang. "Exploiting high throughput DNA sequencing data for genomic analysis." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610819.

20

Woolford, Julie Ruth. "Statistical analysis of small RNA high-throughput sequencing data." Thesis, University of Cambridge, 2012. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.610375.

21

Pérez, Cantalapiedra Carlos. "Accessing genetic variability in Spanish barleys through high-throughput sequencing." Doctoral thesis, Universitat Autònoma de Barcelona, 2016. http://hdl.handle.net/10803/399850.

Abstract:
Barley is an important crop in the Mediterranean region, which is characterized by scarce and irregular rainfall. On the Iberian Peninsula it has been cultivated for thousands of years, allowing specific adaptations to stress to arise. These traits, present in Spanish landraces, remain unexploited in cereal breeding. High-throughput sequencing (HTS) has revolutionized research by making it possible to sequence the genomes of multiple organisms. The physical map of barley, with associated sequences, was published in late 2012. To take advantage of these resources, access to them had to be made easier for geneticists and breeders. This was the aim that led us to develop Barleymap, a software tool for locating genetic markers in the barley genome. This application integrates and locates markers from different widely used barley genotyping platforms. Another advantage of HTS is that different types of experiments can be carried out with different research objectives. We used exome sequencing to fine-map a powdery mildew resistance QTL from a Spanish landrace. Starting from a large mapping population, we were able to narrow the position of the QTL down to one physical contig. In addition, we were able to identify and partially assemble an expressed candidate gene. To achieve this, a series of bioinformatics approaches was applied to distinguish presence-absence variation in a group of genes of the NBS-LRR family. Another powerful application of HTS is RNAseq, which allows whole transcriptomes to be sequenced and expression assays to be performed at unprecedented resolution. We de novo assembled the transcriptomes of a drought-susceptible barley cultivar and a drought-resistant Spanish landrace.
We compared the expression changes in leaves and developing inflorescences of both genotypes under drought treatments. Large differences in their stress responses were revealed. Comparison with other drought studies in barley, together with analysis of the transcription factors and regulatory elements involved, provided new data on the complex barley gene expression network under stress. In summary, HTS opens up many new possibilities. To take full advantage of it, collaboration between bioinformaticians and geneticists must be encouraged in order to adapt the new genomic resources to specific needs.
Barley is an important crop in the Mediterranean region, which is characterized by scarce and irregular rainfall. On the Iberian Peninsula it has been cultivated for thousands of years, giving rise to specific adaptations to stress. These traits, present in Spanish landraces, remain unexploited in breeding. High-throughput sequencing (HTS) has revolutionized research. It has made it possible to sequence the genomes of multiple organisms. The physical map of barley, with associated sequences, was published in late 2012. To take advantage of these resources, access to them had to be made easier for geneticists and breeders. This was the aim that led us to develop Barleymap, a software tool for locating genetic markers in the barley genome. The application integrates and locates markers from different widely used barley genotyping platforms. Another advantage of HTS is that different types of experiments can be carried out with different research objectives. We used exome sequencing to fine-map a powdery mildew resistance QTL from a Spanish landrace. Starting from a large mapping population, we were able to narrow the position of the QTL down to a single physical contig. In addition, we were able to identify, and partially assemble, an expressed candidate gene. To achieve this, a series of bioinformatics approaches was applied to distinguish presence-absence variation in a group of related genes of the NBS-LRR family. Another powerful application of HTS is RNAseq, which allows whole transcriptomes to be sequenced and expression assays to be performed at unprecedented resolution. We de novo assembled the transcriptomes of a drought-susceptible barley cultivar and a drought-resistant Spanish landrace.
We compared the expression changes in leaves and developing inflorescences of both genotypes under drought treatments. Large differences in their stress responses were revealed. Comparison with other drought studies in barley, together with analysis of the transcription factors and regulatory elements involved, provided new data on the complex barley gene expression network under stress. In summary, HTS brings many new possibilities. To take full advantage of it, collaboration between bioinformaticians and geneticists must be encouraged in order to adapt the new genomic resources to specific needs.
Barley is an important crop in the Mediterranean region, characterized by scarce and irregular rainfall. In the Iberian Peninsula, it has been cultivated for thousands of years, leading to specific adaptations to prevalent biotic and abiotic stresses. These features, present in Spanish barley landraces, remain to be exploited in breeding. High-throughput sequencing (HTS) has revolutionized plant research. It has made it possible to sequence the genomes of multiple organisms. The sequence-enriched physical map of barley was published in late 2012. A first step towards exploiting barley genomics for practical purposes was facilitating geneticists' and breeders' access to the barley physical map. This was the aim which led us to the development of Barleymap, a software tool which allows locating genetic markers in the barley physical-genetic map. This application effectively integrates and maps markers from different widely used barley genotyping platforms, and, in general, any marker with sequence information. Another advantage of HTS is that diverse experimental setups can be used with different research objectives. Here, we used exome sequencing to fine-map a powdery mildew resistance QTL from a Spanish barley landrace. Exploiting a large mapping population, we were able to narrow down the position of the QTL to a single physical contig. Moreover, we could identify, and partially assemble, an expressed candidate gene. To achieve this, an array of bioinformatics approaches was applied to differentiate presence-absence variation within a cluster of closely related genes of the NBS-LRR family. Another powerful application of HTS is RNAseq, which allows sequencing whole transcriptomes, so that gene expression assays can be performed with unprecedented power. We de novo assembled the transcriptomes of a drought-susceptible elite barley cultivar and a drought-resistant Spanish barley landrace.
Then, we compared the expression changes, in leaves and developing inflorescences from both genotypes, under drought treatments. This revealed large differences in their responses to stress. A comparison with other drought gene expression studies on barley, and an analysis of transcription factors and cis-regulatory elements involved, provided new insights into the complex barley gene expression network under stress. In summary, HTS has brought many new possibilities to plant research. To take full advantage of it, crosstalk between bioinformatics and genetics must be fostered to adapt the new genomic resources to specific needs.
22

Kircher, Martin. "Understanding and improving high-throughput sequencing data production and analysis." Doctoral thesis, Universitätsbibliothek Leipzig, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-71102.

Abstract:
Advances in DNA sequencing revolutionized the field of genomics over the last 5 years. New sequencing instruments make it possible to rapidly generate large amounts of sequence data at substantially lower cost. These high-throughput sequencing technologies (e.g. Roche 454 FLX, Life Technology SOLiD, Dover Polonator, Helicos HeliScope and Illumina Genome Analyzer) make whole genome sequencing and resequencing, transcript sequencing as well as quantification of gene expression, DNA-protein interactions and DNA methylation feasible at an unanticipated scale. In the field of evolutionary genomics, high-throughput sequencing permitted studies of whole genomes from ancient specimens of different hominin groups. Further, it allowed large-scale population genetics studies of present-day humans as well as different types of sequence-based comparative genomics studies in primates. Such comparisons of humans with closely related apes and hominins are important not only to better understand human origins and the biological background of what sets humans apart from other organisms, but also for understanding the molecular basis for diseases and disorders, particularly those that affect uniquely human traits, such as speech disorders, autism or schizophrenia. However, while the cost and time required to create comparative data sets have been greatly reduced, the error profiles and limitations of the new platforms differ significantly from those of previous approaches. This requires a specific experimental design in order to circumvent these issues, or to handle them during data analysis. During the course of my PhD, I analyzed and improved current protocols and algorithms for next generation sequencing data, taking into account the specific characteristics of these new sequencing technologies. 
The presented approaches and algorithms were applied in different projects and are widely used within the department of Evolutionary Genetics at the Max Planck Institute of Evolutionary Anthropology. In this thesis, I will present selected analyses from the whole genome shotgun sequencing of two ancient hominins and the quantification of gene expression from short-sequence tags in five tissues from three primates.
23

Anandhakumar, Chandran. "Advancing Synthetic Gene Regulators Development with High-Throughput Sequencing Technologies." 京都大学 (Kyoto University), 2015. http://hdl.handle.net/2433/202663.

24

Mohamadi, Hamid. "Parallel algorithms and software tools for high-throughput sequencing data." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/62072.

Abstract:
With the growing throughput and dropping cost of High-Throughput Sequencing (HTS) technologies, there is a continued need to develop faster and more cost-effective bioinformatics solutions. However, the algorithms and computational power required to efficiently analyze HTS data have lagged considerably. In health and life sciences research organizations, de novo assembly and sequence alignment have become two key steps in everyday research and analysis. De novo assembly is a fundamental step in analyzing previously uncharacterized organisms and is one of the most computationally demanding problems in bioinformatics. Sequence alignment is a fundamental operation in a broad spectrum of genomics projects. In genome resequencing projects, alignments are often used prior to variant calling. In transcriptome resequencing, they provide information on gene expression. They are even used in de novo sequencing projects to help contiguate assembled sequences. As such, designing efficient, scalable, and accurate solutions for de novo assembly and sequence alignment problems would have a wide effect in the field. In this thesis, I present a collection of novel algorithms and software tools for the analysis of high-throughput sequencing data using efficient data structures. I also utilize the latest advances in parallel and distributed computing to design and develop scalable and cost-effective algorithms on High-Performance Computing (HPC) infrastructures, especially for the de novo assembly and sequence alignment problems. The algorithms and software solutions I develop are publicly available for free for academic use, to facilitate research at health and life sciences laboratories and other organizations worldwide.
Science, Faculty of
Graduate
25

Stokowy, Tomasz, Markus Eszlinger, Michał Świerniak, Krzysztof Fujarewicz, Barbara Jarząb, Ralf Paschke, and Kurt Krohn. "Analysis options for high-throughput sequencing in miRNA expression profiling." Universitätsbibliothek Leipzig, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-144393.

Abstract:
Background: Recently, high-throughput sequencing (HTS) using next-generation sequencing techniques has become useful in digital gene expression profiling. Our study introduces analysis options for HTS data based on mapping to miRBase or on counting and grouping identical sequence reads. These approaches allow hypothesis-free detection of miRNA differential expression. Methods: We compare our results to microarray and qPCR data from one set of RNA samples. We use Illumina platforms for microarray analysis and miRNA sequencing of 20 samples from benign follicular thyroid adenoma and malignant follicular thyroid carcinoma. Furthermore, we use three strategies for HTS data analysis to evaluate miRNA biomarkers for malignant versus benign follicular thyroid tumors. Results: High correlation of qPCR and HTS data was observed for the proposed analysis methods. However, qPCR is limited in the differential detection of miRNA isoforms. Moreover, we illustrate a much broader dynamic range of HTS compared to microarrays for small RNA studies. Finally, our data confirm hsa-miR-197-3p, hsa-miR-221-3p, hsa-miR-222-3p and both hsa-miR-144-3p and hsa-miR-144-5p as potential follicular thyroid cancer biomarkers. Conclusions: Compared to microarrays, HTS provides a global profile of miRNA expression with higher specificity and in more detail. Summarizing HTS reads as isoform groups (analysis pipeline B) or according to functional criteria (seed analysis pipeline C), which correlates better with qPCR results, offers promising new options for HTS analysis. Finally, the data open future miRNA research perspectives for HTS and indicate that qPCR might be limited in validating HTS data in detail.
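The counting-and-grouping idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' pipelines B or C: it collapses identical reads into counts and then pools counts by seed, taking the seed to be nucleotides 2-8 of the read (the usual miRNA seed definition). The function names are hypothetical, and reads are assumed to be adapter-trimmed.

```python
from collections import Counter, defaultdict


def collapse_reads(reads):
    """Count how often each distinct read sequence occurs."""
    return Counter(reads)


def group_by_seed(read_counts, seed_start=1, seed_end=8):
    """Pool read counts by seed, i.e. nucleotides 2-8 of each read
    (0-based slice [1:8]). Reads shorter than the seed span are skipped."""
    seed_counts = defaultdict(int)
    for seq, n in read_counts.items():
        if len(seq) >= seed_end:
            seed_counts[seq[seed_start:seed_end]] += n
    return dict(seed_counts)
```

Length variants that share a 5' end fall into the same seed group, so their counts are pooled rather than reported as separate sequences.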
26

Ainsworth, David. "Computational approaches for metagenomic analysis of high-throughput sequencing data." Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/44070.

Abstract:
High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the nascent field of metagenomics has been built. This ability to cheaply sample billions of DNA reads directly from environments has democratised sequencing and allowed researchers to gain unprecedented insights into diverse microbial communities. These technologies, however, are not without their limitations: the short length of the reads requires the production of vast amounts of data to ensure all information is captured. This 'data deluge' has been a major bottleneck and has necessitated the development of new algorithms for analysis. Sequence alignment methods provide the most information about the composition of a sample, as they allow both taxonomic and functional classification, but the algorithms are prohibitively slow. This inefficiency has led to the reliance on faster algorithms which only produce simple taxonomic classification or abundance estimation, losing the valuable information given by full alignments against annotated genomes. This thesis will describe k-SLAM, a novel ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k-mer based method, k-SLAM achieves speeds three orders of magnitude faster than current alignment-based approaches, making full taxonomic classification and gene identification tractable on modern large datasets. The alignments found by k-SLAM can also be used to find variants and identify genes, along with their nearest taxonomic origins. A novel pseudo-assembly method produces more specific taxonomic classifications on species which have high sequence identity within their genus. This provides a significant (up to 40%) increase in accuracy on these species. Also described is a re-analysis of a Shiga-toxin producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin producing genes.
k-SLAM has been used by a range of research projects including FLORINASH and is currently being used by a number of groups.
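As a rough illustration of the general idea of k-mer based read classification mentioned in the abstract above, the sketch below indexes every k-mer of a few reference sequences and assigns a read to the reference that shares the most k-mers with it. This is not k-SLAM's actual algorithm; the function names, the toy sequences and the voting rule are all assumptions made for the example.

```python
from collections import defaultdict


def build_kmer_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def classify_read(read, index, k):
    """Return the reference sharing the most k-mers with the read, or None."""
    hits = defaultdict(int)
    for i in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict index
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    if not hits:
        return None
    # Iterate names in sorted order so ties resolve deterministically
    return max(sorted(hits), key=hits.get)


# Hypothetical toy references, chosen only for illustration
references = {
    "genome_a": "ACGTACGTGGCCTTAA",
    "genome_b": "TTTTGGGGCCCCAAAA",
}
index = build_kmer_index(references, k=4)
print(classify_read("TACGTGGC", index, k=4))
```

A real classifier would also need to handle reverse complements, sequencing errors and taxonomy-aware tie-breaking, none of which is attempted here.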
27

Wan, Ji. "Global analysis of alternative polyadenylation regulation using high-throughput sequencing." Diss., University of Iowa, 2012. https://ir.uiowa.edu/etd/3548.

Abstract:
Messenger RNAs (mRNAs) have to undergo a series of post-transcriptional processing steps before translation. One of these steps, 3' end processing, which consists of cleavage and polyadenylation, is critical for delimiting the 3' end of the mRNA and determining regulatory elements for downstream post-transcriptional/translational regulation. Like another well-characterized mRNA processing step, splicing, 3' end processing is very flexible due to the diversity of trans-acting factors and cis-acting elements in the 3' end of the mRNA. In recent years, the differential usage of alternative polyA sites (APA) of the same gene, which leads to mRNA isoforms with different 3' UTRs, has been increasingly revealed by both experimental and computational studies. More significantly, global changes in 3' UTR length have been observed in multiple clinical settings, particularly in cancer cells. However, descriptions of the APA phenomenon have outpaced efforts to study the mechanisms underlying APA biogenesis. In this thesis, we first describe the general principles and a pipeline to identify APA in different biological or clinical conditions using various high-throughput sequencing techniques. After that, we present work on the global impacts of two RNA binding proteins (ESRP/aCP) and one core 3' end processing factor (CstF64 and its paralog CstF64τ) on the regulation of APA. The APA identification analyses and motif analyses suggest a wide range of APA events associated with the expression change of those proteins in different cell lines. In addition, for each protein, we have collected substantial evidence about the mechanism underlying the APA induction. Our findings could provide significant insights into APA regulation mechanisms. We also studied the induction of APA in JEG-3 cells as a response to changes in oxygen supply (hypoxia and normoxia).
Using a robust protocol for specifically sequencing the 3' end of mRNA, we identified more than 500 APA events and revealed a global shortening pattern of 3' UTR length as a result of hypoxia. The work on APA in this thesis substantially increases the understanding of APA regulation by various proteins and provides new evidence for APA in clinical conditions.
28

Mammana, Alessandro [Verfasser]. "Patterns and algorithms in high-throughput sequencing count data / Alessandro Mammana." Berlin : Freie Universität Berlin, 2016. http://d-nb.info/1108270956/34.

29

Love, Michael I. [Verfasser]. "Statistical analysis of high-throughput sequencing count data / Michael I. Love." Berlin : Freie Universität Berlin, 2013. http://d-nb.info/1043197842/34.

30

Wignall-Fleming, Elizabeth Bowie. "Investigations into the dynamics of paramyxovirus infections by high-throughput sequencing." Thesis, University of Glasgow, 2019. http://theses.gla.ac.uk/40905/.

Abstract:
The paramyxovirus family can cause a broad spectrum of diseases, from mild febrile illnesses to more severe diseases that may require hospitalisation and can, in the most serious cases, have fatal outcomes. Understanding virus infection dynamics is fundamental to the development of novel targets for therapeutics and vaccines. The advancement of high-throughput sequencing (HTS) has revolutionised biomedical research, providing unparalleled opportunities to answer complex questions. In this study we developed a workflow using directional analysis of HTS data to simultaneously analyse the kinetics of virus transcription and replication for PIV5 strain W3, PIV2, MuV and PIV3. The workflow could be used for the study of all negative strand viruses. The developed workflow was used to investigate a number of characteristics of paramyxovirus transcription, including quantification of the transcription gradient, RNA editing resulting in the generation of non-templated mRNAs, and the production of read-through mRNAs. Interestingly, the processivity of the RNA polymerase during transcription was shown to remain consistent throughout the infection amongst all of the viruses analysed. Additionally, virus replication and the generation of antigenomes were found to occur at early times post infection. This was surprising, as the current model for virus replication requires sufficient levels of NP to be present in the cytoplasm before the virus can enter replicative mode. These results suggest a revision of this model in which the virus produces local sites of virus transcription and replication in the cytoplasm, known as foci, and it is the level of NP surrounding the virus genomes at these local sites that dictates the virus's ability to enter a replicative mode. PIV5 strain W3 was shown to suppress virus gene expression at late times post infection, resulting in the establishment of a persistent infection.
The developed workflow was used to analyse the infection dynamics of PIV5. There were no changes in the RNA polymerase processivity of transcription that could account for the suppression of protein synthesis. A comparative analysis of PIV5 strains W3 and CPI+ identified a serine-to-phenylalanine mutation at position 157 of the P protein in CPI+. This residue is a phosphorylation site that, when phosphorylated by polo-like kinase 1 (PLK-1), was previously shown to play a role in the inhibition of virus RNA synthesis; the mutation abolished the virus's ability to suppress protein synthesis and establish a persistent infection. This indicates that phosphorylation of serine at position 157 is responsible for the inhibition of virus gene expression and the establishment of persistence.
31

Ballinger, Tracy J. "Analysis of genomic rearrangements in cancer from high throughput sequencing data." Thesis, University of California, Santa Cruz, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3729995.

Abstract:

In the last century cancer has become increasingly prevalent and is the second largest killer in the United States, estimated to afflict 1 in 4 people during their life. Despite our long history with cancer and our herculean efforts to thwart the disease, in many cases we still do not understand the underlying causes or have successful treatments. In my graduate work, I’ve developed two approaches to the study of cancer genomics and applied them to the whole genome sequencing data of cancer patients from The Cancer Genome Atlas (TCGA). In collaboration with Dr. Ewing, I built a pipeline to detect retrotransposon insertions from paired-end high-throughput sequencing data and found somatic retrotransposon insertions in a fifth of cancer patients.

My second novel contribution to the study of cancer genomics is the development of the CN-AVG pipeline, a method for reconstructing the evolutionary history of a single tumor by predicting the order of structural mutations such as deletions, duplications, and inversions. The CN-AVG theory was developed by Drs. Haussler, Zerbino, and Paten and samples potential evolutionary histories for a tumor using Markov Chain Monte Carlo sampling. I contributed to the development of this method by testing its accuracy and limitations on simulated evolutionary histories. I found that the ability to reconstruct a history decays exponentially with increased breakpoint reuse, but that we can estimate how accurately we reconstruct a mutation event using the likelihood scores of the events. I further designed novel techniques for the application of CN-AVG to whole genome sequencing data from actual patients and applied these techniques to search for evolutionary patterns in glioblastoma multiforme using sequencing data from TCGA. My results show patterns of two-hit deletions, as we would expect, and amplifications occurring over several mutational events. I also find that the CN-AVG method frequently makes use of whole chromosome copy number changes followed by localized deletions, a bias that could be mitigated by modifying the cost function for an evolutionary history.

32

Sibthorp, Christopher. "Analysis of the Aspergillus nidulans transcriptome using high-throughput RNA sequencing." Thesis, University of Liverpool, 2012. http://livrepository.liverpool.ac.uk/9973/.

Abstract:
The filamentous fungus Aspergillus nidulans is a well-characterized model organism which has been used extensively for the study of eukaryotic cell biology and genetics over the past 60 years. The A. nidulans genome was sequenced in 2005, and various genome annotations have been released since, the majority of which rely heavily on in silico gene prediction. The development of high-throughput next generation sequencing technologies has revolutionised transcriptomics by allowing analysis of whole transcriptomes through massively parallel cDNA sequencing (RNA-seq). This sequencing approach has been applied to the A. nidulans transcriptome, and augmented by the development of a novel strategy for selectively sequencing the 5′ ends of RNAs on the ABI SOLiD platform. This aimed to produce a more robust resource for gene interrogation and the investigation of regulatory elements which impact on the transcriptomic landscape in A. nidulans. Bioinformatic analysis of RNA-seq data was used to define 15,375 transcription start site (TSS) regions, which have been characterised by statistical analysis of mapped 5′ end distribution. Motif finding within sequence regions surrounding these TSS identified 16 putative functional promoter motifs based on overrepresentation and distributional analysis within promoters, and GO annotation found significant functional enrichment amongst genes associated with two of these motifs (AARARAAA and TTTYTTY). Transcript assembly of RNA-seq data has also revealed 16,065 putative transcripts, 1,112 of which were mapped to regions annotated as intergenic. From these transcripts we identified 38 strong candidates for novel protein coding genes (six of which contained non-canonical translation start sites), and over 400 additional transcripts containing putative coding regions.
Separation of RNA-seq data in two sets of strand specific reads was shown to greatly increase the quality of transcript assembly and facilitated the identification of 2291 occurrences of sense:antisense overlap between assembled transcripts, four of which have been proven experimentally. Finally, assembled transcripts have been used to detect multiple transcript isoforms arising from alternative splicing events. 374 distinct loci were identified as the origins of alternatively spliced transcripts, and six of these were verified experimentally.
33

Glaus, Peter. "Bayesian methods for gene expression analysis from high-throughput sequencing data." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/bayesian-methods-for-gene-expression-analysis-from-highthroughput-sequencing-data(cf9680e0-a3f2-4090-8535-a39f3ef50cc4).html.

Abstract:
We study the tasks of transcript expression quantification and differential expression analysis based on data from high-throughput sequencing of the transcriptome (RNA-seq). In an RNA-seq experiment subsequences of nucleotides are sampled from a transcriptome specimen, producing millions of short reads. The reads can be mapped to a reference to determine the set of transcripts from which they were sequenced. We can measure the expression of transcripts in the specimen by determining the amount of reads that were sequenced from individual transcripts. In this thesis we propose a new probabilistic method for inferring the expression of transcripts from RNA-seq data. We use a generative model of the data that can account for read errors, fragment length distribution and non-uniform distribution of reads along transcripts. We apply the Bayesian inference approach, using the Gibbs sampling algorithm to sample from the posterior distribution of transcript expression. Producing the full distribution enables assessment of the uncertainty of the estimated expression levels. We also investigate the use of alternative inference techniques for the transcript expression quantification. We apply a collapsed Variational Bayes algorithm which can provide accurate estimates of mean expression faster than the Gibbs sampling algorithm. Building on the results from transcript expression quantification, we present a new method for the differential expression analysis. Our approach utilizes the full posterior distribution of expression from multiple replicates in order to detect significant changes in abundance between different conditions. The method can be applied to differential expression analysis of both genes and transcripts. We use the newly proposed methods to analyse real RNA-seq data and provide evaluation of their accuracy using synthetic datasets. 
We demonstrate the advantages of our approach in comparisons with existing alternative approaches for expression quantification and differential expression analysis. The methods are implemented in the BitSeq package, which is freely distributed under an open-source license. Our methods can be accessed and used by other researchers for RNA-seq data analysis.
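To make the Gibbs sampling idea in the abstract above more concrete, here is a minimal sketch of a sampler for relative transcript expression from ambiguously mapped reads. It is not the BitSeq implementation: the simplified model (a symmetric Dirichlet prior on expression and a per-read table of alignment likelihoods), the function name and all parameter values are assumptions made for illustration.

```python
import random


def gibbs_expression(compat, n_transcripts, n_iter=2000, burn_in=500,
                     alpha=1.0, seed=0):
    """Estimate posterior mean relative expression by Gibbs sampling.

    compat: one dict per read, mapping transcript index -> likelihood of
            the read given that transcript (assumed to be precomputed).
    """
    rng = random.Random(seed)
    theta = [1.0 / n_transcripts] * n_transcripts
    sums = [0.0] * n_transcripts
    kept = 0
    for it in range(n_iter):
        # Step 1: sample each read's transcript of origin given theta
        counts = [0] * n_transcripts
        for read in compat:
            candidates = list(read)
            weights = [theta[t] * read[t] for t in candidates]
            z = rng.choices(candidates, weights=weights)[0]
            counts[z] += 1
        # Step 2: sample theta from Dirichlet(alpha + counts),
        # drawn as normalised independent gamma variates
        draws = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(draws)
        theta = [d / total for d in draws]
        # Keep post-burn-in samples to average over the posterior
        if it >= burn_in:
            for t in range(n_transcripts):
                sums[t] += theta[t]
            kept += 1
    return [s / kept for s in sums]


# Hypothetical toy data: 8 reads unique to transcript 0, 2 unique to
# transcript 1, and 5 reads equally compatible with both
reads = [{0: 1.0}] * 8 + [{1: 1.0}] * 2 + [{0: 1.0, 1: 1.0}] * 5
print(gibbs_expression(reads, n_transcripts=2))
```

Keeping the individual post-burn-in samples rather than only their mean would give the full posterior distribution, which is what allows the uncertainty of each estimate to be assessed.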
34

Solayman, Md. "High-Throughput Sequencing Based Probing of Protein/RNA Structures and Functions." Thesis, Griffith University, 2022. http://hdl.handle.net/10072/416290.

Abstract:
The rapid advancement in sequencing chemistry, sequencing technologies, and bioinformatics has significantly increased sequencing automation and lowered its cost. The applications of high-throughput sequencing (HTS) technologies are steadily expanding from research laboratories to diagnostic clinics. Moreover, diverse methods used in epigenetics, proteomics, and structure probing of macromolecules (DNA, RNA, and proteins) have been developed based on HTS technology. This thesis describes the development of two novel techniques, high-throughput split-protein profiling (HiTS) and an RNA solvent accessibility probing method (RL-Seq), broadening the applications of HTS technologies for probing protein/RNA structures and functions. Chapter 1 of the thesis provides an overview of the history of HTS technologies, available platforms, ongoing development in this field, and their diverse applications, particularly in the area of proteomics and RNA structure probing. In Chapter 2, we introduce the HiTS method, which allowed fast identification of self- and assisted complementary positions of three antibiotic-resistance proteins (fosfomycin, fosA3; erythromycin, ermB; and chloramphenicol, catI). Finding suitable split sites in proteins is important because split proteins are used as reporters in protein complementation assays (PCA) for studying protein-protein interactions in different organisms. However, only a small number of split-protein systems have been identified so far, owing to manual, labour-intensive optimization of the candidate genes. The proposed HiTS method employs transposon mutagenesis, conditional interaction of split fragments by rapamycin-regulated FRB-FKBP protein pairs, and deep sequencing for fast identification of self- and assisted complementary fragments, which are subsequently confirmed by low-throughput testing.
In Chapter 3, we further applied the HiTS method to T7 RNA polymerase (T7 RNAP), a bacteriophage RNA polymerase, considering its importance in synthetic biology in addition to the PCA. We found that the newly developed HiTS method is also applicable to T7 RNAP for locating suitable split sites for self-complementing variants. Several self-complementing variants were found, including one with a stronger signal than the wild type. In Chapter 4, in preparation for applying HTS technology to probe RNA solvent accessibility, we reviewed the available experimental and computational techniques for RNA solvent accessibility studies and identified existing research gaps. Current experimental approaches for studying RNA solvent accessibility include hydroxyl radical probing (HRF-Seq), light activated structural examination of RNA (LASER), and its modified versions (LASER-Seq, LASER-Map, and icLASER). The reactivity readouts of these methods are based on either the reverse transcriptase stop (RT-stop) at cleavage points or mutational profiling at adduct formation sites. These approaches rely on reverse transcriptase enzymes and random primers, which suffer from non-specific drop-off that creates short truncated sequences, which in turn lead to false-positive signals at probe-reactive sites. In Chapter 5, we proposed the RL-Seq (RtcB Ligation-Seq) method to overcome the above-mentioned limitations of the existing approaches. The method is illustrated by measuring the solvent accessibility of complete Escherichia coli ribosomal complexes at single-nucleotide resolution. In this method, unique properties of RtcB ligase were used to identify the probing sites by ligating a linker containing a pre-defined 5′-OH end to the 3′-P ends generated by hydroxyl radical cleavage.
The application of this method to ribosomal RNAs (23S, 16S, and 5S rRNAs) confirmed its ability to estimate solvent accessibility with high sensitivity (low sequencing depth required) and accuracy (strong correlation with structure-derived values). In addition, the pre-defined linker employed in this method allowed the use of a fixed primer in the reverse transcription reaction and significantly minimized biases during subsequent PCR amplification. In Chapter 6, we discuss the future prospects of the HTS technology-based methods developed in this thesis.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
Institute for Glycomics
35

Paicu, Claudia. "miRNA detection and analysis from high-throughput small RNA sequencing data." Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/63738/.

Abstract:
Small RNAs (sRNAs) are a broad class of short regulatory non-coding RNAs. microRNAs (miRNAs) are a special class of ~21-22 nucleotide sRNAs which are derived from a stable hairpin-like secondary structure. miRNAs have critical gene regulatory functions and are involved in many pathways including developmental timing, organogenesis and development in both plants and animals. Next generation sequencing (NGS) technologies, which are often used for identifying miRNAs, are continuously evolving, generating datasets containing millions of sRNAs, which has led to new challenges for the tools used to predict miRNAs from such data. There are several tools for miRNA detection from NGS datasets, which we review in this thesis, identifying a number of potential shortcomings in their algorithms. In this thesis, we present a novel miRNA prediction algorithm, miRCat2. Our algorithm is more robust to variations in sequencing depth because it compares aligned sRNA reads to a random uniform distribution to detect peaks in the input dataset, using a new entropy-based approach. It then applies filters based on miRNA biogenesis to the read alignment and to the computed secondary structure. Results show that miRCat2 has a better specificity-sensitivity trade-off than similar tools, and its predictions also contain a larger percentage of sequences that are downregulated in mutants in the miRNA biogenesis pathway. This confirms the validity of novel predictions, which may lead to new miRNA annotations, expanding and contributing to the field of sRNA research.
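One simple way to picture the comparison of read coverage against a uniform distribution described above is through normalised Shannon entropy: reads spread evenly over a locus give a value near 1, while reads piled onto a few positions give a value near 0. The sketch below is only an illustration of that idea and not the miRCat2 procedure; the function names and the 0.5 threshold are assumptions.

```python
import math


def normalised_entropy(counts):
    """Shannon entropy of per-position read counts, scaled to [0, 1].

    1.0 corresponds to a uniform spread of reads, 0.0 to all reads at a
    single position.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        # No reads or a single position: treat as uninformative (no peak)
        return 1.0
    entropy = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            entropy -= p * math.log(p)
    # log(len(counts)) is the entropy of the uniform distribution
    return entropy / math.log(len(counts))


def looks_like_peak(counts, threshold=0.5):
    """Flag a locus whose reads are far more concentrated than uniform."""
    return normalised_entropy(counts) < threshold


# Hypothetical read-start counts along two small loci
print(looks_like_peak([5, 5, 5, 5]))
print(looks_like_peak([0, 20, 0, 0]))
```

Because the score depends only on the proportions of reads at each position, doubling every count leaves it unchanged, which is one reason entropy-style measures are less sensitive to sequencing depth than raw abundance cut-offs.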
36

Bista, Iliana-Aglaia. "Defining a high throughput sequencing identification framework for freshwater ecosystem biomonitoring." Thesis, Bangor University, 2016. https://research.bangor.ac.uk/portal/en/theses/defining-a-high-throughput-sequencing-identification-framework-for-freshwater-ecosystem-biomonitoring(133e53f8-e300-495b-89e9-c1b3188d8acb).html.

Abstract:
Freshwater ecosystems are currently amongst the most threatened habitats due to high levels of anthropogenic stress and increasing efforts are required to monitor their status and assess aquatic biodiversity. Biomonitoring, which is the systematic measurement of the responses of aquatic biota to environmental stressors, is used to evaluate ecosystem status. Macroinvertebrates are commonly used organisms for ecosystem assessment, due to their numerous biomonitoring qualities, which qualify them as ecological indicators. Traditional taxonomy-based monitoring is labour intensive, which limits the throughput, and is often inefficient in providing species level identification, which limits the accuracy of detections. The introduction of molecular based methods for biomonitoring, especially when coupled with High Throughput Sequencing (HTS) applications, offers a step change in ecosystem monitoring. Here I tested the utility of DNA based applications for increasing the efficiency of freshwater ecosystem biomonitoring, using benthic macroinvertebrates as a target group. For the first part of this work, I used DNA barcoding of the Cytochrome Oxidase Subunit I (COI), from individual specimens, to populate a barcode reference library for 94 species of Trichoptera, Gastropoda and Chironomidae from the UK. Then, I used High Throughput Sequencing (HTS) methods to characterise diversity from complex environmental samples. First, I used metabarcoding of aqueous environmental DNA (eDNA) and community invertebrate samples (Chironomidae pupal exuviae), collected on regular intervals throughout a year, to identify diversity levels and temporal patterns of community variation on ecosystem-wide and group specific scales. 
Finally, I used a structured design of mock macroinvertebrate communities, of known biomass content, to perform a comparison between PCR-based metabarcoding of the COI gene and PCR-free shotgun sequencing of mitochondrial genomes (mito-metagenomics), and evaluate their efficiency for accurate characterisation of biomass content of bulk samples. Overall, HTS has demonstrated great potential for advancing biomonitoring efforts, allowing ecosystem scale diversity detection from non-invasive types of samples, such as eDNA, whilst moving into mito-metagenomic work could improve the field even further by improving quantitative abundance results on the community composition level.
37

Giangregorio, Tania. "High throughput sequencing analysis for the molecular diagnosis of Inherited Thrombocytopenias." Doctoral thesis, Università degli Studi di Trieste, 2019. http://hdl.handle.net/11368/2962379.

Abstract:
Inherited thrombocytopenias (ITs) are a heterogeneous group of rare genetic disorders characterized by reduced platelet count, sometimes combined with bleeding tendency and/or other clinical defects. The molecular diagnosis of ITs is essential to make clinical decisions and infer personalized prognosis and risks. More than 30 genes have been identified that harbor mutations responsible for ITs (Balduini et al., 2017). In addition, ITs often show phenotypic overlaps that hamper correct diagnosis with the traditional diagnostic algorithm based on step-wise specialized investigations. However, the advent of next generation sequencing has changed the diagnostic approach to diseases characterized by high genetic heterogeneity, like ITs. In order to improve the diagnosis of ITs, we designed a targeted next generation sequencing panel (IT-NGS) to screen the 28 genes most commonly mutated in ITs. Ninety-seven consecutive probands with suspected ITs were sequenced. The analysis led us to reach a definite diagnosis for 37 probands. In these probands we identified known or novel likely pathogenic mutations causing specific diseases, including monoallelic Bernard Soulier syndrome (N=14), biallelic Bernard Soulier syndrome (N=4), ACTN1-related thrombocytopenia (N=4), MYH9-related disease (N=7), ANKRD26-related thrombocytopenia (N=4), congenital amegakaryocytic thrombocytopenia (N=1), grey platelet syndrome (N=1), Wiskott-Aldrich syndrome (N=1) and Acute Myelogenous Leukemia (N=1). In another 34 cases we identified variants of uncertain significance (VUS) whose pathogenic role has to be supported by segregation analysis and in-depth functional studies. Since 17 probands had no potential candidate variant impacting IT-NGS genes, they are eligible for whole exome sequencing (WES) to clone novel genes involved in ITs.
In conclusion, since some IT forms predispose to additional acquired diseases during life, an accurate diagnosis is essential to infer personalized prognosis and define proper treatments and follow-up. Because of clinical and genetic heterogeneity, the molecular diagnosis of ITs represents a lengthy and expensive challenge using conventional technologies. The use of IT-NGS in clinical practice, aided by specific investigations clarifying the role of variants of uncertain significance, overcomes these issues, facilitating a definite diagnosis in patients with a suspicion of known IT forms.
38

Barquist, Lars. "High-throughput experimental and computational studies of bacterial evolution." Thesis, University of Cambridge, 2014. https://www.repository.cam.ac.uk/handle/1810/245138.

Abstract:
The work in this thesis is concerned with the study of bacterial adaptation on short and long timescales. In the first section, consisting of three chapters, I describe a recently developed high-throughput technology for probing gene function, transposon-insertion sequencing, and its application to the study of functional differences between two important human pathogens, Salmonella enterica subspecies enterica serovars Typhi and Typhimurium. In a first study, I use transposon-insertion sequencing to probe differences in gene requirements during growth on rich laboratory media, revealing differences in serovar requirements for genes involved in iron-utilization and cell-surface structure biogenesis, as well as in requirements for non-coding RNA. In a second study I more directly probe the genomic features responsible for differences in serovar pathogenicity by analyzing transposon-insertion sequencing data produced following a two hour infection of human macrophage, revealing large differences in the selective pressures felt by these two closely related serovars in the same environment. The second section, consisting of two chapters, uses statistical models of sequence variation, i.e. covariance models, to examine the evolution of intrinsic termination across the bacterial kingdom. A first collaborative study provides background and motivation in the form of a method for identifying Rho-independent terminators using covariance models built from deep alignments of experimentally-verified terminators from Escherichia coli and Bacillus subtilis. In the course of the development of this method I discovered a novel putative intrinsic terminator in Mycobacterium tuberculosis. In the final chapter, I extend this approach to de novo discovery of intrinsic termination motifs across the bacterial phylogeny. 
I present evidence for lineage-specific variations in canonical Rho-independent terminator composition, as well as discover seven non-canonical putative termination motifs. Using a collection of publicly available RNA-seq datasets, I provide evidence for the function of some of these elements as bona fide transcriptional attenuators.
39

Ghazanfar, Shila. "Statistical approaches to harness high throughput sequencing data in diverse biological systems." Thesis, The University of Sydney, 2017. http://hdl.handle.net/2123/17268.

Abstract:
The development of novel statistical approaches to questions specific to biological systems of interest is becoming more valuable as we tackle increasingly complex problems. This thesis explores three distinct biological systems in which high throughput sequencing data is utilised, varying in research area, organism, number of sequencing platforms and datasets integrated, and structure such as matched samples, showcasing the variety of study designs and thus the need for tailored statistical approaches. First, we characterise allelic imbalance from RNA-Seq data, including stringent filtering criteria and a count-based likelihood ratio test. This work identified genes of particular importance in livestock genomics, such as those related to energy use. Second, we outline a novel methodology to identify highly expressed genes and cells for single cell RNA-Seq data. We derive a gamma-normal mixture model to identify lowly and highly expressed components, and use this to identify novel markers for olfactory sensory neuron (OSN) maturity across publicly available mouse neuron datasets. In addition, we estimate single cell networks and find that mature OSN single cell networks are more centralised than immature OSN single cell networks. Third, we develop two novel frameworks for relating information from Whole Exome DNA-Seq and RNA-Seq data when i) samples are matched and when ii) samples are not necessarily matched between platforms. In the latter case, we relate functional somatic mutation driver gene scores to transcriptional network correlation disturbance using a permutation testing framework, identifying potential candidate genes for targeted therapies. In the former case, we estimate directed mutation-expression networks for each cancer using linear models, providing a useful exploratory tool for identifying novel relationships among genes. This thesis demonstrates the importance of tailored statistical approaches to further understanding across many biological systems.
40

Esteve, Codina Anna. "Characterization of the Iberian pig genome and transcriptome using high throughput sequencing." Doctoral thesis, Universitat Autònoma de Barcelona, 2012. http://hdl.handle.net/10803/134673.

Abstract:
In this thesis, we studied the patterns of nucleotide variability in the pig genome to better understand which evolutionary forces have shaped it. The domestic pig is a species that shows great phenotypic variability as a result of domestication and the formation of modern breeds. Moreover, the wild boar and other closely related species are still extant, which facilitates the search for candidate genes that have undergone artificial selection. The pig is also important in biomedicine, as a model for human diseases and as a reservoir of organs for humans. In the first chapter, we sought to detect whether artificial selection has acted on a candidate gene for meat quality in pigs, SERPINA6. To this end, we studied nucleotide variability in several domestic pigs and wild boars of different origins (Asian and European). The analysis, however, was not conclusive. Second, using next-generation sequencing techniques, we were able to study nucleotide variability not just in a single gene but across the complete genome of an Iberian pig. This also allowed us to study and characterize its transcriptome. To do so, we used several complementary techniques and methodologies: whole genome sequencing, reduced representation libraries, pool sequencing and transcriptome sequencing. The estimated nucleotide variability was 0.7 kb⁻¹, a far from negligible value considering the high inbreeding coefficient of this pig strain. We also observed that telomeres have higher variability than centromeres, which can be explained by a higher recombination rate. In addition, the X chromosome shows much lower variability than expected relative to the autosomes, probably because of selection or other demographic effects.
To study genomic regions under selection, we divided the genome into non-overlapping windows and computed several selection tests, both in a pool of Iberian pigs and in a single individual. Regions with an excess of polymorphism, which could therefore be under balancing selection, are enriched in olfactory receptors and histocompatibility complex genes. In contrast, regions with an excess of differentiation and very low variability show no clear enrichment in any function. Nevertheless, we cite possible candidate genes related to lipid metabolism, keratinization, hair formation and behaviour. Massive sequencing techniques also make it possible to detect structural variants based on read depth patterns. In this way, we identified copy number gains in certain regions of the Iberian genome relative to the reference genome. In total, we found that 36 Mb of the genome are affected and that approximately 5% of genes lie within these regions. We were thus able to identify new paralogs of annotated genes, most of them belonging to large gene families. Finally, we compared the male gonad transcriptome of two pigs with very extreme phenotypes, one Iberian and one Large White. The differentially expressed genes are related to spermatogenesis and lipid metabolism, in line with their phenotypic differences. We were also able to identify new unannotated genes, long non-coding RNAs and transposable elements expressed in this tissue.
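The windowing step mentioned in this abstract (splitting the genome into non-overlapping windows before computing per-window statistics) can be illustrated with a minimal Python sketch. The abstract does not name the selection tests used, so the sketch only computes SNP density per kb in each window. The function name, the toy SNP positions and the window size are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def snp_density_per_window(snp_positions, genome_length, window_size):
    """Return SNPs per kb for each non-overlapping window (0-based positions).

    Illustrative only: a real analysis would compute a selection statistic
    per window rather than a raw SNP density.
    """
    n_windows = -(-genome_length // window_size)  # ceiling division
    counts = Counter(pos // window_size for pos in snp_positions)
    densities = []
    for w in range(n_windows):
        start = w * window_size
        end = min(start + window_size, genome_length)  # last window may be short
        densities.append(counts.get(w, 0) / ((end - start) / 1000))
    return densities


# Hypothetical toy data: six SNPs on a 5 kb sequence, 1 kb windows.
toy_snps = [120, 950, 1500, 1800, 1999, 4200]
densities = snp_density_per_window(toy_snps, genome_length=5000, window_size=1000)
print(densities)
```

Windows whose density lies far above or below the genome-wide value would be the natural candidates for closer inspection, which is the role the windows play in the abstract.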
41

Okonechnikov, Konstantin [Verfasser]. "High-throughput RNA sequencing: a step forward in transcriptome analysis / Konstantin Okonechnikov." Berlin : Freie Universität Berlin, 2016. http://d-nb.info/1084634686/34.

42

Rubelt, Florian [Verfasser]. "Investigations into the human immunoglobulin repertoire utilizing high-throughput sequencing / Florian Rubelt." Berlin : Freie Universität Berlin, 2012. http://d-nb.info/1030488894/34.

43

Chao, Yuanqing, and 晁元卿. "Studies of biofilm development by advanced microscopic techniques and high-throughput sequencing." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2013. http://hub.hku.hk/bib/B50899922.

Abstract:
This study investigated biofilm formation using advanced microscopic and high-throughput sequencing techniques. The major tasks were (1) to quantitatively evaluate the initial bacterial attachment processes by Atomic Force Microscopy (AFM); (2) to characterize the chemical variation during biofilm formation by Raman microscopy; (3) to analyze the microbial structure and functions in wastewater and drinking water biofilms by metagenomic analysis. To determine the lateral detachment force for bacteria, a quantitative method using the contact mode of AFM was developed. The established method had good repeatability and sensitivity to various bacteria and substrata, and was applied to evaluate the roles of bacterial surface polymers, i.e. lipopolysaccharides, type 1 fimbriae and capsular colanic acid, in Phase I and II attachment. The results indicated that lipopolysaccharides largely enhanced Phase I and II attachment. Fimbriae increased Phase I attachment but did not significantly influence the adhesion strength in Phase II. Moreover, colanic acid had a negative effect on attachment in both Phases I and II. Surface-enhanced Raman scattering was applied to evaluate the chemical components in the biofilm matrix at different growth phases, including initially attached bacteria, colonies and mature biofilm. Three model bacteria, Escherichia coli, Pseudomonas putida, and Bacillus subtilis, were used to cultivate biofilms. The results showed that the content of carbohydrates, proteins, and nucleic acids in the biofilm matrix increased significantly with biofilm growth for all three bacteria, judging from the intensities and appearance probabilities of related marker peaks in the spectra. The content of lipids, however, increased only in the Gram-negative biofilms.
Moreover, metagenomic data, coupled with PCR-based 454 pyrosequencing reads, were generated for activated sludge and biofilm from a full-scale hybrid reactor to study the microbial taxonomic and functional differences and connections between activated sludge and biofilm. The results showed that the dominant bacteria co-existed in the two samples. Global functions in the activated sludge and biofilm metagenomes showed quite similar patterns, revealing only limited differences in overall function between the two samples. For nitrogen removal, the diversity and abundance of nitrifiers and denitrifiers in biofilm did not surpass those in activated sludge. However, higher abundances of nitrification and denitrification genes were found in biofilm, suggesting that the increased nitrogen removal achieved by applying biofilm might be attributed to removal efficiency rather than biomass accumulation of nitrogen removal bacteria. To investigate the bacterial structure and functions of drinking water biofilm, PCR-based 454 pyrosequencing of the 16S rRNA gene and Illumina metagenomic data were generated and analyzed. Significant differences in bacterial diversity and taxonomic structure were found between biofilms formed on stainless steel and plastics. Moreover, ecological succession could be clearly observed during biofilm formation. A metabolic network analysis for drinking water biofilm was constructed for the first time. Moreover, the occurrence and abundance of specific genes involved in the bacterial pathway of glutathione metabolism and in the production/degradation of extracellular polymeric substances were also evaluated.
Civil Engineering
Doctoral
Doctor of Philosophy
44

Chen, Nanhua. "Application of high-throughput sequencing for the analyses of PRRSV-host interactions." Diss., Kansas State University, 2014. http://hdl.handle.net/2097/18664.

Abstract:
Doctor of Philosophy
Department of Diagnostic Medicine and Pathobiology
Raymond R. R. Rowland
Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) is the most costly virus to the swine industry worldwide. This study explored the application of deep sequencing techniques to better understand the virus-host interaction. On the virus side, PRRSV exists as a quasispecies. The first application of deep sequencing was to investigate amino acid substitutions in hypervariable regions during acute infection and after virus rebound. The appearance and disappearance of mutations, especially the generation of a new N-glycosylation site in GP5, indicated that they are likely the result of immune selection. The second application of deep sequencing was to investigate the quasispecies makeup in pigs with severe combined immunodeficiency (SCID) that lack B and T cells. The results showed the same pattern of amino acid substitutions in SCID pigs and their normal littermates, and no mutations specific to either group were identified. This suggests that the mutations that appear during the early stages of infection are the product of the virus becoming adapted to replication in pigs. The third application of deep sequencing was to investigate the locations of recombination events between GFP-expressing PRRSV infectious clones. The results identified different cross-over events within three conserved regions between the EGFP and GFPm genes. Finally, the fourth application was to develop a set of sequencing tools for analyzing the host antibody repertoire. A simple method was developed to amplify swine VDJ repertoires. Shared and abundant VDJ sequences that are likely expressed by PRRSV-activated B cells were determined in pigs that had different neutralization activities. These sequences are potentially correlated with different antibody responses.
45

Bellos, Evangelos. "Statistical methods for elucidating copy number variation in high-throughput sequencing studies." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/24867.

Abstract:
Copy number variation (CNV) is pervasive in the human genome and has been shown to contribute significantly to phenotypic diversity and disease aetiology. High-throughput sequencing (HTS) technologies have allowed for the systematic investigation of CNV at an unprecedented resolution. HTS studies offer multiple distinct features that can provide evidence for the presence of CNV. We have developed an integrative statistical framework that jointly analyses multiple sequencing features at the population level to achieve sensitive and precise discovery of CNV. First, we applied our framework to low-coverage whole-genome sequencing experiments and used data from the 1000 Genomes Project to demonstrate a substantial improvement in CNV detection accuracy over existing methods. Next, we extended our approach to targeted HTS experiments, which offer improved cost-efficiency by focusing on a predetermined subset of the genome. Targeted HTS involves an enrichment step that introduces non-uniformity in sequencing coverage across target regions and thus hinders CNV identification. To that end, we designed a customized normalization procedure that counteracts the effects of enrichment bias and enhances the underlying CNV signal. Our extended framework was benchmarked on contiguous capture datasets, where it was shown to outperform competing strategies by a wide margin. Capture sequencing can also generate large amounts of data in untargeted genomic regions. Although these off-target results can be a valuable source of CNV evidence, they are subject to complex enrichment patterns that confound their interpretation. Therefore, we developed the first normalization strategy that can adapt to the highly heterogeneous nature of off-target capture and thus facilitate CNV investigation in untargeted regions. 
All in all, we present a generalized CNV detection toolset that has been shown to achieve robust performance across datasets and sequencing platforms and can therefore provide valuable insight into the prevalence and impact of CNV.
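The abstract does not spell out its normalization procedure, so the following Python sketch shows only the generic idea behind read-depth CNV calling in targeted data: scale each sample by its own median depth, then compare each target against the cross-sample median so that target-specific enrichment bias cancels out. It is not the framework developed in the thesis. The function name, sample labels and depth values are hypothetical, and all depths are assumed to be positive (zero depths would need a pseudocount).

```python
import math
from statistics import median


def log2_depth_ratios(depths):
    """depths: dict of sample -> per-target read depths (same target order).

    Returns dict of sample -> per-target log2 ratio against a
    cross-sample reference. Generic illustration, not a published method.
    """
    # Step 1: scale each sample by its own median depth (library size).
    scaled = {s: [d / median(v) for d in v] for s, v in depths.items()}
    n_targets = len(next(iter(scaled.values())))
    # Step 2: per-target reference = median across samples, which absorbs
    # enrichment bias shared by all samples at that target.
    reference = [median(scaled[s][i] for s in scaled) for i in range(n_targets)]
    return {
        s: [math.log2(v[i] / reference[i]) for i in range(n_targets)]
        for s, v in scaled.items()
    }


# Hypothetical toy data: target 2 is poorly captured in every sample,
# but sample s3 carries twice the expected depth there.
toy_depths = {
    "s1": [100, 200, 50, 100],
    "s2": [200, 400, 100, 200],
    "s3": [100, 200, 100, 100],
}
ratios = log2_depth_ratios(toy_depths)
print(ratios["s3"])
```

A log2 ratio near 1 at a target is what a doubling of copy number would look like under this simple model. In practice a statistical model over many samples and features would be needed to call it, which is the kind of gap the thesis addresses.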
46

Oral, Münevver. "Insights into isogenic clonal fish line development using high-throughput sequencing technologies." Thesis, University of Stirling, 2016. http://hdl.handle.net/1893/24909.

Abstract:
Isogenic clonal fish lines are a powerful resource for aquaculture-related research. Fully inbred individuals, the clone founders, can be produced through either mitotic gynogenesis or androgenesis, and a further generation from those founders propagates fully inbred clonal lines. Although such lines can be generated rapidly, in contrast to the successive generations of sibling mating required in mice, their production may be hampered by (i) potential residual contribution from irradiated gametes associated with poorly optimised protocols, (ii) reduced survival of clone founders and (iii) the spontaneous occurrence of meiotic gynogenetics with varying degrees of heterozygosity, which contaminate fully homozygous progenies. This research set out to address these challenges and gain insights into the development of isogenic clonal fish lines by using double-digest RADseq (ddRADseq) to generate large numbers of genetic markers covering the genome of interest. Analysis of potential contribution from irradiated sperm indicated successful uniparental inheritance in meiotic and mitotic gynogenetic European seabass. Exclusive transmission of maternal alleles was detected in G1 progeny of Atlantic salmon (which has a duplicated genome), while G2 progenies presented varying levels of sire contribution, suggesting sub-optimal UV irradiation that had previously gone undetected with 27 microsatellite markers. Telomeric markers with higher recombination frequencies, suited to efficient differentiation of meiotic and mitotic gynogenetics, were successfully identified in European seabass, and a genetic linkage map was generated from these data. One clear case of a spontaneous meiotic gynogenetic fish was detected among 18 putative DH fish in European seabass, despite earlier screening for isogenicity using 11 microsatellite markers. An unidentified mechanism inhibiting restriction digestion of larval DNA, observed in Nile tilapia, prevented the construction of a SNP-based genetic linkage map.
In summary, this study provides strong evidence for the efficacy of NGS technologies in the development and verification of isogenic clonal fish lines. Reliable establishment of isogenic clonal fish lines is critical for their utility as a research tool.
47

Beckers, Matthew. "Quality checking and expression analysis of high-throughput small RNA sequencing data." Thesis, University of East Anglia, 2015. https://ueaeprints.uea.ac.uk/58581/.

Abstract:
The advent of high-throughput RNA sequencing (RNA-seq) methods has made it possible to sequence transcriptomes for the cell-wide identification of small non-coding RNAs (sRNAs) and to assess their regulation using differential expression analysis by comparing two or more different conditions. During an analysis of a typical set of sRNA sequencing (sRNA-seq) libraries, a large variety of tools and methods are used on the dataset in order to understand the data's quality and content, and to summarise the knowledge gained from the entire analysis. Many of the tools available to do this were created for mRNA sequencing (mRNA-seq) datasets. In this thesis, we present and implement a processing pipeline that can be used to assess the quality and the differential expression of sRNA-seq datasets over two or more different conditions. We then utilise aspects of this pipeline in various sRNA-seq experiments. Firstly, we combine our pipeline with current tools for miRNA identification to assess the regulation of miRNAs during larval caste differentiation in a novel genome, that of the European bumblebee (Bombus terrestris). Secondly, we explore the differential expression during cell stress of all classes of sRNAs using two human cell lines. We also find that a specific protein, Ro60, is required for the expression of mRNA-derived sRNAs during stress, similar to the way in which sRNAs derived from Y RNAs are regulated. Finally, we utilise our understanding of sRNA mapping patterns, alongside current tools for miRNA identification, to search for functional miRNAs and other sRNAs in the novel genomes of two diatoms. The lack of canonical miRNA predictions in this study has repercussions for the evolutionary theory behind miRNAs.
The implementation of our pipeline for sRNA-seq data provides an interactive and quality-controlled workflow that can be used to process a dataset from raw sequences to the results of several differential expression experiments for all identified sRNA classes within a sequenced transcriptome.
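As a rough illustration of what comparing sRNA expression between two conditions involves at its simplest, the Python sketch below normalizes raw read counts to reads per million and computes a log2 fold change with a pseudocount. This is a generic textbook calculation, not the pipeline described in the thesis. The function names, sRNA labels and counts are hypothetical. A real analysis would use replicates and a statistical test.

```python
import math


def reads_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """Per-sRNA log2(treated / control) on RPM values.

    The pseudocount avoids division by zero for sRNAs absent from one library.
    """
    c = reads_per_million(control)
    t = reads_per_million(treated)
    return {
        name: math.log2((t.get(name, 0.0) + pseudocount) / (c.get(name, 0.0) + pseudocount))
        for name in sorted(set(c) | set(t))
    }


# Hypothetical toy libraries with two sRNAs each.
control_counts = {"sRNA-a": 500, "sRNA-b": 500}
treated_counts = {"sRNA-a": 1500, "sRNA-b": 500}
lfc = log2_fold_changes(control_counts, treated_counts)
print(lfc)
```

Note that sRNA-b has the same raw count in both libraries yet shows a negative fold change after normalization, because sRNA-a takes a larger share of the treated library. Compositional effects of this kind are one reason dedicated quality checks matter for sRNA-seq data.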
48

Gupta, Namita. "Computational Identification of B Cell Clones in High-Throughput Immunoglobulin Sequencing Data." Thesis, Yale University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10633249.

Abstract:

Humoral immunity is driven by the expansion, somatic hypermutation, and selection of B cell clones. Each clone is the progeny of a single B cell responding to antigen, with diversified Ig receptors. The advent of next-generation sequencing technologies enables deep profiling of the Ig repertoire. This large-scale characterization provides a window into the micro-evolutionary dynamics of the adaptive immune response and has a variety of applications in basic science and clinical studies. Clonal relationships are not directly measured, but must be computationally inferred from these sequencing data. In this dissertation, we use a combination of human experimental and simulated data to characterize the performance of hierarchical clustering-based methods for partitioning sequences into clones. Our results suggest that hierarchical clustering using single linkage with nucleotide Hamming distance identifies clones with high confidence and provides a fully automated method for clonal grouping. The performance estimates we develop provide important context to interpret clonal analysis of repertoire sequencing data and allow for rigorous testing of other clonal grouping algorithms. We present the clonal grouping tool as well as other tools for advanced analyses of large-scale Ig repertoire sequencing data through a suite of utilities, Change-O. All Change-O tools utilize a common data format, which enables the seamless integration of multiple analyses into a single workflow. We then apply the Change-O suite in concert with the nucleotide coding sequences for WNV-specific antibodies derived from single cells to identify expanded WNV-specific clones in the repertoires of recently infected subjects through quantitative Ig repertoire sequencing analysis.
The method proposed in this dissertation to computationally identify B cell clones in Ig repertoire sequencing data with high confidence is made available through the Change-O suite and can be applied to provide insight into the dynamics of the adaptive immune response.
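The clustering idea named in this abstract, single linkage with nucleotide Hamming distance, can be sketched in a few lines of Python. Cutting a single-linkage tree at a fixed distance is equivalent to finding connected components of the graph that links any two sequences within that distance, so the sketch uses a union-find structure. This is an independent toy illustration, not the Change-O implementation or its interface. The function names, the sequences and the threshold are hypothetical, and all sequences are assumed to have equal length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clones(seqs, max_dist):
    """Group sequences into clusters by single linkage at threshold max_dist."""
    parent = list(range(len(seqs)))

    def find(i):
        # Find the cluster representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose distance is within the threshold.
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy sequences, threshold of one mismatch.
toy_seqs = ["ACGTAC", "ACGTAA", "ACGGAA", "TTTTTT", "TTTTTA"]
clones = single_linkage_clones(toy_seqs, max_dist=1)
print(clones)
```

In the toy data, ACGTAC and ACGGAA differ at two positions but still end up in the same cluster because each is one mismatch from ACGTAA. This chaining behaviour is characteristic of single linkage and suits lineages that accumulate mutations step by step.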

49

Marticke, Simone Sigrid. "Ultra-high throughput sequencing analysis of FOXP2 occupancy in the human genome /." May be available electronically:, 2008. http://proquest.umi.com/login?COPT=REJTPTU1MTUmSU5UPTAmVkVSPTI=&clientId=12498.

50

MORGAN, ANNA. "Identification of New Hereditary Hearing Loss Genes Using High-Throughput Sequencing Technologies." Doctoral thesis, Università degli Studi di Trieste, 2017. http://hdl.handle.net/11368/2908121.

Abstract:
Hearing loss (HL) is the most frequent birth defect in developed societies, affecting approximately 1 to 3 in every 1000 live births. HL is a remarkably complex and heterogeneous disease presenting with various phenotypes as a result of both genetic and environmental factors. Within genetic or hereditary hearing loss (HHL), about 70% of cases can be classified as non-syndromic hearing loss (NSHL), i.e. without abnormalities in other organs, and to date 158 NSHL loci and 95 genes have been reported as causative. A correct molecular diagnosis is essential for uncovering the molecular mechanisms of hearing loss, for providing patients with prognostic information and personalized risk assessments, and for reducing public health costs. This study therefore aims to define the genetic cause of hearing loss in a subset of familial NSHL cases from both Italy and Qatar. To overcome the high genetic heterogeneity of this disease and the fact that different major players seem to be involved in the Italian and Qatari populations, next generation sequencing techniques were employed. For the Qatari population in particular, this study represents the first high-throughput screening for the molecular diagnosis of hearing loss, which makes it extremely valuable from an epidemiological point of view. As a first step, patients were screened for 96 deafness genes using a custom targeted re-sequencing (TRS) panel. Data analysis led to the identification of the molecular cause in 50% of all families, highlighting TECTA and MYO7A as major players in the Italian population, and CDH23 and TMC1 in the Qatari one. Families negative to TRS were selected for whole exome sequencing (WES) analysis, with the purpose of discovering new disease-related genes. So far two new candidates, SPATC1L and PLS1, have been identified in two Italian families.
SPATC1L encodes the speriolin-like protein, whose function is still unknown. A novel stop variant has been identified in an Italian family affected by autosomal dominant NSHL (ADNSHL), and functional studies (i.e. expression analysis in mouse whole cochleae, in vitro molecular cloning) together with statistical analysis (i.e. a candidate-gene population-based statistical study in cohorts from the Caucasus and Central Asia) supported the role of this gene in hearing function and loss. In the case of PLS1, a new missense variant has been identified in an Italian ADNSHL family. The gene encodes the plastin-1 protein, which has already been associated with hearing loss in mice. The generation of a knock-in in the Zebrafish model (in collaboration with ZeClinics, a Biotech Contract Research Organization (CRO) and early-phase biopharmaceutical (PHARMA) company using Zebrafish for the study of human diseases, located in Barcelona, Spain) is now in progress, and expression of the gene in the inner ear of Zebrafish larvae has been preliminarily confirmed. Altogether, these results clearly show that TRS followed by WES and functional studies are powerful tools for both the molecular diagnosis of NSHL and the identification of new disease-related genes.
