Dissertations / Theses on the topic 'EXOME SEQUENCING DATA'

1

Sigurgeirsson, Benjamín. "Analysis of RNA and DNA sequencing data : Improved bioinformatics applications." Doctoral thesis, KTH, Genteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-184158.

Abstract:
Massively parallel sequencing has rapidly revolutionized DNA and RNA research. Sample preparation methods are steadily advancing, sequencing costs have plummeted and throughput is ever growing. This progress has resulted in exponential growth in data generation with a corresponding demand for bioinformatic solutions. This thesis addresses methodological aspects of this sequencing revolution and applies them to selected biological topics. Papers I and II are technical in nature and concern sample preparation and data analysis of RNA sequencing data. Paper I is focused on RNA degradation and paper II on generating strand-specific RNA-seq libraries. Papers III and IV deal with current biological issues. In paper III, whole exomes of cancer patients undergoing chemotherapy are sequenced and their genetic variants are associated with their toxicity-induced adverse drug reactions. In paper IV, a comprehensive view of gene expression in the endometrium is assessed at two time points of the menstrual cycle. Together these papers show relevant aspects of contemporary sequencing technologies and how they can be applied to diverse biological topics.

QC 20160329

2

Zhang, Lu, and 张璐. "Identification and prioritization of single nucleotide variation for Mendelian disorders from whole exome sequencing data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2012. http://hub.hku.hk/bib/B48521905.

Abstract:
With the completion of the human genome sequencing project and the rapid development of sequencing technologies, our capacity to tackle the genetic and genomic changes that underlie human diseases has never been greater. The recent successes in identifying disease-causing single nucleotide variations (SNVs) for Mendelian disorders using whole exome sequencing may bring us one step closer to understanding the pathogenesis of Mendelian diseases. However, many hurdles need to be overcome before these promises can become widespread reality. In this study, we investigated various strategies for SNV identification and designed a toolkit named PriSNV for SNV prioritization. The SNV identification pipeline, which includes read alignment, PCR duplicate removal, indel realignment, base quality score recalibration, and SNV and genotype calling, was examined with simulated and real sequencing data. When sequencing errors and small indels are incorporated, most read alignment software achieves satisfactory results. Nonetheless, reads with medium-sized and large indels are prone to be wrongly mapped to the reference genome because of the limitations of the gap opening strategies in available read alignment software. In addition, although mapping quality reflects only part of the information on the mapping error rate, it is still important to use it to filter out obvious read alignment errors. PCR duplicate removal, indel realignment and base quality score recalibration proved to be necessary and can substantially reduce false-positive SNV calls. Under the same quality criterion, VarScan is the most sensitive software for SNV calling, but its results are also enriched in false-positive calls.
In order to prioritize the small subset of functionally important variants among the tens of thousands of variants in a whole human exome, we developed a toolkit called PriSNV, a systematic prioritization pipeline that makes use of information on variant quality, gene candidacy based on the number of novel nonsynonymous mutations in a gene, gene functional annotation, known involvement in the disease or relevant pathways, and location in linkage regions. Prediction of the functional impact of coding variants is also used to aid the search for causal mutations in Mendelian disorders. For a patient affected by Crohn's disease, the candidate genes can be substantially reduced from 9615 to 3 by the gene selection strategies implemented in PriSNV. In general, our results for SNV identification can help biologists recognize the limitations of available software and shed light on the development of new strategies for accurately identifying SNVs in the future. PriSNV, the software we developed for SNV prioritization, can provide significant help to biologists in prioritizing SNV calls in a systematic way and in reducing the search space for further analysis and experimental verification.
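As a rough illustration of the kind of prioritization described in this abstract, the following minimal Python sketch ranks genes by the number of novel nonsynonymous variants that pass a quality threshold and, optionally, fall within linkage regions. It is not PriSNV's implementation: the `Variant` fields, the `prioritize_genes` function, the default threshold and the gene names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # All field names are hypothetical; they are not taken from PriSNV.
    gene: str
    chrom: str
    pos: int
    quality: float
    nonsynonymous: bool
    novel: bool  # True if absent from known-variant databases


def prioritize_genes(variants, min_quality=30.0, linkage_regions=(), min_hits=1):
    """Rank genes by their count of novel nonsynonymous variants passing all filters.

    linkage_regions is an iterable of (chrom, start, end) tuples; when it is
    empty, no positional filter is applied.
    """
    def in_linkage(v):
        if not linkage_regions:
            return True
        return any(c == v.chrom and start <= v.pos <= end
                   for c, start, end in linkage_regions)

    counts = Counter(
        v.gene
        for v in variants
        if v.quality >= min_quality and v.nonsynonymous and v.novel and in_linkage(v)
    )
    # most_common() sorts by descending count
    return [(gene, n) for gene, n in counts.most_common() if n >= min_hits]


# Toy input with made-up gene names
example_variants = [
    Variant("GENE_A", "1", 100, 50.0, True, True),
    Variant("GENE_A", "1", 200, 45.0, True, True),
    Variant("GENE_B", "2", 300, 60.0, True, True),
    Variant("GENE_C", "1", 150, 10.0, True, True),   # fails the quality filter
    Variant("GENE_D", "1", 180, 50.0, False, True),  # synonymous, ignored
]
ranked = prioritize_genes(example_variants)
```

Each filter only ever shrinks the candidate list, which is how a cascade of this kind can narrow thousands of genes to a handful; a real pipeline would add the functional annotation and pathway evidence that the abstract mentions.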
Paediatrics and Adolescent Medicine
Master
Master of Philosophy
3

Carraro, Marco. "Development of bioinformatics tools to predict disease predisposition from Next Generation Sequencing (NGS) data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3426807.

Abstract:
The sequencing of the human genome has opened up completely new avenues in research, and the notion of personalized medicine has become common. DNA sequencing technology has evolved by several orders of magnitude, coming into the range of $1,000 for a complete human genome. The promise of identifying genetic variants that influence our lifestyles and make us susceptible to diseases is now becoming reality. However, genome interpretation remains one of the most challenging problems of modern biology. The focus of my PhD project is the development of bioinformatics tools to predict disease predisposition from sequencing data. Several of these methods have been tested in the context of the Critical Assessment of Genome Interpretation (CAGI), always achieving good prediction performance. During my PhD project I faced the complete spectrum of challenges to be addressed in order to translate the sequencing revolution into clinical practice. One of the biggest problems when dealing with sequencing data is the interpretation of the pathogenic effect of variants. Dozens of bioinformatics tools have been created to separate mutations that could be involved in a pathogenic phenotype from neutral variants. In this context the problem of benchmarking is critical, as prediction performance is usually tested on different sets of variants, making comparison among these tools impossible. To address this problem I performed a blinded comparison of pathogenicity predictors in the context of CAGI, producing the most complete performance assessment among all the iterations of this collaborative experiment. Another challenge that needs to be addressed to realize the personalized medicine revolution is phenotype prediction. During my PhD I had the opportunity to develop several methods for complex phenotype prediction from targeted enrichment and exome sequencing data.
In this context, challenges such as misinterpretation or overinterpretation of variant pathogenicity have emerged, as in the case of phenotype prediction from the Hopkins Clinical Panel. In addition, other complementary issues of phenotype prediction, such as the possible presence of incidental findings, have to be considered. Ad hoc prediction strategies have been defined for different kinds of sequencing data. A clear example is the case of Crohn’s disease risk prediction. Three iterations of this prediction challenge have been run so far, again in the context of the CAGI experiment. Analysis of the datasets revealed how population structure and bias in data preparation and sequencing could affect prediction performance, leading to inflated results. For this reason a completely new prediction strategy was defined for the last edition of the Crohn’s disease challenge, exploiting data from genome-wide association studies and protein-protein interaction networks to address the problem of missing heritability. Good prediction performance was achieved, especially for individuals with an extreme predicted risk score. Last, my work focused on the prediction of a health-related trait: the blood group phenotype. The accuracy of serological tests is very poor for minor blood groups or weak phenotypes. Blood group incompatibilities can be harmful for critical individuals such as oncohematological patients. BOOGIE exploits haplotype tables and the nearest neighbor algorithm to identify the correct phenotype of a patient. The accuracy of our method has been tested on the ABO and RhD systems, achieving good results. In addition, our analyses paved the way for a further increase in performance, moving towards a prediction system that in the future could become a real alternative to wet lab experiments.
The completion of the Human Genome Project has opened up many new research horizons. Among these, the possibility of understanding the genetic basis that makes each individual susceptible to different diseases has paved the way for a new revolution: the advent of personalized medicine. DNA sequencing technologies have evolved considerably, and today the price of sequencing a genome is close to the psychological threshold of $1,000. The promise of identifying genetic variants that influence our lifestyles and make us susceptible to disease is therefore becoming reality. However, much work is still needed before this new kind of medicine can become a reality. In particular, the challenge today no longer lies in generating sequencing data but in interpreting them. The aim of my PhD project is the development of bioinformatics methods to predict disease predisposition from sequencing data. Many of these methods were tested in the context of the Critical Assessment of Genome Interpretation (CAGI), an international competition focused on defining the state of the art in genome interpretation, and consistently achieved good results. During my PhD project I had the opportunity to face the full spectrum of challenges that must be addressed to translate the new genome sequencing capabilities into clinical practice. One of the main problems when dealing with sequencing data is interpreting the pathogenicity of mutations. Dozens of predictors have been created to separate neutral variants from mutations that may cause a pathological phenotype.
In this context the problem of benchmarking is fundamental, since the performance of these tools is usually tested on different variant datasets, making a performance comparison impossible. To address this problem, the accuracy of these predictors was compared on a set of mutations with unknown phenotype in the context of CAGI, producing the most complete assessment of pathogenicity predictors among all editions of this collaborative experiment. Predicting phenotypes from sequencing data is another challenge that must be addressed to fulfil the promises of personalized medicine. During my PhD I had the opportunity to develop several predictors for complex phenotypes using data from gene panels and exomes. In this context, problems such as misinterpretation or overinterpretation of variant pathogenicity were addressed, as in the challenge focused on predicting phenotypes from the Hopkins Clinical Panel. Other issues complementary to phenotype prediction also emerged, such as the possible presence of incidental findings. Specific prediction strategies were defined while working with different types of sequencing data. One example is Crohn's disease. Three editions of CAGI have posed the challenge of identifying healthy individuals or those affected by this inflammatory disease using exome sequencing data alone. Analysis of the datasets revealed how population structure and problems in exome preparation and sequencing compromised the predictions for this phenotype, leading to an overestimate of prediction performance. Taking this into account, a completely new prediction strategy was defined for this phenotype and tested in the latest edition of CAGI.
Data from genome-wide association studies (GWAS) and the analysis of protein interaction networks were used to define lists of genes involved in the onset of the disease. Good prediction performance was obtained, in particular for individuals who had been assigned a high probability of being affected. Lastly, my work focused on predicting blood groups, again from sequencing data. The accuracy of serological tests is in fact reduced for minor blood groups or weak phenotypes. Incompatibilities for these blood groups can be critical for some classes of individuals, such as oncohematological patients. Our prediction strategy exploited genotype data for genes encoding blood groups, available in dedicated databases, and the nearest neighbour principle to make the predictions. The accuracy of our method was tested on the ABO and RhD systems, achieving good prediction performance. In addition, our analyses paved the way for a further increase in the performance of this tool.
4

Fewings, Eleanor Rose. "The use of whole exome sequencing data to identify candidate genes involved in cancer and benign tumour predisposition." Thesis, University of Cambridge, 2019. https://www.repository.cam.ac.uk/handle/1810/285963.

Abstract:
The development of whole exome sequencing has transformed the study of disease predisposition. The sequencing of both large disease sets and smaller rare disease families enables the identification of new predisposition variants and potentially provide clinical insight into disease management. There is no standard protocol for analysing exome sequencing data. Outside of extremely large sequencing studies including thousands of individuals, statistical approaches are often underpowered to detect rare disease associated variants. Aggregation of variants into functionally related regions, including genes, gene clusters, and pathways, allows for the detection of biological processes that, when interrupted, may impact disease risk. In silico functional studies can also be utilised to further understand how variants disrupt biological processes and identify genotype-phenotype relationships. This study describes the exploration of sequencing datasets from cancers and benign tumour diseases including: i) hereditary diffuse gastric cancer, ii) sweat duct proliferation tumours, iii) adrenocortical carcinoma, and iv) breast cancer. Each set underwent germline whole exome sequencing followed by additional tumour or targeted sequencing to identify associated predisposition genes. Variants within a cluster of risk genes that are involved in double strand break repair were identified as associated with hereditary diffuse gastric cancer risk via gene ontology enrichment analysis. This cluster included PALB2 within which, using externally collated data, loss of function variants were identified as significantly associated with hereditary diffuse gastric cancer risk. Germline protein-affecting variants in the myosin gene MYH9 were identified in all individuals with a rare sweat duct proliferative syndrome, suggesting a role for MYH9 in skin development, regulation and tumorigenesis. 
These MYH9 variants were analysed in silico to identify a genotype-phenotype relationship between the clinical presentation and variants in the ATP binding pocket of the protein. Tumour matched normal sequence data from adrenocortical carcinoma cases was used to elucidate the role of Lynch syndrome genes in disease pathogenesis. Within the breast cancer set, candidate genes were selected to undergo targeted sequencing in a larger set of cases to further explore their role in breast cancer risk. Risk associated genes identified within this study may ultimately aid in diagnosis and management of disease. This thesis has also generated multiple novel tools and sequencing analysis techniques that may be of use for further studies by aiding in the prioritisation of candidate variants. The described techniques will provide support to researchers working on rare, statistically underpowered datasets and to provide standard analysis pipelines for a range of dataset sizes and types, including familial data and unrelated individuals.
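The idea of aggregating variants into genes to gain power on small cohorts can be illustrated with a minimal Python sketch of a generic carrier-based burden test. This is an assumption-laden illustration, not the analysis used in the thesis: the function names, the sample identifiers, the gene labels and the choice of a one-sided Fisher exact test are all hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    Returns the hypergeometric probability of seeing at least `a` carriers
    among the cases, given the table margins.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    upper = min(n_cases, n_carriers)
    numerator = sum(
        comb(n_cases, k) * comb(n_total - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_carriers)


def gene_burden(carriers_by_gene, cases, controls):
    """Collapse variants per gene and test for carrier enrichment in cases.

    carriers_by_gene maps a gene name to the set of sample IDs carrying at
    least one qualifying variant in that gene.
    """
    results = {}
    for gene, carriers in carriers_by_gene.items():
        a = len(carriers & cases)
        c = len(carriers & controls)
        results[gene] = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
    return results


# Toy cohort with made-up sample and gene names
cases = {"s1", "s2", "s3"}
controls = {"s4", "s5", "s6"}
carriers_by_gene = {
    "GENE_X": {"s1", "s2", "s3"},  # carried by every case and no control
    "GENE_Y": {"s1", "s4"},        # evenly split between the groups
}
p_values = gene_burden(carriers_by_gene, cases, controls)
```

Counting carriers per gene rather than per variant pools several rare variants into a single test, which is why aggregation can help when single-variant tests are underpowered; the same collapsing step extends to gene clusters or pathways by changing the grouping key.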
5

Chennen, Kirsley. "Maladies rares et "Big Data" : solutions bioinformatiques vers une analyse guidée par les connaissances : applications aux ciliopathies." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAJ076/document.

Abstract:
Over the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of "Big Data" in biology. There is, however, the particular case of rare diseases, which are characterized by scarcity, from patient numbers to domain knowledge. Nevertheless, rare diseases are of real interest, because the fundamental knowledge accumulated from them as study models and the resulting therapeutic solutions can also benefit more common diseases. This thesis focuses on the development of new bioinformatics solutions that integrate Big Data and knowledge-guided approaches to improve the study of rare diseases. In particular, my work led to (i) the creation of PubAthena, a literature screening tool for recommending new relevant publications, (ii) the development of a tool for exome data analysis, VarScrut, which combines multi-level knowledge to improve the resolution rate
Over the last decade, biomedical research and medical practice have been revolutionized by the post-genomic era and the emergence of Big Data in biology. The field of rare diseases is characterized by scarcity, from patients to domain knowledge. Nevertheless, rare diseases are of real interest, as the fundamental knowledge accumulated and the therapeutic solutions developed can also benefit common underlying disorders. This thesis focuses on the development of new bioinformatics solutions, integrating Big Data and Big Data associated approaches to improve the study of rare diseases. In particular, my work resulted in (i) the creation of PubAthena, a tool for the recommendation of relevant literature updates, (ii) the development of a tool for the analysis of exome datasets, VarScrut, which combines multi-level knowledge to improve the resolution rate
6

Chakrabortty, Sharmistha. "SNPs and Indels Analysis in Human Genome using Computer Simulation and Sequencing Data." University of Toledo Health Science Campus / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=mco1501726874739045.

7

Bertoldi, Loris. "Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421950.

Abstract:
In the last decade, the sharp decrease in sequencing cost brought about by high-throughput technologies has completely changed the approach to genetic problems. In particular, whole exome and whole genome sequencing are contributing to extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. As this is a relatively new and fast-developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. In line with the times, in 2014 the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and applying them to investigate, and possibly solve, the case studies included in the project. I first developed an automated pipeline for Illumina data that sequentially performs each step needed to go from raw reads to somatic or germline variant detection. The system performance was tested by means of internal controls and by applying it to a cohort of patients affected by gastric cancer, with interesting results. Once variants are called, they have to be annotated to define properties such as their position at the transcript and protein level, their impact on the protein sequence, their pathogenicity and more. As most of the publicly available annotators were affected by systematic errors causing low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation, which guarantees the best accuracy (>99%) compared with state-of-the-art programs while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of results but also a simple filtering strategy.
Furthermore, for valuable user-driven prioritization of human genetic variations, I developed QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Prioritization is achieved by a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those that do not satisfy the applied criteria. QueryOR has been used to analyze the two case studies framed within the BioInfoGen project. In particular, it allowed the detection of causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. In addition, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but with no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
In the last decade, the enormous decrease in sequencing cost brought about by the development of high-throughput technologies has completely revolutionized the way genetic problems are approached. In particular, whole exome and whole genome sequencing are contributing to extraordinary progress in the study of human genetic variants, opening up new perspectives in personalized medicine. As this is a relatively new and rapidly developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. To keep pace with the times, in 2014 the University of Padua funded the BioInfoGen strategic project with the aim of developing technologies and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and applying them to investigate, and possibly solve, the case studies included in the project. I first developed a pipeline for analysing Illumina data, capable of running in sequence all the steps needed to go from raw data to the discovery of both germline and somatic variants. System performance was tested with internal controls and by applying the pipeline to a group of patients affected by gastric cancer, with interesting results. Once called, variants must be annotated to define properties such as their position at the transcript and protein level, their impact on the protein sequence, their pathogenicity, and so on.
Because most available annotators showed systematic errors that caused low consistency in the final annotation, I implemented VarPred, a new variant annotation tool that guarantees the best accuracy (>99%) compared with the state of the art while also showing good execution times. To make VarPred easier to use, I developed a highly intuitive web interface that allows not only graphical visualization of the results but also a simple filtering strategy. Furthermore, for effective user-driven prioritization of human variants, I developed QueryOR, a web platform suited to searching among causative genes but also useful for finding new gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Prioritization is achieved through a positive selection process that brings out the most significant variants, rather than filtering out those that do not satisfy the imposed criteria. QueryOR was used to analyse the two case studies included in the BioInfoGen project. In particular, it made it possible to discover the causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the sequencing panel that was developed. QueryOR also simplified the identification of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but with no mutation in the two genes previously described as causative, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was carried out, showing that their origin can mainly be explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
8

Hsieh, PingHsun. "Model-Based Population Genetics in Indigenous Humans: Inferences of Demographic History, Adaptive Selection, and African Archaic Admixture using Whole-Genome/Exome Sequencing Data." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/612540.

Abstract:
Reconstructing the origins and evolutionary journey of humans is a central piece of biology. Complementary to archeology, population genetics studying genetic variation among individuals in extant populations has made considerable progress in understanding the evolution of our species. Particularly, studies in indigenous humans provide valuable insights on the prehistory of humans because their life history closely resembles that of our ancestors. Despite these efforts, it can be difficult to disentangle population genetic inferences because of the interplay among evolutionary forces, including mutation, recombination, selection, and demographic processes. To date, few studies have adopted a comprehensive framework to jointly account for these confounding effects. The shortage of such an approach inspired this dissertation work, which centered on the development of model-based analysis and demonstrated its importance in population genetic inferences. Indigenous African Pygmy hunter-gatherers have been long studied because of interest in their short stature, foraging subsistence strategy in rainforests, and long-term socio-economic relationship with nearby farmers. I proposed detailed demographic models using genomes from seven Western African Pygmies and nine Western African farmers (Appendix A). Statistical evidence was shown for a much deeper divergence than previously thought and for asymmetric migrations with a larger contribution from the farmers to Pygmies. The model-based analyses revealed significant adaption signals in the Pygmies for genes involved in muscle development, bone synthesis, immunity, reproduction, etc. I also showed that the proposed model-based approach is robust to the confounding effects of evolutionary forces (Appendix A). Contrary to the low-latitude African homeland of humans, the indigenous Siberians are long-term survivors inhabiting one of the coldest places on Earth. 
Leveraging whole exome sequencing data from two Siberian populations, I presented demographic models for these North Asian dwellers that include divergence, isolation, and gene flow (Appendix B). The best-fit models suggested a closer genetic affinity of these Siberians to East Asians than to Europeans. Using the model-based framework, seven NCBI BioSystems gene sets showed significance for polygenic selection in these Siberians. Interestingly, many of these candidate gene sets are heavily related to diet, indicating possible adaptations to special dietary requirements in these populations in cold, resource-limited environments. Finally, I moved beyond studying the history of extant humans to explore the origins of our species in Africa (Appendix C). Specifically, with statistical analyses using genomes only from extant Africans, I rejected the null model of no archaic admixture in Africa and in turn gave the first whole-genome evidence for interbreeding among human species in Africa. Using extensive simulation analyses under various archaic admixture models, the results suggest recurrent admixture between the ancestors of archaic and modern Africans, with evidence that at least one such event occurred in the last 30,000 years in Africa.
9

Nambot, Sophie. "Exploration pangénomique des anomalies du développement de causes rares." Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCI012.

Abstract:
Title: Genome-wide exploration of congenital anomalies of rare causes
Key words: congenital anomalies, exome sequencing, data sharing, reverse phenotyping
Congenital anomalies are a group of diseases that are both clinically and molecularly heterogeneous. They include more than 3,000 monogenic diseases, but only a third of them have a known molecular cause. Although advances in sequencing techniques have identified hundreds of new genes in recent years, many patients remain undiagnosed. The vast genetic heterogeneity of these conditions challenges the conventional diagnostic approach, which typically includes clinical expertise, a pan-genomic microarray study and/or targeted analysis of known genes and, more recently, exome sequencing targeting the genes already associated with human disease. Until genome sequencing becomes more affordable and the interpretation of its data for diagnostic use is better understood, we have chosen to explore new strategies to optimize the identification of new molecular bases through exome sequencing.
The first article aimed to demonstrate the feasibility and effectiveness of annual reanalysis of negative exome sequencing data in a diagnostic setting. Patients eligible for the study had developmental anomalies with no molecular cause established after a standard diagnostic procedure including chromosomal microarray analysis and diagnostic exome analysis. This first study yielded a significant number of additional diagnoses, and also identified candidate variants for which we used international data sharing and reverse phenotyping to establish genotype-phenotype correlations and cohorts for genotypic and/or phenotypic replication. These strategies allowed us to meet the ACMG criteria necessary to establish the pathogenicity of these variants.
Building on this experience, and wishing to go further in identifying new molecular bases for our patients, we continued the reanalysis effort within a research framework. This was the focus of the second article of this thesis, and it led to the identification of 17 new genes associated with congenital anomalies. Data sharing led to numerous international collaborations and to several functional studies carried out by specialized teams.
The third article illustrates the application of these tools to an ultra-rare syndromic form of intellectual disability. Following a considerable collaborative effort, we were able to describe precisely the phenotype of 25 patients, never previously reported in the literature, who carry pathogenic variants in TBR1, a candidate gene in autism spectrum disorders associated with intellectual disability.
These studies demonstrate that innovative strategies, namely exome data reanalysis, reverse phenotyping, and international data sharing, are effective for identifying new molecular bases in patients with congenital anomalies. For patients and their families, knowing the molecular basis of the disease makes it possible to understand the origin of the condition, to end the diagnostic odyssey, to clarify the prognosis and likely developmental course, and to put appropriate care in place. This information is also essential for reliable genetic counseling, and may allow prenatal or even pre-implantation diagnosis to be offered. For geneticists, these new diagnoses provide a chance to understand new pathophysiological processes, to develop new diagnostic tests and to discover new therapeutic targets.
10

GIOVANNETTI, AGNESE. "Analysis of non-coding DNA from whole exome sequencing data." Doctoral thesis, 2019. http://hdl.handle.net/11573/1234470.

Abstract:
Next Generation Sequencing technologies have completely changed the way we study the molecular bases of Rare Genetic Diseases (RGDs). Currently, sequencing the exonic portion of the human genome (the exome, about 1% of the genome) through Whole Exome Sequencing (WES) is the most widely used approach for discovering the molecular mechanisms underlying RGDs. To date, several tools have been developed to analyse and interpret WES data. However, owing to both technical and experimental limitations, its diagnostic rate is ~20-30%. In this context, we evaluated whether WES data contain information on non-coding sequences, focusing on microRNAs (miRNAs). Comparative analysis of capture design and experimental coverage revealed that WES data contain information on miRNA sequences, which are efficiently captured by most exome enrichment kits. We therefore analysed WES data from a cohort of 259 individuals, including patients affected by several genetic diseases and their unaffected relatives, searching for variants in miRNAs and performing functional annotation. Sanger sequencing validation confirmed that variants mapping to miRNA sequences are called reliably. To date, no dedicated tool is available to properly retrieve and analyse miRNAs from WES and WGS data. We therefore developed a tool, “AnnomiR”, that systematically analyses miRNAs and miRNA variants and provides functional annotation retrieved from several databases. This tool can be integrated into a standard analysis workflow for WES and WGS data. WES data contain a great amount of information that is generally discarded by commonly used analysis workflows and that should be considered, as it could help in understanding the molecular mechanisms underlying RGDs. In this context, systematic study of miRNAs could help elucidate their role as disease-causing factors and phenotypic modifiers in a wide spectrum of human diseases, allowing a better characterisation of the variability of the human genome related to these non-coding sequences.
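The comparison of capture design against miRNA positions described in this abstract can be pictured as an interval-overlap check. The following is a minimal sketch in Python and is not the AnnomiR implementation; the function name `covered_fraction`, the BED-like tuple layout and all coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open and 0-based: an assumed BED-like layout
Interval = Tuple[str, int, int]


def covered_fraction(mirna: Interval, targets: List[Interval]) -> float:
    """Return the fraction of a miRNA interval covered by capture targets."""
    chrom, start, end = mirna
    # Keep only overlapping targets on the same chromosome, clipped to the miRNA.
    clipped = sorted(
        (max(start, t_start), min(end, t_end))
        for t_chrom, t_start, t_end in targets
        if t_chrom == chrom and t_start < end and t_end > start
    )
    covered = 0
    cursor = start
    for c_start, c_end in clipped:
        # Skip bases already counted by an earlier, overlapping target.
        c_start = max(c_start, cursor)
        if c_end > c_start:
            covered += c_end - c_start
            cursor = c_end
    return covered / (end - start)


# Hypothetical coordinates, for illustration only.
targets = [("chr1", 100, 150), ("chr1", 140, 180), ("chr2", 100, 200)]
mirnas = {"mir-A": ("chr1", 120, 200), "mir-B": ("chr3", 10, 90)}
coverage = {name: covered_fraction(iv, targets) for name, iv in mirnas.items()}
```

In this toy input, `mir-A` is partly covered by two overlapping targets and `mir-B` is not covered at all. A real analysis would read the kit's target file and a miRNA annotation instead of hard-coded tuples.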
11

Garonzi, Marianna. "ANALYSIS AND INTERPRETATION OF WHOLE EXOME SEQUENCING DATA OF LEUKEMIA PATIENTS." Doctoral thesis, 2017. http://hdl.handle.net/11562/960651.

Abstract:
Leukemias are cancers that affect leukocyte progenitor cells. These malignancies are highly heterogeneous in terms of the molecular mechanisms involved in their onset and progression. Heterogeneity can also be observed within the same disease subgroup at the inter-individual level, reflected in different clinical outcomes and responses to treatment in different patients. Unfortunately, the exact aetiology of leukemia is still poorly understood, and consequently the related prevention, diagnostic, prognostic and follow-up methods remain largely undefined. Therefore, early diagnosis, together with specifically tailored approaches to leukemia treatment, still represents a key factor in determining patients’ health, quality of life and life expectancy. Several efforts have been started to improve diagnosis, treatment and disease monitoring of leukemia. In this regard, the work presented in my PhD thesis is part of an international project named “NGS-PTL: Next Generation Sequencing platform for targeted Personalized Therapy of Leukemia”, whose objective is the development of technologies for the diagnosis and prognosis of haematological cancers. In line with the project’s objective, my thesis work aims to identify sequence variants from Whole Exome Sequencing data for the acute types of leukemia, to be used as potential biomarkers to improve therapeutic interventions and to personalize treatment. The work describes the setup and application of a bioinformatic pipeline able to identify somatic mutations in leukemia patients and the driver genes, together with the results obtained by applying it to all the samples of the project. Setting up the pipeline required identifying a set of tools suitable for cancer sequencing data. In particular, the selection of dedicated software for the initial pre-processing ensures that the sequencing data are of high quality and that the subsequent analysis is performed on well-generated data. Moreover, the selection of MuTect as variant caller allowed us to overcome specific problems related to the heterogeneity of cancer samples. Applying this software led to the identification of a large and reliable set of somatic variants to be evaluated for the identification of new biomarkers and driver genes. The interpretation of the somatic variants then required specific databases and resources to interpret them correctly and, eventually, to correlate the mutations with the onset or the development of leukemia. Using the available biological knowledge, we were able to select variants likely to be highly damaging, some of which were already linked to leukemia in cancer-related resources (COSMIC, ICGC and CIViC). Finally, the discovery of genes that drive the development of the disease was performed using three statistical tools on the set of annotated mutations for each leukemia type, leading to the identification of a total of 32 biomarkers. In conclusion, the discovery of potential novel biomarkers, together with the additional biological information provided by the specific resources applied, has demonstrated the importance of applying NGS to the study of leukemia patients.
12

Lee, Cheng-Yang, and 李正揚. "Evaluation and integration of somatic copy number detection tools for whole-exome sequencing data." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/v3rnqf.

Abstract:
Master's degree
Taipei Medical University
Graduate Institute of Medical Informatics
104
Copy Number Variations (CNVs) are a form of structural variation that manifests as amplifications, deletions, translocations, and insertions of genomic segments larger than 50 bp. Previous studies have reported that CNVs are associated with biological functions of the nervous system, cellular development and metabolism in healthy people, and are also linked to diseases such as autism, schizophrenia and obesity. Recent studies have also uncovered an important role of CNVs in cancer. With the decreasing cost and high accuracy of next-generation sequencing, whole-exome sequencing (WES) has become a dominant method for identifying CNVs in both research and clinical settings. Because accurate identification of CNVs can affect clinical diagnosis and prognosis, substantial effort has been devoted to developing tools for detecting CNVs from WES data. Each of these tools has its own limitations, however, and no single method can detect all kinds of CNV events. Accordingly, we evaluated as many detection tools as possible using WES data obtained from TCGA (The Cancer Genome Atlas) CGHub, in order to give a comprehensive assessment of existing tools for detecting somatic copy number variations (somatic CNVs, SCNVs). Furthermore, we constructed an integrated platform for CNV detection in a virtual machine (VM). The evaluation found that ExomeCNV and VEGAWES achieved higher accuracy in detecting CNVs, that EXCAVATOR favored large CNVs, and that VarScan2 needed more time to run CNV detection. The study also provides a table summarizing all evaluation results, which helps users find the tools that fit their own experimental design. Finally, users can run the analysis pipeline built in this study with a simple command line to detect CNVs.
13

CHAHAL, ASHISH. "ANALYSIS AND ANNOTATION OF EXOME SEQUENCING DATA TO IDENTIFY AND PRIORITIZE GENES RESPONSIBLE FOR PROSTATE ADENOCARCINOMA." Thesis, 2015. http://dspace.dtu.ac.in:8080/jspui/handle/repository/15573.

Abstract:
After skin cancer, prostate cancer is the second most prevalent cancer in men. Somatic mutations in Prostate Adenocarcinoma (PRAD) are revealed by processing next-generation DNA sequencing data of the exome region. Mutations in the exome region directly affect the expression of genes and sometimes inhibit it, which can lead to several diseases. High-throughput technologies and NGS analysis enable us to find variations in the exome region that are involved in complex cancer pathways. Biomarkers can be identified using NGS and exome sequencing analysis pipelines, which can help in the diagnosis, treatment and prognosis of the cancer. The exome encodes proteins, so any change in this region can affect the individual. PRAD exome data were used to analyze variations in the exome region. PRAD data for 17 tumor samples with matched normal samples were downloaded from the TCGA web portal, and an exome sequence analysis pipeline was applied to predict and prioritize the genes involved in the PRAD pathway. The Perl programming language was used to prioritize and analyze the exome data. The Perl script maf2vcf.pl, DisGeNET and the ANNOVAR software package were used to identify 93 probable genes, filtered against DisGeNET. Of these, 54 genes were found in conserved regions with a phastConsElements46way score > 400. Sequence alignment errors in the 17 TCGA samples were filtered out by matching against segmental duplications. PolyPhen-2 annotations were used to score the deleterious effect of the variants. These steps yielded the most probable genes that might be responsible for Prostate Adenocarcinoma. GSTT1, TP53, CYP19A1 and BRAF, genes already implicated in pathways leading to prostate cancer, were also present among the filtered genes in this study. Experimental validation of the filtered genes may help identify novel genes involved in the complex pathways of prostate cancer.
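The successive filters in this abstract (disease-gene membership, a conservation score above 400, exclusion of segmental duplications) can be sketched as a single pass over annotated variants. This is an illustrative Python sketch only, not the Perl pipeline used in the thesis; the `Variant` fields, the `prioritize` function and the gene names are hypothetical, and the annotations are assumed to have been computed beforehand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    gene: str
    conservation_score: int  # e.g. a conserved-element score; values below are invented
    in_segdup: bool          # True if the variant lies in a segmental duplication
    in_disease_db: bool      # True if the gene appears in a disease-gene resource


def prioritize(variants: List[Variant], min_score: int = 400) -> List[str]:
    """Return the sorted, de-duplicated genes whose variants pass all filters."""
    kept = {
        v.gene
        for v in variants
        if v.in_disease_db
        and v.conservation_score > min_score
        and not v.in_segdup
    }
    return sorted(kept)


# Invented records, for illustration only.
variants = [
    Variant("GENE_A", 520, False, True),
    Variant("GENE_B", 380, False, True),   # fails the conservation threshold
    Variant("GENE_C", 610, True, True),    # lies in a segmental duplication
    Variant("GENE_D", 450, False, False),  # not in the disease-gene list
    Variant("GENE_A", 405, False, True),   # second passing variant, same gene
]
genes = prioritize(variants)
```

Collecting genes in a set means a gene with several passing variants is reported once, which matches the gene-level counts given in the abstract.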
14

Chou, Kai-Ming, and 周楷茗. "A method to screen gene variations that are associated with familial cancers based on the next generation exome sequencing data." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/24640394375292814085.

Abstract:
Master's degree
National Yang-Ming University
Institute of Biomedical Informatics
101
In this study, I took advantage of exome sequencing technology to determine genetic variants that could cause hereditary cancers. The method has three parts: preprocessing (mapping and quality assessment), filtering, and consequence prediction. For preprocessing, we used the Burrows-Wheeler Aligner (BWA) and SAMtools to align reads and call variants. A quality assessment was then undertaken on the basis of exome probe coverage. Candidate variants from the preprocessing step that were also found in the general population were marked as background, since such variants were unlikely to cause hereditary disease. Ensembl Perl APIs were used to annotate the consequences of each candidate variant. We also used PolyPhen, SIFT and Condel to predict the severity of each candidate variant. This method has been applied to investigate cases of rare hereditary disease discovered in the Taiwanese population. The candidate SNVs should be validated by experimental approaches in other families that have the same hereditary cancer. My method can point out candidate variants that might cause hereditary cancer. In principle, it could also be applied to other diseases using exome sequencing data.
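The background-marking step described above can be sketched as a lookup of each called variant in a table of population allele frequencies. The sketch below is in Python and rests on assumptions: the 1% cut-off, the `split_background` function, the variant key layout and the example values are all hypothetical, since the abstract does not state how "common" was defined.

```python
from typing import Dict, List, Tuple

# (chromosome, position, reference allele, alternate allele)
VariantKey = Tuple[str, int, str, str]


def split_background(
    called: List[VariantKey],
    population_af: Dict[VariantKey, float],
    max_af: float = 0.01,  # assumed cut-off, not taken from the thesis
) -> Tuple[List[VariantKey], List[VariantKey]]:
    """Separate called variants into rare candidates and common background."""
    candidates: List[VariantKey] = []
    background: List[VariantKey] = []
    for variant in called:
        # Variants absent from the population table are treated as frequency 0.
        allele_freq = population_af.get(variant, 0.0)
        if allele_freq > max_af:
            background.append(variant)
        else:
            candidates.append(variant)
    return candidates, background


# Invented variants and frequencies, for illustration only.
called = [
    ("chr17", 1000, "C", "T"),
    ("chr13", 2000, "G", "A"),
    ("chr2", 3000, "A", "G"),
]
population_af = {
    ("chr13", 2000, "G", "A"): 0.25,  # common, so marked as background
    ("chr2", 3000, "A", "G"): 0.001,  # rare, so kept as a candidate
}
candidates, background = split_background(called, population_af)
```

Only the variants left in `candidates` would go on to consequence annotation and severity prediction.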
15

Abaji, Rachid. "Using whole-exome sequencing data in an exome-wide association study approach to identify genetic risk factors influencing acute lymphoblastic leukemia response : a focus on asparaginase complications & vincristine-induced peripheral neuropathy." Thesis, 2018. http://hdl.handle.net/1866/24602.

Abstract:
Treatment of childhood acute lymphoblastic leukemia (ALL), a malignant disorder of lymphoid progenitor cells, has improved significantly over the past decades, and treatment success rates have surpassed 90% in favorable settings. However, treatment-related toxicities can be life-threatening and cause treatment interruption or cessation. Allergy, pancreatitis and thrombosis are common complications of ALL treatment associated with the use of asparaginase (ASNase), while vincristine-induced peripheral neuropathy (VIPN) is a frequent toxicity of vincristine (VCR). Adjusting the dosing regimen to ensure maximum efficacy and minimum toxicity is a delicate process and remains a major challenge in many treatment protocols. Pharmacogenetics studies how alterations in the genetic component can influence the inter-individual variability observed in treatment response. A better understanding of the molecular basis of this variability in drug effect could significantly improve treatment outcome by allowing the personalization of ALL treatment based on the genetic profile of the patient. Emerging reports suggest the benefit of applying exome analysis to uncover variants associated with complex human traits as well as drug response phenotypes. Our objective in this work was to use available whole-exome sequencing data to perform exome-wide association studies, followed by stepwise filtering and validation, to identify novel variants with the potential to modulate the risk of developing ASNase complications and VIPN. Twelve SNPs were associated with ASNase complications in the discovery cohort, including 3 associated with allergy, 3 with pancreatitis and 6 with thrombosis. Of those, rs3809849 in MYBBP1A, rs11556218 in IL16 and rs34708521 in SPEF2 were associated with multiple complications, and their association with pancreatitis was replicated in an independent validation cohort. As for VCR, three variants were associated with modulating the risk of VIPN: rs2781377 in SYNE2, rs10513762 in MRPL47 and rs3803357 in BAHD1. We also demonstrate a strong combined effect of harbouring multiple risk variants for each of the studied toxicities, and provide internally validated risk-prediction models based on the weighted genetic risk score method for pancreatitis and VIPN. Furthermore, given the association of the polymorphism in the MYBBP1A gene with multiple treatment outcomes, we aimed to understand how this genetic alteration translates into differences in ASNase treatment response through cell-based functional analysis. Using CRISPR-Cas9 technology, we knocked out the gene in PANC1 (pancreatic) cancer cell lines and tested the difference in viability between knockout and wild-type cells following gene deletion and ASNase treatment. Our results suggest a functional role of this gene in modulating the viability, proliferation capacity and morphology of the knockout cells, as well as their sensitivity to ASNase, and further support the involvement of the gene in influencing the outcome of ALL treatment with ASNase. The present work demonstrates that using whole-exome sequencing data in an exome-wide association study is a successful “hypothesis-free” strategy for identifying novel genetic markers modulating the effect of childhood ALL treatment, and highlights the importance of the synergistic effect of combining risk loci.
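The weighted genetic risk score mentioned in this abstract is commonly computed as the sum, over risk variants, of the number of risk alleles carried multiplied by a per-variant weight. The Python sketch below assumes the weights are natural logarithms of odds ratios, which is a common convention but is not confirmed by the abstract; the function `weighted_grs`, the SNP labels and the odds ratios are invented for illustration and are not results from the thesis.

```python
import math
from typing import Dict


def weighted_grs(genotypes: Dict[str, int], odds_ratios: Dict[str, float]) -> float:
    """Sum risk-allele counts (0, 1 or 2) weighted by ln(odds ratio)."""
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        # Assumption: a missing genotype counts as zero risk alleles.
        count = genotypes.get(snp, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"risk-allele count for {snp} must be 0, 1 or 2")
        score += math.log(odds_ratio) * count
    return score


# Invented SNP labels and odds ratios, for illustration only.
odds_ratios = {"snp1": 2.0, "snp2": 4.0}
patient = {"snp1": 1, "snp2": 2}
score = weighted_grs(patient, odds_ratios)
```

With these toy values the score is ln 2 + 2 ln 4, so carrying several risk alleles adds up on the log-odds scale, which is one way to picture the combined effect the abstract reports.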