Theses: « Computational Genomic »

1

Mumey, Brendan Marshall. « Some computational problems from genomic mapping ». Thesis, 1997. http://hdl.handle.net/1773/6932.

2

Alkan, Can. « Computational Studies on Evolution and Functionality of Genomic Repeats ». Case Western Reserve University School of Graduate Studies / OhioLINK, 2005. http://rave.ohiolink.edu/etdc/view?acc_num=case1120143436.

3

Gaspar, Paulo Miguel da Silva. « Computational methods for gene characterization and genomic knowledge extraction ». Doctoral thesis, Universidade de Aveiro, 2014. http://hdl.handle.net/10773/13949.

Abstract:

Joint MAPi doctoral programme in Computer Science
Motivation: Medicine and the health sciences are shifting from the classical symptom-based paradigm to a more personalized, genetics-based one, with an invaluable impact on health care. While advances in genetics were already contributing significantly to knowledge of the human organism, the breakthroughs achieved by several recent initiatives provided a comprehensive characterization of human genetic differences, paving the way for a new era of medical diagnosis and personalized medicine. Data generated from these and subsequent experiments are now becoming available, but their volume is well beyond what is humanly feasible to explore. It is therefore the responsibility of computer scientists to create the means for extracting the information and knowledge contained in those data. Within the available data, genetic structures contain significant amounts of encoded information that has been uncovered over the past decades. Finding, reading and interpreting that information are necessary steps for building computational models of genetic entities, organisms and diseases, a goal that in due course leads to human benefits.
Aims: Numerous patterns can be found within the human variome and exome. Exploring these patterns enables the computational analysis and manipulation of digital genomic data, but requires specialized algorithmic approaches. In this work we sought to create and explore efficient methodologies to compute and combine known biological patterns for various purposes, such as the in silico optimization of genetic structures, the analysis of human genes, and the prediction of pathogenicity from human genetic variants.
Results: We devised several computational strategies to evaluate genes, explore genomes, manipulate sequences, and analyze patients’ variomes. By resorting to combinatorial and optimization techniques we were able to create and combine sequence redesign algorithms to control genetic structures; by combining access to several web services and external resources we created tools to explore and analyze available genetic data and patient data; and by using machine learning we developed a workflow for analyzing human mutations and predicting their pathogenicity.

4

Sinha, Amit U. « Discovery and Analysis of Genomic Patterns : Applications to Transcription Factor Binding and Genome Rearrangement ». University of Cincinnati / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1204227723.

5

Saha, Mandal Arnab. « Computational Analysis of the Evolution of Non-Coding Genomic Sequences ». University of Toledo Health Science Campus / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=mco1372349811.

6

Danks, Jacob R. « Algorithm Optimizations in Genomic Analysis Using Entropic Dissection ». Thesis, University of North Texas, 2015. https://digital.library.unt.edu/ark:/67531/metadc804921/.

Abstract:

In recent years, the collection of genomic data has skyrocketed, and databases of genomic data are growing faster than ever before. Although many computational methods have been developed to interpret these data, they tend to struggle with the ever-increasing file sizes being produced and fail to exploit advances in multi-core processors through parallel processing. In some instances, loss of accuracy has been a necessary trade-off to allow faster computation. This thesis discusses one such algorithm and the changes made to allow larger input files and reduce the time required to reach a result without sacrificing accuracy. An information-entropy-based algorithm was used as a basis to demonstrate these techniques. The algorithm efficiently dissects the distinctive patterns underlying genomic data while requiring no a priori knowledge, and is thus applicable in a variety of biological research applications. This research describes how parallel processing and object-oriented programming techniques were used to process larger files in less time and achieve a more accurate result from the algorithm. Through object-oriented techniques, the maximum allowable input file size was increased from 200 MB to 2000 MB. Parallel processing allowed the program to finish processing data in less than half the time of the sequential version. The accuracy of the algorithm was improved by reducing data loss throughout the algorithm. Finally, adding user-friendly options enabled the program to use requests more effectively and to further customize the logic used within the algorithm.
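As a rough illustration of the kind of quantity an information-entropy-based method works with, the sketch below computes Shannon entropy over fixed-size windows of a DNA string. It is not the algorithm from the thesis: the function names, the window and step parameters, and the choice of single-nucleotide symbols are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(sequence: str) -> float:
    """Shannon entropy, in bits per symbol, of the symbols in `sequence`."""
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    # sum of p * log2(1/p) over the observed symbols
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def windowed_entropy(sequence: str, window: int, step: int) -> list[float]:
    """Entropy of each full window of length `window`, advancing by `step`.

    Hypothetical helper: window and step sizes are illustrative choices,
    not values taken from the thesis.
    """
    return [
        shannon_entropy(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


if __name__ == "__main__":
    # A homogeneous stretch followed by a mixed stretch.
    print(windowed_entropy("AAAAACGT", window=4, step=4))
```

Each window is scored independently of the others, which is why a computation of this shape can be split across worker processes without changing its result.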

7

Ciccolella, Simone. « Practical algorithms for Computational Phylogenetics ». Doctoral thesis, Università degli Studi di Milano-Bicocca, 2022. http://hdl.handle.net/10281/364980.

Abstract:

This manuscript describes the main computational challenges in the field of cancer phylogeny inference and proposes solutions to three main problems: (i) reconstructing the progression of a tumor sample, (ii) clustering SCS data to allow for cleaner and faster inference, and (iii) comparing different phylogenies. It also discusses how to combine these solutions into a single usable pipeline for faster analysis.

8

Picard, Colette Lafontaine. « Dynamics of DNA methylation and genomic imprinting in Arabidopsis ». Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122539.

Abstract:

Thesis: Ph. D., Massachusetts Institute of Technology, Computational and Systems Biology Program, 2019
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 210-226).
DNA methylation is an epigenetic mark that is highly conserved and important in diverse cellular processes, ranging from transposon silencing to genomic imprinting. In plants, DNA methylation is both mitotically and meiotically heritable, and changes in DNA methylation can be generationally stable and have long-lasting consequences. This thesis aims to improve understanding of DNA methylation dynamics in plants, particularly across generations and during reproduction. In the first project, I present an analysis of the generational dynamics of gene body methylation using recombinant inbred lines derived from differentially methylated parents. I show that while gene body methylation is highly generationally stable, changes in methylation state occur nonrandomly and are enriched in regions of intermediate methylation.
Important DNA methylation changes also occur during seed development in flowering plants, and these changes underlie genomic imprinting, the phenomenon of parent-of-origin specific gene expression. In plants, imprinting occurs in the endosperm, a seed tissue that functions analogously to the mammalian placenta. Imprinted expression is linked to DNA methylation patterns that serve to differentiate the maternally- and paternally-inherited alleles, but the mechanisms used to achieve imprinted expression are often unknown. I next explore imprinted expression and DNA methylation in Arabidopsis lyrata, a close relative of the model plant Arabidopsis thaliana. I find that the majority of imprinted genes in A. lyrata endosperm are also imprinted in A. thaliana, suggesting that imprinted expression is generally conserved. Surprisingly, a subset of A. lyrata imprinted genes are associated with a novel DNA methylation pattern and may be regulated by a different mechanism than their A. thaliana counterparts. I then explore the genetics of paternal suppression of the seed abortion phenotype caused by mutation of a maternally expressed imprinted gene. Finally, I present the first large single-nuclei RNA-seq dataset generated in plants, reporting data from 1,093 individual nuclei obtained from developing seeds. I find evidence of previously uncharacterized cell states in endosperm, and examine imprinted expression at the single-cell level. Together, these projects contribute to our understanding of DNA methylation and imprinting dynamics during plant development, and highlight the strong generational stability of certain DNA methylation patterns.

9

Rezwan, Faisal Ibne. « Improving computational predictions of Cis-regulatory binding sites in genomic data ». Thesis, University of Hertfordshire, 2011. http://hdl.handle.net/2299/7133.

Abstract:

Cis-regulatory elements are short regions of DNA to which specific regulatory proteins bind; these interactions subsequently influence the level of transcription of the associated genes by inhibiting or enhancing the transcription process. It is known that much of the genetic change underlying morphological evolution takes place in these regions rather than in the coding regions of genes. Identifying these sites in a genome is a non-trivial problem. Experimental (wet-lab) methods for finding binding sites exist, but all have limitations regarding their applicability, accuracy, availability or cost. Computational methods for predicting the position of binding sites, on the other hand, are less expensive and faster. Unfortunately, these algorithms perform rather poorly, some missing most binding sites and others over-predicting their presence. The aim of this thesis is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms with other sources of complementary biological evidence. Previous related work involved the use of machine learning algorithms for integrating predictions of TFBSs, with particular emphasis on the Support Vector Machine (SVM). This thesis has built upon, extended and considerably improved this earlier work. Data from two organisms were used. First, the relatively simple genome of yeast was used. In yeast, the binding sites are fairly well characterised and are normally located near the genes that they regulate. The techniques used on the yeast genome were then tested on the more complex genome of the mouse. The regulatory mechanisms of the mouse, a eukaryotic species, are known to be considerably more complex, and it was therefore interesting to investigate the techniques described here on such an organism.
The initial results were, however, not particularly encouraging: although a small improvement on the base algorithms could be obtained, the predictions were still of low quality. This was the case for both the yeast and mouse genomes. However, when the negatively labeled vectors in the training set were changed, a substantial improvement in performance was observed. The first change was to choose regions in the mouse genome a long way from a gene (distal regions, over 4000 base pairs away) as regions not containing binding sites. This produced a major improvement in performance. The second change was simply to use randomised training vectors, which contained no meaningful biological information, as the negative class. This gave some improvement on the yeast genome, but had a very substantial benefit for the mouse data, considerably improving on the aforementioned distal negative training data. In fact, the resulting classifier found over 80% of the binding sites in the test set and, moreover, 80% of its predictions were correct. The final experiment used an updated version of the yeast dataset, with more state-of-the-art algorithms and more recent TFBS annotation data. Here it was found that using randomised or distal negative examples once again gave very good results, comparable to those obtained on the mouse genome. Another source of negative data was tried for this yeast data, namely vectors taken from intronic regions. Interestingly, this gave the best results.
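The two closing figures in the abstract above correspond to what are usually called recall (the share of true binding sites that were found) and precision (the share of predictions that were correct). The sketch below shows how the two are computed; representing binding sites as a set of integer positions and the function name are assumptions made for this example, not details taken from the thesis.

```python
def precision_recall(predicted: set[int], actual: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for predicted versus annotated positions.

    precision = correct predictions / all predictions
    recall    = correct predictions / all annotated sites
    Empty inputs yield 0.0 rather than a division error.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data: five annotated sites and five predictions, four of which agree.
    annotated = {10, 20, 30, 40, 50}
    predictions = {10, 20, 30, 40, 99}
    print(precision_recall(predictions, annotated))
```

Reporting both numbers matters here because, as the abstract notes, some base algorithms miss most sites (low recall) while others over-predict (low precision).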

10

Alkhnbashi, Omer S. [author], and Rolf [academic supervisor] Backofen. « Computational characterisation of genomic CRISPR-Cas systems in archaea and bacteria ». Freiburg : Universität, 2017. http://d-nb.info/1139210904/34.

11

Cheloni, Stefano. « Computational Assessment of Genomic and Functional Heterogeneity in Acute Myeloid Leukaemia ». Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/790331.

Abstract:

Acute myeloid leukaemia (AML) is the most common leukaemia in adults and represents a disease with an urgent medical need, as 50-60% of patients relapse within 3 years of diagnosis. Intra-tumour heterogeneity (ITH), at both the biological and the genetic level, is a crucial feature of AMLs and is necessary for tumour maintenance and drug resistance. Biologically, AML is hierarchically organised, with leukaemia stem cells (LSCs) at the top level, a rare cell population with self-renewal, differentiation and quiescence properties. Quiescent LSCs can be less sensitive to radiation and chemotherapy, acting as a source of leukaemia relapse. Genetically, AMLs harbour patient-specific combinations of different driver mutations, which are organised within individual cases in sub-clones with distinct fitness. In our experimental plan, we hypothesised that the clonal evolution dynamics of relapsing AMLs are characterised by the selective expansion of quiescent low-frequency sub-clones present within the primary LSC population, which serve as the genomic and functional reservoir of the tumour. We performed whole-exome sequencing (WES) and longitudinal clonal evolution analyses of i) a cohort of 30 AML patients, ii) xenotransplanted human leukaemias and iii) functionally isolated leukaemic subpopulations with diverse proliferation histories. We identified 3 clonal evolution patterns at relapse in our cohort: stable, “gain of clones” and “loss of clones”. We dissected the evolutionary dynamics of the “gain of clones” group by performing high-sensitivity sequencing (HSS) and found low-frequency sub-clonal mutations in primary AML, some of which were selected and expanded after chemotherapy. We traced the origin of these sub-clones back to the quiescent leukaemic subpopulation.
Furthermore, we assessed the transcriptional ITH of leukaemic subpopulations with diverse proliferation histories in two patients’ xenografts by single-cell (sc) RNA sequencing and identified a set of marker genes potentially associated with leukaemogenesis and quiescence. Altogether, our findings indicate that the clonal architecture of primary AML is more complex than previously thought, and they are the first direct proof that relapsing AMLs are shaped by the expansion of low-frequency pre-existing clones. These clones appear to be sustained by the quiescent LSC pool. We expect that the outcome of our studies will provide new insights into the mechanisms of disease progression and treatment response in AML, and potentially point toward novel therapeutic approaches.

12

Sandberg, Rickard. « Analyses of genomic and gene expression signatures ». Stockholm, 2004. http://diss.kib.ki.se/2004/91-7140-015-X/.

13

Shapiro, B. Jesse (Benjamin Jesse). « Genomic signatures of sex, selection and speciation in the microbial world ». Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61788.

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Computational and Systems Biology Program, 2010.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (p. 218-228).
Understanding the microbial world is key to understanding global biogeochemistry, human health and disease, yet this world is largely inaccessible. Microbial genomes, an increasingly accessible data source, provide an ideal entry point. The genome sequences of different microbes may be compared using the tools of population genetics to infer important genetic changes allowing them to diversify ecologically and adapt to distinct ecological niches. Yet the toolkit of population genetics was developed largely with sexual eukaryotes in mind. In this work, I assess and develop tools for inferring natural selection in microbial genomes. Many tools rely on population genetics theory, and thus require defining distinct populations, or species, of bacteria. Because sex (recombination) is not required for reproduction, some bacteria recombine only rarely, while others are extremely promiscuous, exchanging genes across great genetic distances. This behavior poses a challenge for defining microbial population boundaries. This thesis begins with a discussion of how recombination and positive selection interact to promote ecological adaptation. I then describe a general pipeline for quantifying the impacts of mutation, recombination and selection on microbial genomes, and apply it to two closely related, yet ecologically distinct populations of Vibrio splendidus, each with its own microhabitat preference. I introduce a new tool, STARRInIGHTS, for inferring homologous recombination events. By assessing rates of recombination within and between ecological populations, I conclude that ecological differentiation is driven by a small number of habitat-specific alleles, while most loci are shared freely across habitats. The remainder of the thesis focuses on lineage-specific changes in natural selection among anciently diverged species of gamma proteobacteria. 
I develop two new metrics, selective signatures and slow:fast, for detecting deviations from the expected rate of evolution in 'core' proteins (present in single copy in most species). Because they rely on empirical distributions of evolutionary rates across species, these methods should become increasingly powerful as more and more microbial genomes are sampled. Overall, the methods described here significantly expand the repertoire of tools available for microbial population genomics, both for investigating the process of ecological differentiation at the finest of time scales, and over billions of years of microbial evolution.

14

Walther, Jürgen. « Revealing DNA dynamics from atomistic to genomic level by multiscale computational approaches ». Doctoral thesis, Universitat de Barcelona, 2019. http://hdl.handle.net/10803/667845.

Abstract:

The study of DNA from the atomistic to the mesoscopic level, and the connection of these different resolution levels, has been a major challenge since the turn of the millennium. In the early 2000s, experiments could for the first time resolve the structure of the nucleosome in high detail and capture physical contacts between genome segments far apart in sequence. At around the same time, force field development for atomistic nucleic acid simulations reached a peak with parmbsc0 in 2007, and coarse-grain nucleosome fiber models emerged. The first decade ended with a remarkable experimental advance in visualizing the whole genome, Hi-C. In the current decade, almost ten years after Hi-C was invented, the structure of the cell nucleus is still a very active topic. We can now harvest the fruits of the pioneers of the first decade of multi-scale investigation of DNA and connect the different resolution levels to obtain a complete picture of DNA, from electron orbitals to genome folding. In this work, we use computational approaches to dissect the different resolution levels, from atomistic MD simulations to mesoscopic modeling of secondary chromatin structure. We developed a force field (parmbsc1) for the accurate description of atomistic DNA dynamics, based on quantum mechanical calculations. With the accuracy of parmbsc1, sequence-dependent effects of B-DNA flexibility beyond the base pair level were described and used as a starting point to parametrize a novel helical coarse-grain model, which shows accuracy similar to the DNA dynamics obtained by atomistic MD but at much lower computational cost. In a newly developed nucleosome fiber model, the coarse-grain DNA algorithm is used to describe the linker DNA; together with a simple mesoscopic characterization of the nucleosome, chromatin dynamics can be probed at kilobase scale with a DNA model whose roots lie in the quantum mechanical regime.
On top of that, to meet current standards of accessibility and usability of tools, the developed coarse-grain DNA and nucleosome fiber models are freely available as stand-alone versions or integrated in a single webserver or large-scale online research environment platform.

15

Cavalli, Florence Marie Géraldine. « A computational study of transcriptional regulation in eukaryotes on a genomic scale ». Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609725.

16

Ge, Jianye. « Computational Algorithms and Evidence Interpretation in DNA Forensics based on Genomic Data ». University of Cincinnati / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1234916402.

17

Bashir, Ali. « Computational methods for analyzing and detecting genomic structural variation : applications to cancer ». Diss., [La Jolla, Calif.] : University of California, San Diego, 2009. http://wwwlib.umi.com/cr/ucsd/fullcit?p3344883.

Abstract:

Thesis (Ph. D.)--University of California, San Diego, 2009.
Title from first page of PDF file (viewed April 7, 2009). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 194-211).

18

Tran, Thao Thanh Thi. « Genomic data mining for the computational prediction of small non-coding RNA genes ». Diss., Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/33966.

Abstract:

The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative methods require knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. The ability to identify ncRNAs automatically on a genome-wide scale makes the approach well suited to incorporation into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies.

19

Fimereli, Danai. « Computational analyses of gene fusions, viruses and parasitic genomic elements in breast cancer ». Doctoral thesis, Universite Libre de Bruxelles, 2018. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/263609.

Abstract:

Breast cancer is the most common cancer in women, and research efforts to unravel the mechanisms that drive carcinogenesis are ongoing. The emergence and constant advancement of high-throughput sequencing techniques, in combination with large-scale studies of genomic and transcriptomic data, have allowed the identification of important genetic changes that take place in the breast cancer genome, including somatic mutations, copy number aberrations and genomic rearrangements.
The overall aim of this thesis is to explore the genetic changes that take place in the breast cancer transcriptome and their possible contribution to carcinogenesis. The aim of the first research study was to identify expressed gene fusions in breast cancer and to study their association with other genomic events. To achieve this, transcriptome sequencing and Single Nucleotide Polymorphism array data for a cohort of 55 tumors and 10 normal breast tissues were combined. Gene fusions were detected in the majority of the samples, with evident differences between breast cancer subtypes: HER2+ samples had significantly more fusions than the other subtypes. The genome-wide analysis uncovered localization of fusion genes on specific chromosomes, such as 17, 8 and 20. Additionally, a positive correlation between the number of gene fusions and the number of amplifications was observed, including the association between fusions on chromosome 17 and the amplifications in HER2+ samples, which can be attributed to the highly rearranged genomes of these subtypes. Finally, the absence of highly recurrent fusions across this cohort adds to the notion that gene fusions in breast cancer are most likely private events, with the majority being “passenger” events.
In the second research study, the aim was to identify a connection between viral infections and breast cancer by devising five different computational methods for the analysis of both transcriptome and exome data in a cohort of 58 breast tumors. Although the methods detected viral sequences in our testing dataset, no significantly high numbers of viral sequences were detected in our samples. Specifically, viral sequences (~2-30 reads) belonging to EBV, HHV6 and Merkel cell polyomavirus were extracted. Such low levels of viral expression argue against a viral etiology for breast cancer, but one should not exclude possible cases of integrated but silent viruses.
In the third research project, we analyzed in silico the transcriptional profiles of human endogenous retroviruses (ERVs) in breast cancer. ERVs are scattered across the genome in large numbers, and a number of them are actively transcribed, accounting for a small percentage of the total mapped reads. Alongside protein-coding genes and lncRNAs, they show distinct expression profiles across the different breast cancer subtypes, with luminal and basal-like samples clearly separating from each other. Additionally, distinct profiles between ER+ and ER- samples were observed. Tumor-specific ERV loci show an association with the immune status of the tumors, indicating that ERVs are reactivated in tumors and could play a role in the activation of the immune response cascade.
The results presented in this thesis capture only a small fragment of the diversity and heterogeneity of the breast cancer transcriptome. The strength of sequencing techniques allows the in-depth detection of different genomic events. Gene fusions should be considered part of the breast cancer transcriptome, but their low recurrence across samples points to a role as passenger events. In light of existing results, viral infections do not play a significant role in breast cancer. On the other hand, human endogenous retroviruses, despite originating from exogenous viruses, seem to exhibit transcriptional profiles similar to those of normal genes, indicating that they are part of the genome’s transcriptional machinery.
Doctorate in Biomedical and Pharmaceutical Sciences (Medicine)

20

Seshasayee, Aswin Sai Narain. « A computational study of bacterial gene regulation and adaptation on a genomic scale ». Thesis, University of Cambridge, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611810.

21

Alatabbi, Ali. « Advances in stringology and applications : from combinatorics via genomic analysis to computational linguistics ». Thesis, King's College London (University of London), 2015. http://kclpure.kcl.ac.uk/portal/en/theses/advances-in-stringology-and-applications(b0d93606-09a0-4dce-b7bb-6372d4479369).html.

Abstract:

Written text is considered one of the oldest methods of representing knowledge. A text can be defined as a logical and consistent sequence of symbols that encodes information in a certain language. Natural languages, which humans typically use to communicate in spoken or written form, are a straightforward example. Other examples are DNA, RNA and protein sequences. DNA and RNA are nucleic acids that carry the genetic instructions, specify the sequence of the amino acids within proteins, and regulate the development and functioning of living organisms. Proteins are molecules consisting of one or more chains of amino acids that participate in virtually every process within cells. DNA and RNA can be represented as sequences of the nucleobases of their nucleotides, and proteins can be represented by the sequence of amino acids encoded in the corresponding gene. A natural problem that emerges when processing such sequences is to determine whether a specific pattern occurs within another string (known as the exact string matching problem). As far as natural language texts are concerned, an important problem in computational linguistics is finding the occurrences of a given word or sentence in a volume of text; similarly, in computational biology, identifying given features in DNA sequences is of great significance. On the other hand, one is often interested in quantifying the likelihood that two strings have the same underlying features, based on an explicit similarity/dissimilarity measurement (known as approximate string matching). Both instances of the string matching problem have been studied thoroughly since the early 1960s.
This thesis contributes several efficient novel and derived solutions (algorithms and/or data structures) for complex problems that originated from either theoretical considerations or practical problems, studies their experimental performance, and compares the proposed solutions with some existing solutions. Several of the solutions introduced are motivated by real-world problems in the fields of molecular biology and computational linguistics. Although the problems studied and their proposed solutions differ in research motivation, they utilise similar tools and methodologies. For example, the seminal “Aho-Corasick” automaton is employed both for finding a set of motifs in a biological sequence and for detecting spelling mistakes in Arabic text. Similarly, the bit-masking trick that extends the DNA alphabet to accelerate equivalence testing of degenerate characters is used in the same way to extend the Arabic alphabet to measure the similarity between a stem and the derived/inflected forms of a given word.
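
The bit-masking idea mentioned above can be illustrated with a minimal sketch. It assumes the standard IUPAC degenerate nucleotide codes and a naive scan; the function names are hypothetical and this is not the thesis's own implementation. Each symbol becomes a 4-bit mask, so two symbols are compatible exactly when the bitwise AND of their masks is non-zero.

```python
# Each base gets one bit; a degenerate IUPAC symbol is the OR of its bases.
A, C, G, T = 1, 2, 4, 8
IUPAC_MASK = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T,
    "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}


def symbols_match(a, b):
    """Two symbols are equivalent if they share at least one base."""
    return (IUPAC_MASK[a] & IUPAC_MASK[b]) != 0


def find_degenerate(pattern, text):
    """Naive scan: return start positions where pattern matches text."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(symbols_match(p, t) for p, t in zip(pattern, text[i:i + m]))
    ]
```

A single AND replaces a set-intersection test per character, which is why the same trick carries over to any alphabet whose symbols can be grouped into equivalence classes.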

22

Bezuidt, K. I. O. (Keoagile Ignatius Oliver). « Development of novel computational tools based on analysis of DNA compositional biases to identify and study the distribution of mobile genomic elements among bacteria ». Diss., University of Pretoria, 2009. http://hdl.handle.net/2263/27297.

Abstract:

Horizontal gene transfer, well characterized as the transfer of genomic material between organisms, contributes greatly to the evolution and speciation of bacteria. The transfer of such material gives rise to bacteria that are virulent and that possess genes rendering them resistant to antibiotics. It also helps to spread such genes to other bacteria and to recombine them there. Horizontally acquired genomic elements exhibit compositional features that deviate from those of the other genes in a recipient genome. They possess features such as unusual GC%, atypical codon usage, oligonucleotide usage bias and direct repeats at their flanks, which can be used to distinguish them from native genes in a genome. This work focused on the development of statistical and computational methods to aid the detection of genes that have undergone horizontal transfer, and so help track down genes that could be of medical and environmental importance. To this end, SeqWord Gene Island Sniffer (SWGIS), a statistically driven computational tool for the prediction of genomic islands, and GEI-DB, a comprehensive database of horizontally transferred genomic elements, were established. The SWGIS tool allows the precise prediction of inserts of horizontally acquired gene clusters in prokaryotic genomic sequences. The GEI-DB stores all the foreign genomic inserts that have been detected in the study, together with their annotations and evolutionary measures, such as groups of genomic islands that share similarities in DNA and amino acid features.
Dissertation (MSc)--University of Pretoria, 2009.
Biochemistry
unrestricted
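
As an illustration of one compositional signal named in the abstract (unusual GC%), here is a minimal sketch that flags sliding windows whose GC fraction deviates from the genome-wide value. It is not the SWGIS algorithm; the function names, window size, step and threshold are assumptions chosen for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def atypical_windows(genome, window, step, threshold):
    """Return (start, gc) for windows whose GC fraction differs from the
    genome-wide GC fraction by more than `threshold`."""
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged
```

A real detector would combine several signals (codon usage, oligonucleotide bias, flanking repeats), since GC% alone cannot distinguish foreign inserts from native regions with skewed composition.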

23

Palaniappan, Krishnaveni. « Predicting "Essential" Genes in Microbial Genomes : A Machine Learning Approach to Knowledge Discovery in Microbial Genomic Data ». NSUWorks, 2010. http://nsuworks.nova.edu/gscis_etd/268.

Abstract:

Essential genes constitute the minimal gene set of an organism that is indispensable for its survival under the most favorable conditions. The problem of accurately identifying and predicting genes essential for the survival of an organism has both theoretical and practical relevance in genome biology and medicine. From a theoretical perspective, it provides insights into the minimal requirements for cellular life and plays a key role in the emerging field of synthetic biology; from a practical perspective, it facilitates efficient identification of potential drug targets (e.g., antibiotics) in novel pathogens. However, characterizing the essential genes of an organism requires sophisticated experimental studies that are expensive and time-consuming. The goal of this research study was to investigate machine learning methods to accurately classify/predict "essential genes" in newly sequenced microbial genomes based solely on their genomic sequence data. This study formulates the prediction of essential genes as a binary classification problem and systematically investigates the applicability of three different supervised classification methods for this task. In particular, Decision Tree (DT), Support Vector Machine (SVM), and Artificial Neural Network (ANN) based classifier models were constructed and trained on genomic features derived solely from gene sequence data of 14 experimentally validated microbial genomes whose essential genes are known. A set of 52 relevant genomic sequence derived features (including gene and protein sequence features, protein physico-chemical features and protein sub-cellular features) was used as input for the learners to learn the classifier models. The training and test datasets used in this study reflected the between-class imbalance (i.e. skewed majority class vs. minority class) that is intrinsic to this data domain and to the essential gene prediction problem.
Two imbalance reduction techniques (homology reduction and random under-sampling of 50% of the majority class) were devised without artificially balancing the datasets or compromising classifier generalizability. The classifier models were trained and evaluated using a 10-fold stratified cross-validation strategy on both the full multi-genome datasets and their class-imbalance-reduced variants to assess their ability to discriminate essential genes from non-essential genes. In addition, the classifiers were evaluated using two novel blind testing strategies, LOGO (Leave-One-Genome-Out) and LOTO (Leave-One-Taxon group-Out), on carefully constructed held-out datasets (genome-wise for LOGO and taxonomic group-wise for LOTO) that were not used in training the classifier models. The prediction performance metrics accuracy, sensitivity, specificity, precision and area under the Receiver Operating Characteristic curve (AU-ROC) were assessed for the DT, SVM and ANN derived models. Empirical results from 10 × 10-fold stratified cross-validation, LOGO and LOTO blind testing experiments indicate that SVM and ANN based models perform better than Decision Tree based models. On 10 × 10-fold cross-validation, the SVM based models achieved an AU-ROC score of 0.80, while ANN and DT achieved 0.79 and 0.68 respectively. Both LOGO (genome-wise) and LOTO (taxon-wise) blind tests revealed the extent to which these classifiers generalize across different genomes and taxonomic orders. This study empirically demonstrated the merits of applying machine learning methods to predict essential genes in microbial genomes by using only gene sequence and features derived from it. It also demonstrated that it is possible to predict essential genes based on features derived from gene sequence without using homology information.
LOGO and LOTO blind test results reveal that the trained classifiers do generalize across genomes and taxonomic boundaries and provide a first critical estimate of predictive performance on microbial genomes. Overall, this study provides a systematic assessment of applying DT, ANN and SVM to this prediction problem. An important potential application of this study will be to integrate the resultant predictive model/approach as a genome annotation pipeline method for comparative microbial genome and metagenome analysis resources such as the Integrated Microbial Genome Systems (IMG and IMG/M).
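
The Leave-One-Genome-Out idea can be sketched in a few lines. This is a minimal illustration of the splitting scheme only, assuming each sample is a `(genome_id, features, label)` tuple; the tuple layout and function name are hypothetical and do not come from the thesis.

```python
def leave_one_genome_out(samples):
    """Yield (held_out_genome, train, test) splits.

    `samples` is a list of (genome_id, features, label) tuples. Every gene
    from the held-out genome goes to the test set, so the classifier is
    never evaluated on a genome it saw during training.
    """
    genomes = sorted({genome_id for genome_id, _, _ in samples})
    for held_out in genomes:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

Grouping by a taxon identifier instead of a genome identifier gives the corresponding Leave-One-Taxon group-Out split.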

24

Wei, Yulong. « Microbes Carry Distinct Genomic Signatures in Adaptation to Their Translation Machinery and Host Environments ». Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42422.

Abstract:

How do bacteria grow and replicate rapidly? How do viruses and phages adapt to their host environments? Bacteria require efficient translation to grow and replicate rapidly, and translation is often rate-limited by initiation. A feature that is conserved across bacterial lineages is the Shine-Dalgarno (SD) sequence at the mRNA 5’ UTR, which pairs with the anti-SD sequence located at the 3’ end of mature 16S rRNA. Nonetheless, much about this interaction remains unclear. Chapter 2 reveals evolutionary differences between Cyanobacteria and chloroplast translation initiation using a new model (DtoStart) that better defines the optimal SD sequence and an RNA-Seq-based approach that reliably characterizes the 3’ end of mature 16S rRNAs. The efficacy of translation elongation depends greatly on tRNA-mediated codon adaptation. In Escherichia coli, selection favours major codons because they are rapidly decoded by abundantly available cognate tRNAs. Nonetheless, the degree to which codon bias correlates with tRNA availability is unclear in many bacterial species because tRNA abundance is often inadequately approximated by gene copy numbers. To better understand tRNA-mediated codon bias, Chapter 3 describes an RNA-Seq-based approach to robustly quantify tRNA abundance. Finally, Chapter 4 evaluates the degree to which optimal translation initiation and elongation signals affect ribosome dynamics. The emergence of the COVID-19 pandemic poses a serious global health emergency. To establish infection during cell entry, the coronavirus Spike protein binds to the host ACE2 receptor, and a high binding potential between these two players is key to infectivity. While SARS-CoV-2 transmits efficiently in humans, it is less clear which other mammals are at risk of being infected. Chapter 5 investigates the host range of SARS-CoV-2 through comparative sequence analyses of the ACE2 receptors and the Spike proteins.
As obligate parasites, coronaviruses regularly infect host tissues that express antiviral proteins (AVPs) in abundance and must evade or adapt to the host cellular environments post-entry. Two AVPs that shape viral genomes are ZAP, which binds to CpG dinucleotides to facilitate viral transcript degradation, and APOBEC3, which deaminates C into U, leading to dysfunctional transcripts. Chapter 6 shows that coronavirus genomes are CpG deficient to evade ZAP and are subjected to constant C to U deamination by APOBEC3. This thesis examines two key concepts of microbial genome evolution: 1) coevolution between gene features and the translation machinery in bacteria, and 2) adaptation of viruses to the hosts they infect. Chapters 2, 3, and 4 are aimed at improving our understanding of bacterial gene expression for applications in transgenic biosynthesis and phage therapy. Chapters 5 and 6 are aimed at improving our understanding of the origin and evolution of SARS-CoV-2 and our ability to control the spread of infection.
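
CpG deficiency is commonly summarized as an observed/expected ratio. A minimal sketch follows, assuming the widely used definition (CpG count × sequence length) / (C count × G count). The abstract does not say which index the thesis uses, so this formula is an assumption and the function name is hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio; values well below 1 suggest CpG deficiency."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)
```

For example, `cpg_obs_exp("CCGG")` gives 1.0 (one CpG observed, one expected), while a sequence with C and G but no CG dinucleotide gives 0.0.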

25

Zhuang, Jiali. « Structural Variation Discovery and Genotyping from Whole Genome Sequencing : Methodology and Applications : A Dissertation ». eScholarship@UMMS, 2015. https://escholarship.umassmed.edu/gsbs_diss/875.

Abstract:

A comprehensive understanding of how genetic variants and mutations contribute to phenotypic variations and alterations entails experimental technologies and analytical methodologies that are able to detect genetic variants/mutations from various biological samples in a timely and accurate manner. High-throughput sequencing technology represents the latest achievement in a series of efforts to facilitate genetic variant discovery and genotyping, and promises to transform the way we tackle healthcare and biomedical problems. The tremendous amount of data generated by this new technology, however, needs to be processed and analyzed in an accurate and efficient way in order to fully harness its potential. Structural variation (SV) encompasses a wide range of genetic variations of different sizes that are generated by diverse mechanisms. Due to the technical difficulties of reliably detecting SVs, their characterization lags behind that of SNPs and indels. In this dissertation I present two novel computational methods: one for detecting transposable element (TE) transpositions and the other for detecting SVs in general using a local assembly approach. Both methods are able to pinpoint breakpoint junctions at single-nucleotide resolution and estimate variant allele frequencies in the sample. I also apply those methods to study the impact of TE transpositions on genomic stability, the inheritance patterns of TE insertions in the population, and the molecular mechanisms and potential functional consequences of somatic SVs in cancer genomes.

26

Zhuang, Jiali. « Structural Variation Discovery and Genotyping from Whole Genome Sequencing : Methodology and Applications : A Dissertation ». eScholarship@UMMS, 2009. http://escholarship.umassmed.edu/gsbs_diss/875.

27

Tsankov, Alex. « Evolution of nucleosome positioning and gene regulation in yeasts : a genomic and computational approach ». Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62464.

Abstract:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 107-111).
Chromatin organization plays a major role in gene regulation and can affect the function and evolution of new transcriptional programs. Here, we present the first multi-species comparative genomic analysis of the relationship between chromatin organization and gene expression by measuring mRNA abundance and nucleosome positions genome-wide in 13 Ascomycota yeast species. Our work introduces a host of new computational tools for studying chromatin structure, function, and evolution. We improved on existing methods for detecting nucleosome positions and developed a new approach for identifying nucleosome-free regions (NFRs) and characterizing chromatin organization at gene promoters. We used a general statistical approach for studying the evolution of chromatin and gene regulation at a functional level. We also introduced a new technique for discovering the DNA binding motifs of trans-acting General Regulatory Factors (GRFs) and developed a new technique for quantifying the relative contribution of intrinsic sequence, GRFs, and transcription to establishing NFRs. Finally, we built a computational framework to quantify the evolutionary interplay between nucleosome positions, transcription factor binding sites, and gene expression. Through our analysis, we found large conservation of global and functional chromatin organization. Chromatin organization has also substantially diverged, both in global quantitative features and in functional groups of genes. We find that global usage of intrinsic anti-nucleosomal sequences such as PolyA varies over this phylogeny, and uncover that PolyG tracts also intrinsically repel nucleosomes. The specific sequences bound by GRFs are also highly plastic; we experimentally validate an evolutionary handover from Cbf1 in pre-WGD yeasts to Reb1 in post-WGD yeasts.
We also identify five mechanisms that couple chromatin organization to the evolution of gene regulation: (i) compensatory evolution of alternative modifiers associated with conserved chromatin organization; (ii) a gradual transition from constitutive to trans-regulated NFRs; (iii) a loss of intrinsic anti-nucleosomal sequences accompanying changes in chromatin organization and gene expression; (iv) repositioning of motifs from NFRs to nucleosome-occluded regions; and (v) the expanded use of NFRs by paralogous activator-repressor pairs. Our multi-species dataset and general computational framework provide a foundation for future studies on how chromatin structure changes over time and in evolution.
by Alexander Minchev Tsankov.
Ph.D.

28

Saluja, Sunil K. (Sunil Kumar) 1968. « A computational framework for the identification, cataloging, and classification of evolutionary conserved genomic DNA ». Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28590.

Abstract:

Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2004.
Includes bibliographical references (leaves 27-29).
Evolutionarily conserved genomic regions (ecores) are understudied, and yet comprise a very large percentage of the Human Genome. Highly conserved human-mouse non-coding ecores, for example, are more abundant within the Human Genome than the regions that are currently estimated to encode proteins. Subsets of these ecores also exhibit conservation that extends across several species. These genomic regions have managed to survive millions of years of evolution despite the fact that they do not appear to directly encode proteins. The survival of these regions compels us to investigate their potential function. Development of a computational framework for the classification and clustering of these regions may be the first step in understanding their function. The need for a standardized framework is underscored by the explosive growth in the number of publicly available, fully sequenced genomes, and the diverse set of methodologies used to generate cross-species alignments. This project describes the design and implementation of a system for the identification, classification and cataloguing of ecores across multiple species. A key feature of this system is its ability to quickly incorporate new genomes and assemblies as they become available. Additionally, this system provides investigators with a feature-rich user interface, which facilitates the retrieval of ecores based on a wide range of parameters. The system returns a dynamically annotated list of evolutionarily conserved regions, which is used as input to several classification schemes aimed at identifying families of ecores that share similar features, including depth of evolutionary conservation, position relative to known genes, sequence similarity, and content of transcription factor binding sites. Families of ecores have already been retrieved by the system and clustered using this feature space, and are currently awaiting biological validation.
by Sunil K. Saluja.
S.M.

29

Cameron, Michael. « Efficient Homology Search for Genomic Sequence Databases ». RMIT University. Computer Science and Information Technology, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20070509.162443.

Abstract:

Genomic search tools can provide valuable insights into the chemical structure, evolutionary origin and biochemical function of genetic material. A homology search algorithm compares a protein or nucleotide query sequence to each entry in a large sequence database and reports alignments with highly similar sequences. The exponential growth of public data banks such as GenBank has necessitated the development of fast, heuristic approaches to homology search. The versatile and popular blast algorithm, developed by researchers at the US National Center for Biotechnology Information (NCBI), uses a four-stage heuristic approach to efficiently search large collections for analogous sequences while retaining a high degree of accuracy. Despite an abundance of alternative approaches to homology search, blast remains the only method to offer fast, sensitive search of large genomic collections on modern desktop hardware. As a result, the tool has found widespread use with millions of queries posed each day. A significant investment of computing resources is required to process this large volume of genomic searches and a cluster of over 200 workstations is employed by the NCBI to handle queries posed through the organisation's website. As the growth of sequence databases continues to outpace improvements in modern hardware, blast searches are becoming slower each year and novel, faster methods for sequence comparison are required. In this thesis we propose new techniques for fast yet accurate homology search that result in significantly faster blast searches. First, we describe improvements to the final, gapped alignment stages where the query and sequences from the collection are aligned to provide a fine-grain measure of similarity. We describe three new methods for aligning sequences that roughly halve the time required to perform this computationally expensive stage. 
Next, we investigate improvements to the first stage of search, where short regions of similarity between a pair of sequences are identified. We propose a novel deterministic finite automaton data structure that is significantly smaller than the codeword lookup table employed by ncbi-blast, resulting in improved cache performance and faster search times. We also discuss fast methods for nucleotide sequence comparison. We describe novel approaches for processing sequences that are compressed using the byte-packed format already utilised by blast, where four nucleotide bases from a strand of DNA are stored in a single byte. Rather than decompressing sequences to perform pairwise comparisons, our innovations permit sequences to be processed in their compressed form, four bases at a time. Our techniques roughly halve average query evaluation times for nucleotide searches with no effect on the sensitivity of blast. Finally, we present a new scheme for managing the high degree of redundancy that is prevalent in genomic collections. Near-duplicate entries in sequence data banks are highly detrimental to retrieval performance; however, existing methods for managing redundancy are both slow, requiring almost ten hours to process the GenBank database, and crude, because they simply purge highly similar sequences to reduce the level of internal redundancy. We describe a new approach for identifying near-duplicate entries that is roughly six times faster than the most successful existing approaches, and a novel approach to managing redundancy that reduces collection size and search times but still provides accurate and comprehensive search results. Our improvements to blast have been integrated into our own version of the tool. We find that our innovations more than halve average search times for nucleotide and protein searches, and have no significant effect on search accuracy.
Given the enormous popularity of blast, this represents a very significant advance in computational methods to aid life science research.
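
To make the byte-packed idea concrete, here is a minimal sketch of comparing two sequences four bases at a time without unpacking them. It assumes a 2-bit code (A=0, C=1, G=2, T=3), sequence lengths that are multiples of four, and a 256-entry lookup table; these choices and all names are illustrative assumptions, not the thesis's or blast's actual code.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(seq):
    """Pack a DNA string (length a multiple of 4) into bytes, 4 bases per byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    packed = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        packed.append(byte)
    return bytes(packed)


# For every possible XOR of two packed bytes, the number of 2-bit groups
# that are zero, i.e. the number of positions where the bases agree.
MATCHES = [
    sum(1 for shift in (0, 2, 4, 6) if (x >> shift) & 3 == 0)
    for x in range(256)
]


def count_matches(packed_a, packed_b):
    """Count identical bases in two equal-length packed sequences."""
    return sum(MATCHES[a ^ b] for a, b in zip(packed_a, packed_b))
```

One XOR and one table lookup handle four bases at once, which is the kind of saving the compressed-form comparison relies on.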

30

Martinez, Juan Carlos. « Towards the Prediction of Mutations in Genomic Sequences ». FIU Digital Commons, 2013. http://digitalcommons.fiu.edu/etd/987.

Abstract:

Bio-systems are inherently complex information processing systems. Furthermore, the physiological complexities of biological systems limit both the formation of hypotheses about behavior and the ability to test them. More importantly, the identification and classification of mutations in patients are central topics in today’s cancer research. Next generation sequencing (NGS) technologies can provide genome-wide coverage at single-nucleotide resolution and at reasonable speed and cost. The unprecedented molecular characterization provided by NGS offers the potential for an individualized approach to treatment. These advances in cancer genomics have enabled scientists to interrogate cancer-specific genomic variants and compare them with the normal variants in the same patient. Analysis of this data provides a catalog of somatic variants, present in the tumor genome but not in the normal tissue DNA. In this dissertation, we present a new computational framework for the problem of predicting the number of mutations on a chromosome for a given patient, which is a fundamental problem in clinical and research fields. We begin this dissertation with the development of a framework system that is capable of utilizing published data from a longitudinal study of patients with acute myeloid leukemia (AML), whose DNA from both normal and malignant tissues was subjected to NGS analysis at various points in time. By processing the sequencing data from the time of cancer diagnosis using the components of our framework, we tested it by predicting the genomic regions that would be mutated at the time of relapse and, later, by comparing our results with the actual regions that showed mutations (discovered at relapse time). We demonstrate that this coupling of the algorithm pipeline can drastically improve the ability to find a reliable molecular signature.
Arguably, the most important result of our research is its superior performance compared with other methods such as Radial Basis Function Network, Sequential Minimal Optimization, and Gaussian Process. In the final part of this dissertation, we present a detailed significance, stability and statistical analysis of our model. A performance comparison of the results is presented. This work lays a good foundation for future research on other types of cancer.

31

Sohiya, Yotsukura. « Computational Framework for the Dissection of Cancer Genomic Architecture and its Association in Different Biomarkers ». 京都大学 (Kyoto University), 2016. http://hdl.handle.net/2433/217149.

32

Zaugg, Judith Barbara. « A computational study of promoter structure and transcriptional regulation in yeast on a genomic scale ». Thesis, University of Cambridge, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609838.

33

Liang, Xiaoyu. « Computational Methods for Cis-Regulatory Module Discovery ». Ohio University / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1288578177.

34

Copeland, Nancy Giang. « Computational analysis of high-replicate RNA-seq data in Saccharomyces cerevisiae : searching for new genomic features ». Thesis, University of Dundee, 2018. https://discovery.dundee.ac.uk/en/studentTheses/af2f83a4-3028-4925-9c99-81bd683067b0.

Abstract:

In this study, RNA-seq and proteomics, two orthogonal high-throughput technologies, were used to search the Saccharomyces cerevisiae genome for new genomic features. RNA-seq data were aligned to the genome with three successively more stringent sets of parameters for the STAR aligner (Dobin et al., 2013). The varying levels of stringency elucidated some complexities in the RNA-seq data, such as the presence of reads that mapped to multiple genomic locations. The RNA-seq alignments indicated the presence of RNA transcripts derived from regions of the genome without annotations (un-annotated regions) in the Saccharomyces Genome Database (SGD). To ensure that all of the high-quality curated annotations within SGD were accounted for appropriately, these datasets were categorised as either Primary or Secondary Annotations. Annotations of genomic regions where the primary sequence produced a molecule (e.g. snoRNA or peptide) were designated as Primary. Annotations of regions where other types of activity were present (e.g. histone binding sites, double-strand break hotspots) were classified as Secondary. Only the Primary Annotations were used as boundaries for determining the locations of un-annotated regions. Open reading frames (ORFs) were present in these un-annotated regions. Therefore, the regions were translated in six frames to build a database of all theoretical peptides. Proteomics tandem mass spectra were then searched against this peptide database to detect any expressed ORFs within the un-annotated regions. Two preliminary target ORFs were found to contain RNA-seq alignments and were detected by the proteomics analysis, evidence that their transcripts may have been present in the original sample. The next step would be to verify these two preliminary target regions in the experimental laboratory to determine whether they are in fact expressed as peptides and, if so, what possible functions the peptides may have.
Throughout this study, the Un-Annotated Region Pipeline (UAR-Pipeline) software was constructed to facilitate the analysis of un-annotated regions given a genome sequence, a set of genomic annotations, and RNA-seq data. In addition, a Quickload Site within the Integrated Genome Browser (Nicol et al., 2009) was created to store and effectively visualise un-annotated regions against RNA-seq alignments, annotations, and other tracks of information such as conservation. The vast majority of annotations contained within the Quickload Site are also hosted by SGD; therefore, the Site would serve as a new resource for the research community through anticipated public access.
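
The six-frame translation step mentioned in this abstract can be illustrated with a short sketch. It is not taken from the UAR-Pipeline: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def candidate_peptides(seq, min_length=2):
    """Split every frame at stop codons ('*') and keep the longer pieces."""
    peptides = set()
    for protein in six_frame_translations(seq).values():
        peptides.update(p for p in protein.split("*") if len(p) >= min_length)
    return peptides


frames = six_frame_translations("ATGGCCTAA")
```

A peptide database for a mass-spectrometry search would be built from `candidate_peptides` applied to each un-annotated region; the `min_length` cutoff here is an arbitrary illustrative choice.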

35

Isa, Mohammad Nazrin. « High performance reconfigurable architectures for biological sequence alignment ». Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/7721.

Abstract:

Bioinformatics and computational biology (BCB) is a rapidly developing multidisciplinary field which encompasses a wide range of domains, including genomic sequence alignment. Sequence alignment is a fundamental tool in molecular biology for searching for homology between sequences. Sequence alignments are currently gaining close attention due to their great impact on quality of life, for example by facilitating early disease diagnosis, identifying the characteristics of a newly discovered sequence, and supporting drug engineering. With the vast growth of genomic data, searching for sequence homology over huge databases (often measured in gigabytes) cannot produce results within a realistic time, hence the need for acceleration. Since the exponential increase of biological databases as a result of the Human Genome Project (HGP), supercomputers and other parallel architectures such as special-purpose Very Large Scale Integration (VLSI) chips, Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms. Nevertheless, there is always a trade-off between area, speed, power, cost, development time and reusability when selecting an acceleration platform. FPGAs generally offer more flexibility, higher performance and lower overheads. However, they suffer from a relatively low-level programming model compared with off-the-shelf processors such as standard microprocessors and GPUs. Because of this limitation, the need has arisen for optimized FPGA core implementations, which are crucial for this technology to become viable in high performance computing (HPC). This research proposes the use of state-of-the-art reprogrammable system-on-chip technology on FPGAs to accelerate three widely used sequence alignment algorithms: the Smith-Waterman algorithm with affine gap penalty, the profile hidden Markov model (HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm.
This research has three novel aspects. Firstly, the algorithms are designed and implemented in hardware, with each core achieving the highest performance compared to the state of the art. Secondly, an efficient scheduling strategy based on the double buffering technique is adopted in the hardware architectures. Here, when the alignment matrix computation task is overlapped with the processing element (PE) configuration in a folded systolic array, the overall throughput of the core is significantly increased. This is due to the bounded PE configuration time and the parallel PE configuration approach, which hold irrespective of the number of PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without relying on restricted onboard memory resources. Finally, a new performance metric is devised, which facilitates the effective comparison of design performance between different FPGA devices and families. The normalized performance indicator (speed-up per area per process technology) factors out the advantages of the area and lithography technology of any FPGA, resulting in fairer comparisons. The cores have been designed using Verilog HDL and prototyped on the Alpha Data ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation results show that the proposed architectures achieved giga cell updates per second (GCUPS) performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman algorithm with affine gap penalty, the profile HMM algorithm and the BLAST algorithm. In terms of speed-up, the performance of the designed cores was compared against the corresponding software and against reported FPGA implementations. In the comparison with equivalent software execution, acceleration of the optimal alignment algorithm in hardware yielded an average speed-up of 269x compared to the SSEARCH 35 software.
For the profile HMM-based sequence alignment, the designed core achieved speed-ups of 103x and 8.3x against HMMER 2.0 and the latest version of HMMER (version 3.0) respectively. The implementation of gapped BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up compared to the latest NCBI BLAST software. For the comparison against other reported FPGA implementations, the proposed normalized performance indicator was used to evaluate the designed architectures fairly. The results showed that the first architecture achieved more than 50 percent improvement, while acceleration of the profile HMM sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of gapped BLAST with the two-hit method, the designed core achieved an 11x speed-up after factoring out the advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in terms of cost and power performance; it was noted that the core achieved 0.46 MCUPS per dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive platform for high performance computation, with the advantage of a smaller area footprint, and represent an economical ‘green’ solution compared to the other acceleration platforms. Higher throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs with minimal design effort.
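
For readers unfamiliar with the first of the three algorithms, the following is a minimal software sketch of Smith-Waterman local alignment scoring with an affine gap penalty, using the usual three-matrix recurrence (often attributed to Gotoh). It is not the thesis's hardware design: the function name, the scoring values and the gap convention (the first gap position costs `gap_open`, each further position costs `gap_extend`) are assumptions chosen for illustration.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1,
                          gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of strings a and b.

    Assumed convention: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of a local alignment ending at (i, j).
    # E: best score ending with a gap that consumes b[j-1] only.
    # F: best score ending with a gap that consumes a[i-1] only.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option restarts the alignment, which makes it local.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


identical = smith_waterman_affine("ACGT", "ACGT")
gapped = smith_waterman_affine("ACGTACGT", "ACGTTTACGT")
```

Each pass through the inner loop is one cell update, so a comparison of sequences of lengths n and m performs n × m cell updates; dividing that count by the run time in seconds and by 10^9 gives the GCUPS figure quoted in the abstract.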

36

Hime, Paul Michael. « GENOMIC PERSPECTIVES ON AMPHIBIAN EVOLUTION ACROSS MULTIPLE PHYLOGENETIC SCALES ». UKnowledge, 2017. http://uknowledge.uky.edu/biology_etds/45.

Abstract:

Genomes provide windows into the evolutionary histories of species. The recent accessibility of genome-scale data in non-model organisms and the proliferation of powerful statistical models are now providing unprecedented opportunities to uncover evolutionary relationships and to test hypotheses about the processes that generate and maintain biodiversity. This dissertation work reveals shallow-scale species boundaries and population genetic structure in two imperiled groups of salamanders and demonstrates that the number and information content of genomic regions used in species delimitation exert strong effects on the resulting inferences. Genome scans are employed to test hypotheses about the mechanisms of genetic sex determination in cryptobranchid salamanders, suggesting a conserved system of female heterogamety in this group. At much deeper scales, phylogenetic analyses of hundreds of protein-coding genes across all major amphibian lineages are employed to reveal the backbone topology and evolutionary timescales of the amphibian tree of life, suggesting a new set of hypotheses for relationships among extant amphibians. Yet, genomic data on their own are no panacea for the thorniest questions in evolutionary biology, and this work also demonstrates the power of a model testing framework to dissect support for different phylogenetic and population genetic hypotheses across different regions of the genome.

37

Keane, Michael. « Computational genomic analyses of long-lived mammals to study variation in cancer resistance, longevity and life history ». Thesis, University of Liverpool, 2018. http://livrepository.liverpool.ac.uk/3023853/.

Abstract:

Little is known about the genetic and molecular mechanisms responsible for the great differences in mammalian longevity and life history. One potential source of novel insights is the comparative analysis of the genomes of species which exhibit extreme longevity and an extended life history. As such, this work describes the results obtained from the analysis of the bowhead whale, naked mole rat (NMR) and human genomes, each of which belongs to a species that is exceptionally long-lived compared to closely related species. The bowhead whale genome was analysed with a focus on identifying genes with evidence of positive selection and proteins with unique amino acid residues when compared with other mammals. A number of genes that have previously been associated with cancer and ageing were found to exhibit evidence of positive selection on the bowhead lineage. In addition, bowhead-specific alterations were identified in proteins linked to sensory perception of sound and to size and development, which are of potential relevance due to the phenotypic divergence of the bowhead whale associated with these traits. The analysis of the NMR assembly attempted to identify genes with a signal of positive selection by comparing synonymous and non-synonymous substitution rates. While positive selection on NMR genes has previously been analysed, we found additional signals of selection in several genes which have not previously been reported, including in regions of p53 and the hyaluronan receptors CD44 and HMMR. Finally, while the previous analyses focused on coding sequences, it is also likely that much of the genetic basis for the variation in longevity is to be found in non-coding regions of the genome. In order to assess this hypothesis, human data from both genome-wide association studies (GWAS) and annotated 3'UTR sequences were analysed in order to identify genes with signals of molecular adaptation which correlate with trait divergence.
The GWAS meta-analysis identified genes from a specific pathway which has previously been shown to regulate the timing of growth and development. The genes identified in the 3'UTR analysis were slightly below the level of statistical significance, indicating that greater statistical power, most likely in the form of sequences from additional species, is necessary. Overall, the results obtained offer novel insights regarding the molecular adaptations by which longevity and life history evolve and identify numerous genes which could be prioritised for future studies, including potential functional characterisation. Furthermore, all the data and results generated have been made available on customised online portals in order to allow easy access for the scientific community and to facilitate further research into these long-lived species.

38

Cronje, Louis. « Development of new computational approaches for analysis and visualization of fluxes of genomic islands through bacterial species ». Diss., University of Pretoria, 2015. http://hdl.handle.net/2263/53482.

39

Wendler, Jason Patrick. « Accessing complex genomic variation in Plasmodium falciparum natural infections ». Thesis, University of Oxford, 2015. http://ora.ox.ac.uk/objects/uuid:c9f1ea37-7005-4757-a869-7eba82406a26.

Abstract:

Genetic polymorphism in Plasmodium falciparum is a considerable obstacle to malaria intervention. Parasites have repeatedly evolved to overcome every front-line antimalarial deployed throughout history, and artemisinin resistant populations are expanding in Southeast Asia. Promising vaccine candidates routinely fail when challenged by the genetic diversity of natural parasite populations, and a recent trial using a blood-stage antigen showed immunity was allele specific. Modern sequencing technologies have revolutionized our understanding of parasite genomics and population genetics by providing access to single nucleotide variation, but characterizing more complex polymorphism remains a key challenge. Solving this problem is important because the selective pressures from drugs and host immunity often create complex polymorphism in the most clinically relevant genes that is missed using standard genotyping methods. In three sections, this thesis is a narrative about 1) encountering complex variation, 2) overcoming it with novel tools, and then 3) innovatively applying those tools to old and new questions. I first show examples of complex variation in a vaccine candidate (EBA-175) and a drug resistance gene (pfcrt) while reporting SNP based analyses of Kenyan and Tanzanian field isolates. While introducing this complex variation I also describe biological insights discovered in these populations. In Kenya I show evidence that chloroquine resistance selects for parasites that are primaquine sensitive, use a GWAS approach to discover new drug resistance loci, and catalogue variation in known resistance genes. In Tanzania I describe the population structure and allele frequencies of parasites from two geographic regions. In the second section of the thesis I develop methods for accessing complex variation and demonstrate their utility by producing de novo assemblies of eba-175, pfcrt, ama1, and msp3.4 from thousands of sequenced samples. 
Finally, in the third section I apply these tools in depth to eba-175. I comprehensively characterize the SNP and structural variation in eba-175 using an alignment of 1419 de novo assemblies. I use this resource to illustrate the profiles of positive selection across the gene, and corroborate these signals of balancing selection by showing the geographic distribution of the F/C indels and a lesser known 6bp indel positioned between the DBL domains. I then use the alignments to design Sequenom genotyping assays that facilitate a genome wide association study, testing for human associations with the eba-175 indels in the infecting parasite. I close by reporting a potential association on human chromosome 14 with the 6bp indel in eba-175.

40

Keller, Oliver [author], Stephan [academic supervisor] Waack and Burkhard [academic supervisor] Morgenstern. « Probabilistic Methods for Computational Annotation of Genomic Sequences / Oliver Keller. Reviewers: Stephan Waack; Burkhard Morgenstern. Supervisor: Stephan Waack ». Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2011. http://d-nb.info/1043029583/34.

41

PALOMBO, VALENTINO. « Genomics, Transcriptomics and Computational Biology : new insights into bovine and swine breeding and genetics ». Doctoral thesis, Università degli studi del Molise, 2019. http://hdl.handle.net/11695/91489.

Abstract:

Enormous progress has been made in the selection of animals for specific traits using traditional quantitative genetic approaches. Nevertheless, a considerable amount of variation in phenotypes remains unexplained; a better knowledge of its molecular and genetic basis therefore represents a potential additional gain for animal production. In this regard, the recently developed high-throughput (HT) technologies based on microarray and next-generation sequencing (NGS) methods are a powerful opportunity to prise open the ‘black box’ underlying complex biological processes. These technological advances have marked the beginning of the ‘omic era’. Broadly, ‘omic’ approaches adopt a holistic view of the molecules that make up a cell, tissue or organism. They are aimed primarily at the universal detection of genes (genomics), RNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics) in a specific biological sample. The basic premise of these approaches is that a complex system can be understood more thoroughly if considered as a whole. At the same time, the large amount of data generated by these approaches makes sense only if one is equipped with the necessary resources and tools to manage and explore it. For this reason, along with HT technical progress, bioinformatics, often known as computational biology, is gaining immense importance. Animal breeding is gaining new momentum from this scenario. In particular, it is moving away from traditional approaches toward systems approaches that use integrative analysis of ‘omic’ data to better elucidate the genetic architecture controlling the traits of interest and ultimately use this knowledge for the selection of breeding candidates.
The aim of this thesis is to (1) investigate the genetic differences underlying the milk fatty acid profiles of two Italian dairy cattle breeds and (2) delineate the genes and transcription regulators implicated in the control of the transition from colostrogenesis to lactogenesis in swine, using state-of-the-art genomic and transcriptomic analyses. To this end, a genome-wide association study (GWAS) on milk fatty acids in Italian Holstein and Italian Simmental cattle breeds and an RNA-Seq study of the transcriptional profiles of the swine mammary gland were conducted, respectively. In addition, (3) an in-house bioinformatics tool performing an original pathway analysis is presented. The tool, built entirely in R and named PIA (Pathways Interaction Analysis), is designed for post-genomic and transcriptomic data mining and helps in interpreting genomic and transcriptomic results.

42

Khushi, Matloob. « Development of novel software tools and methods for investigating the significance of overlapping transcription factor genomic interactions ». Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14713.

Abstract:

Identifying overlapping DNA binding patterns of different transcription factors is a major objective of genomic studies, but existing methods to archive large numbers of datasets in a personalised database lack sophistication and utility. To address this need, various database systems were benchmarked and a tool, BiSA (Binding Sites Analyser), was developed for archiving genomic regions and easily identifying overlap with, or proximity to, other regions of interest. BiSA can also calculate the statistical significance of overlapping regions and identify genes located near binding regions of interest, or genomic features near a gene or locus of interest. BiSA was populated with >1000 datasets from previously published genomic studies describing transcription factor binding sites and histone modifications. Using BiSA, the relationships between binding sites for a range of transcription factors were analysed and a number of statistically significant relationships were identified. This included an extensive comparison of estrogen receptor alpha (ERα) and progesterone receptor (PR) in breast cancer cells, which revealed a statistically significant functional relationship at a subset of sites. In summary, the BiSA comprehensive knowledge base contains publicly available datasets describing transcription factor binding sites and epigenetic modifications and provides biologists with an easy graphical interface for advanced analysis of genomic interactions.
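
The kind of overlap and proximity query described in this abstract can be sketched in a few lines. This is an illustration only and does not reflect how BiSA is implemented: the function name, the tuple layout `(chrom, start, end)` with half-open coordinates, and the `slop` parameter (the number of bases by which each query region is widened on both sides) are all assumptions.

```python
from collections import defaultdict


def overlapping_pairs(regions_a, regions_b, slop=0):
    """Pair each region in A with every region in B that it overlaps.

    Regions are (chrom, start, end) tuples with half-open coordinates.
    With slop > 0, each A region is widened by `slop` bases on both
    sides first, so nearby (not only overlapping) regions are reported.
    """
    # Group B by chromosome so that only same-chromosome pairs are tested.
    b_by_chrom = defaultdict(list)
    for region in regions_b:
        b_by_chrom[region[0]].append(region)

    pairs = []
    for chrom, start, end in regions_a:
        for b in b_by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each starts
            # before the other ends.
            if start - slop < b[2] and b[1] < end + slop:
                pairs.append(((chrom, start, end), b))
    return pairs


sites_a = [("chr1", 100, 200), ("chr2", 50, 60)]
sites_b = [("chr1", 150, 250), ("chr1", 210, 300), ("chr2", 60, 70)]
strict = overlapping_pairs(sites_a, sites_b)
nearby = overlapping_pairs(sites_a, sites_b, slop=20)
```

This nested loop is quadratic per chromosome; for collections on the scale of the more than 1000 datasets mentioned above, a sorted sweep or an interval tree would be the more usual choice.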

43

Cui, Pin [author]. « Establishing high-throughput genomic and computational methods for the real time study of retroviral endogenization and evolution / Pin Cui ». Berlin : Freie Universität Berlin, 2016. http://d-nb.info/1081935413/34.

44

CROCI, OTTAVIO. « GENOMIC LANDSCAPE AND TRANSCRIPTIONAL REGULATION BY YAP AND MYC IN THE LIVER ». Doctoral thesis, Università degli Studi di Milano, 2018. http://hdl.handle.net/2434/556194.

Abstract:

This thesis is divided into three sections; the main project is described in the first part, while additional projects are developed in two appendices. In the main project we studied YAP, the downstream effector of the Hippo pathway, a transcriptional co-factor that plays a fundamental role in de-differentiation, cell proliferation and transformation. While its upstream regulation has been extensively studied, its role as a transcriptional co-factor is still poorly understood. We show that YAP assists the transcriptional responses of the Myc oncogene to promote cell proliferation and transformation; when both YAP and Myc are overexpressed, YAP is recruited to genomic sites pre-marked by Myc, TEAD and active chromatin and potentiates the expression of cell cycle genes regulated by Myc. In addition, we show that YAP promotes cell de-differentiation by antagonizing in cis the expression of liver-specific genes controlled by the master regulator HNF4A, thus providing a mechanism for how YAP can revert the phenotype of a differentiated hepatocyte into that of a progenitor cell. In the first appendix we explain the mechanism of BRD4 inhibition, a promising strategy for the treatment of Myc-driven tumors. The efficacy of this strategy relies on the control of transcriptional elongation mediated by BRD4 on gene promoters, independently of the downregulation of the Myc oncogene. Although the inhibition of BRD4 causes its genome-wide displacement from promoters, the effects on transcription are restricted to a subset of sensitive genes. This specificity arises because most genes compensate for the drop in elongation caused by BRD4 inhibition with further recruitment of RNA Pol2 to promoters and so maintain proficient mRNA transcription, whereas vulnerable genes cannot compensate in this way, because RNA Pol2 recruitment on their promoters is already maximized. Our results show how the impairment of elongation genome-wide can affect specific transcriptional programs.
In the second appendix we describe a new web application, Chrokit, aimed at analyzing genomic data in a fast and intuitive way. Chrokit handles a set of genomic regions of interest and performs several tasks on them, such as selecting particular subsets, computing overlaps and interactively visualizing read enrichment for specific chromatin features. The application is multiplatform and can be run on dedicated servers to maximize computational power and provide accessibility to multiple users simultaneously.

45

Kaymaz, Yasin. « Genomic and Transcriptomic Investigation of Endemic Burkitt Lymphoma and Epstein Barr Virus ». eScholarship@UMMS, 2017. https://escholarship.umassmed.edu/gsbs_diss/914.

Abstract:

Endemic Burkitt lymphoma (eBL) is the most common pediatric cancer in malaria-endemic equatorial Africa and nearly always contains Epstein-Barr virus (EBV), unlike sporadic Burkitt lymphoma (sBL), which occurs with a lower incidence in developed countries. Despite this increased burden, the study of eBL has lagged. Additionally, while EBV was isolated from an African Burkitt lymphoma tumor 50 years ago, the impact of viral variation in oncogenesis is just beginning to be fully explored. In my thesis research, I focused on investigating the molecular genetics of the endemic form of this lymphoma, with a particular emphasis on the role of the virus and its variation in pathogenesis, using novel sequencing and bioinformatic strategies. First, we sought to understand pathogenesis by investigating transcriptomes using RNA sequencing (RNAseq) from 30 primary eBL tumors and comparing them to sBL tumors. BL tumor samples were prospectively obtained from 2009 until 2012 in Kenya. Within eBL tumors, minimal expression differences were found based on anatomical presentation site, in-hospital survival rates, and EBV genome type, suggesting that eBL tumors are homogeneous without marked subtypes. The outstanding difference detected using surrogate variable analysis was the significantly decreased expression of key genes in the immunoproteasome complex in eBL tumors carrying type 2 EBV compared to type 1 EBV. Secondly, in comparison to previously published pediatric sBL specimens, the majority of the expression and pathway differences were related to the PTEN/PI3K/mTOR signaling pathway and were correlated most strongly with EBV status rather than the geographic designation. Moreover, the common mutations were observed significantly less frequently in eBL tumors harboring EBV type 1, with mutation frequencies similar between tumors with EBV type 2 and without EBV. In addition to the previously reported genes, we identified a set of new genes mutated in BL.
Overall, these results suggest that EBV, particularly EBV type 1, supports BL oncogenesis, alleviating the need for particular driver mutations in the human genome. Second, we sought to comprehensively define sequence variations of EBV across the viral genome in eBL tumor cells and normal infections, and to correlate variations with clinical phenotypes and disease risk. We investigated the whole genome sequence of EBV from primary tumors (N=41) and plasma from eBL patients (N=21) as well as EBV in the blood of healthy children (N=29) within the same malaria-endemic region. We conducted a genome-wide association analysis with viral genomes from healthy children and children with BL. Furthermore, we found that the frequencies of EBV types among healthy children were at equal levels, while they were skewed in favor of type 1 (70%) among children with eBL. To pinpoint the fundamental divergence between the viral genome subtypes, type 1 and type 2, we constructed phylogenetic trees comparing them to all public EBV genomes. The pattern of variation defined substructures that correlated with the subtypes. This investigation not only deciphers the puzzling pathogenic differences between subtypes but also helps to explain how these two EBV types persist in the population at the same time. Overall, this research provides insight into the molecular underpinnings of eBL and the role of EBV. It further provides the groundwork and means to unravel the complexity of EBV population structure and provides insight into the viral variation that may influence oncogenesis and outcomes in eBL and other EBV-associated diseases. In addition, genomic and mutational analyses of Burkitt lymphoma tumors identify key differences based on viral content and clinical outcomes, suggesting new avenues for the development of prognostic molecular biomarkers and therapeutic interventions.

46

Kaymaz, Yasin. « Genomic and Transcriptomic Investigation of Endemic Burkitt Lymphoma and Epstein Barr Virus ». eScholarship@UMMS, 2007. http://escholarship.umassmed.edu/gsbs_diss/914.

47

Bezuidt, K. I. O. (Keoagile Ignatius Oliver). « Development of novel computational tools to infer the distribution patterns of bacterial accessory genomic elements and the implications of microevolution towards pathogenicity ». Thesis, University of Pretoria, 2013. http://hdl.handle.net/2263/40248.

Abstract:

Bacterial diversity has always been associated with micro-evolutionary events such as horizontal gene transfer and DNA mutations. Such events drive the rapid evolution of bacteria in response to the environmental conditions they encounter. They further establish beneficial phenotypic effects that allow bacteria to specialize in new habitats. The growing number of bacterial genomic sequences has made it possible to study microbial evolution, and the impact of micro-evolution on bacterial diversity is becoming more apparent. To gain biological information from this ever-increasing volume of genomic data, a variety of computational tools are required. This thesis therefore focuses on the development and application of computational approaches to identify genomic regions of divergence that have resulted from horizontal gene transfer or small mutational changes. The first and major part of the thesis describes the application of DNA patterns, termed oligonucleotide signatures, to identify horizontally acquired genomic regions in prokaryotes. These DNA patterns are demonstrated to differentiate between signatures of the core genome and those that have been acquired through horizontal transfer events. DNA patterns are further demonstrated to reveal the distribution patterns of horizontally acquired genomic elements, determine their acquisition periods, and predict their putative donor organisms. The second part of the thesis focuses on the evaluation of modern short-read sequence data from geographically unrelated Pseudomonas aeruginosa isolates to study their intraclonal genomic diversity. The work described in the thesis was purely in silico and was performed at Hannover Medical School and the Bioinformatics and Computational Biology Unit at the University of Pretoria.
Thesis (PhD)--University of Pretoria, 2013.
Biochemistry
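
The oligonucleotide-signature idea described in the abstract above can be illustrated with a minimal sketch. This is not the tool developed in the thesis: the function names, the choice of tetranucleotides, the window and step sizes, and the distance measure (summed absolute difference in k-mer frequencies) are all assumptions made for illustration. Windows whose k-mer profile deviates strongly from the whole-genome profile would be candidates for horizontally acquired regions.

```python
from collections import Counter
from itertools import product

K = 4  # oligonucleotide length (assumed; tetranucleotides are a common choice)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_profile(seq):
    """Return relative frequencies of all 4**K k-mers in seq (overlapping)."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    # Only count k-mers made of A, C, G, T; ambiguous bases such as N are ignored.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def window_distances(genome, window=5000, step=2500):
    """Yield (start, distance) for each sliding window.

    distance is the summed absolute difference between the window's
    k-mer profile and the whole-genome profile (hypothetical measure).
    """
    ref = kmer_profile(genome)
    for start in range(0, len(genome) - window + 1, step):
        prof = kmer_profile(genome[start:start + window])
        yield start, sum(abs(a - b) for a, b in zip(prof, ref))
```

In practice one would rank windows by distance and inspect the outliers; a real analysis would also need to handle both strands, ribosomal RNA operons, and a statistically justified cutoff, none of which this sketch attempts.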

48

DeConti, Derrick K. « Systematic Analysis of Duplications and Deletions in the Malaria Parasite P. falciparum : A Dissertation ». eScholarship@UMMS, 2015. https://escholarship.umassmed.edu/gsbs_diss/869.

Abstract:

Duplications and deletions are a major source of genomic variation. Duplications, specifically, have a significant impact on gene genesis and dosage, and the malaria parasite P. falciparum has developed resistance to a growing number of anti-malarial drugs via gene duplication. It also contains highly duplicated families of antigenically variable allelic genes. While specific genes and families have been studied, a comprehensive analysis of duplications and deletions within the reference genome and population has not been performed. We analyzed the extent of segmental duplications (SDs) in the reference genome of P. falciparum, primarily by whole-genome self-alignment. We discovered that while 5% of the genome was identified as SD, the distribution within the genome was clustered, with the vast majority localized to the subtelomeres. Within the SDs, we found an overrepresentation of genes encoding antigenically diverse proteins exposed to the extracellular membrane, specifically the var, rifin, and stevor gene families. To examine variation in duplications and deletions within parasite populations, we designed a novel computational methodology to identify copy number variants (CNVs) from high-throughput sequencing, using a read-depth-based approach refined with discordant read pairs. After validating the program against in vitro lab cultures, we analyzed isolates from Senegal as an initial test on clinical isolates. We then expanded our search to a global sample of 610 strains from Africa and South East Asia, identifying 68 CNV regions. Geographically, genic CNVs were found on average in less than 10% of the population, indicating that CNVs are rare. However, CNVs at high frequency were almost exclusively duplications associated with known drug-resistance CNVs. We also identified a novel biallelic duplication of the crt gene, containing both the chloroquine-resistant and the chloroquine-sensitive allele. 
The synthesis of our SD and CNV analyses indicates a CNV-conservative P. falciparum genome except where drug and human immune pressure select for gene duplication.
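
To make the read-depth idea in the abstract above concrete, here is a minimal sketch of depth-based CNV calling. It is not the program described in the dissertation: the function name, the log2-ratio thresholds, the pseudo-depth, and the minimum segment length are assumptions, and the refinement with discordant read pairs is not modeled at all. The input is assumed to be mean read depth per fixed-size genomic bin.

```python
from math import log2
from statistics import median


def call_cnv_segments(depths, gain=0.58, loss=-1.0, min_bins=2):
    """Return (start_bin, end_bin_exclusive, 'dup' | 'del') segments.

    depths must be a non-empty list of per-bin read depths. Each bin is
    compared with the median depth; thresholds are hypothetical.
    """
    base = median(depths)
    if base <= 0:
        raise ValueError("median depth must be positive")
    states = []
    for d in depths:
        ratio = log2(max(d, 0.5) / base)  # pseudo-depth avoids log2(0)
        if ratio >= gain:
            states.append("dup")
        elif ratio <= loss:
            states.append("del")
        else:
            states.append(None)
    # Merge runs of consecutive bins that share the same non-neutral state.
    segments, start = [], 0
    for i in range(1, len(states) + 1):
        if i == len(states) or states[i] != states[start]:
            if states[start] is not None and i - start >= min_bins:
                segments.append((start, i, states[start]))
            start = i
    return segments
```

For example, `call_cnv_segments([30, 31, 29, 30, 62, 60, 61, 30, 29, 0, 1, 30])` reports one duplicated run and one deleted run. Real data would additionally need GC-content and mappability correction before thresholds like these are meaningful.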

49

DeConti, Derrick K. « Systematic Analysis of Duplications and Deletions in the Malaria Parasite P. falciparum : A Dissertation ». eScholarship@UMMS, 2015. http://escholarship.umassmed.edu/gsbs_diss/869.

50

Choi, Ickwon. « Computational Modeling for Censored Time to Event Data Using Data Integration in Biomedical Research ». Case Western Reserve University School of Graduate Studies / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=case1307969890.
