Theses: « Sequence analysis methods »

1

Park, Jong Hwa. « Genome sequence analysis and methods ». Thesis, University of Cambridge, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627329.

2

Isgro, Francesco. « Geometric methods for video sequence analysis and applications ». Thesis, Heriot-Watt University, 2001. http://hdl.handle.net/10399/495.

3

Verzotto, Davide. « Advanced Computational Methods for Massive Biological Sequence Analysis ». Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3426282.

Abstract:

With the advent of modern sequencing technologies, massive amounts of biological data, from protein sequences to entire genomes, are becoming increasingly available. This creates the need for automatic analysis and classification of such huge collections of data in order to enhance knowledge in the Life Sciences. Although many research efforts have been made to mathematically model this information, for example by finding patterns and similarities among protein or genome sequences, these approaches often lack structures that address specific biological issues. In this thesis, we present novel computational methods for three fundamental problems in molecular biology: the detection of remote evolutionary relationships among protein sequences, the identification of subtle biological signals in related genome or protein functional sites, and phylogeny reconstruction by means of whole-genome comparisons. The main contribution is a systematic analysis of patterns that may affect these tasks, leading to the design of practical and efficient new pattern discovery tools. We thus introduce two advanced paradigms of pattern discovery and filtering based on the insight that functional and conserved biological motifs, or patterns, should lie in different sites of sequences. This makes it possible to design space-conscious approaches that avoid counting the same patterns multiple times. The first paradigm, irredundant common motifs, concerns the discovery of patterns common to two sequences that have occurrences not covered by other patterns, where coverage is defined by means of specificity and extension. The second paradigm, underlying motifs, concerns the filtering of patterns, from a given set, that have occurrences not overlapping other patterns with higher priority, where priority is defined by lexicographic properties of patterns on the boundary between pattern matching and statistical analysis.
We develop three practical methods directly based on these advanced paradigms. Experimental results indicate that we are able to identify subtle similarities among biological sequences, using the same type of information only once. In particular, we employ the irredundant common motifs and the statistics based on these patterns to solve the remote protein homology detection problem. Results show that our approach, called Irredundant Class, outperforms the state-of-the-art methods in a challenging benchmark for protein analysis. Afterwards, we establish how to compare and filter a large number of complex motifs (e.g., degenerate motifs) obtained from modern motif discovery tools, in order to identify subtle signals in different biological contexts. In this case we employ the notion of underlying motifs. Tests on large protein families indicate that we drastically reduce the number of motifs that scientists should manually inspect, further highlighting the actual functional motifs. Finally, we combine the two proposed paradigms to allow the comparison of whole genomes, and thus the construction of a novel and practical distance function. With our method, called Unic Subword Approach, we relate the regions of two genome sequences to each other by selecting motifs conserved during evolution. Experimental results show that our approach achieves better performance than other state-of-the-art methods in the whole-genome phylogeny reconstruction of viruses, prokaryotes, and unicellular eukaryotes, further identifying the major clades of these organisms.

4

Oppermann, Madalina. « Chemical and mass spectrometrical methods in protein analysis ». Stockholm, 2000. http://diss.kib.ki.se/2000/91-628-4542-x/.

5

Lovmar, Lovisa. « Methods for Analysis of Disease Associated Genomic Sequence Variation ». Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis : Univ.-bibl. [distributor], 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-4525.

6

Reinhardt, Astrid. « Neural network-based methods for large scale protein sequence analysis ». Thesis, University of Cambridge, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.624141.

7

Henderson, Daniel Adrian. « Modelling and analysis of non-coding DNA sequence data ». Thesis, University of Newcastle Upon Tyne, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.299427.

8

Tanaka, Emi. « Statistical Methods for Improving Motif Evaluation ». Thesis, The University of Sydney, 2014. http://hdl.handle.net/2123/13922.

Abstract:

Gene regulation, especially cis-regulation of gene expression by the binding of transcription factors, is a critical component of cellular physiology. Transcription regulation is heavily influenced by the binding of transcription factors, and as such, it is of great interest to characterise these binding sites. The binding sites of a transcription factor are collectively referred to as a regulatory motif. Recent advances in sequencing technology have generated vast amounts of biological data. Thus computational tools are required to process and analyse this massive information. In particular, computational tools were developed to search for over-represented words among a set of co-regulated sequences. Such tools would be somewhat incomplete without a statistical analysis that allows researchers to discern between real, biologically significant sites and random artefacts. By analogy, it is difficult to imagine evaluating a BLAST result without its accompanying E-value. Of the many motif finders, MEME, with over 9000 unique users recorded in the first half of 2013, is one of the most popular motif finding tools available. Currently MEME evaluates its candidate motifs using an extension of BLAST's E-value to the motif finding context. Ng et al. (2006) previously showed the drawbacks of MEME's current significance evaluation scheme; however, because MEME relies on the same E-value to internally rank competing candidate motifs, the alternative evaluation offered by Keich and Ng (2007) was not a practical substitute. Here we offer a two-tiered significance analysis that can replace the E-value in selecting the best candidate motif as well as in evaluating its overall statistical significance. We show that our new approach substantially improves MEME's motif finding performance and also provides the user with a reliable significance analysis. In addition, for large input sets our new approach is faster than the currently implemented E-value analysis.
After applying a motif finder to a set of co-regulated DNA sequences, researchers are often interested in knowing whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. (2008) pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and one of two uninformative columns. This observation explains why motif comparison tools such as Tomtom occasionally return an alignment of uninformative columns which is clearly spurious. To address this distinguishability problem Habib et al. (2008) suggested a new score, the BLiC. This score uses a Bayesian information criterion to penalise matches that are similar to the background distribution. We show that the BLiC score exhibits other, highly undesirable properties. Therefore, as an alternative, we offer a general approach to adjust any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implemented our method in Tomtom and we show that, without significantly compromising Tomtom's retrieval accuracy or runtime, we drastically reduce the number of uninformative alignments. The modified Tomtom is currently available as part of the MEME Suite at http://meme.nbcr.net. A motif is not limited to sites regulating gene expression. A motif is a recurring nucleotide sequence pattern that has a biological significance. One such example is in the context of the origins of replication of Saccharomyces cerevisiae. Autonomously replicating sequences (ARSs) are DNA fragments that promote extrachromosomal maintenance of plasmids. These ARSs mostly coincide with origins of DNA replication and therefore we use the terms interchangeably. The origins of replication in Saccharomyces cerevisiae have a highly conserved sequence known as the ACS (ARS consensus sequence).
Depending on the reference, its representation varies from the 11-bp consensus sequence WTTTAYRTTTW to a 33-bp position weight matrix. While the replication origins of some species, such as Schizosaccharomyces pombe and metazoans, do not have any known motif, Liachko et al. (2010) found that the replication origins of another budding yeast, Kluyveromyces lactis, share a 50-bp ACS motif which is inherently different from the ACS motif found in S. cerevisiae. Here we characterise ARSs in Lachancea (Saccharomyces) kluyveri, a pre-whole genome duplication budding yeast. In addition, we demonstrate that ARS function in L. kluyveri is dependent on a much longer sequence compared with S. cerevisiae and K. lactis. Furthermore, the system of replication initiation in L. kluyveri appears to be more permissive than in these other two species: it is able to initiate replication from all S. cerevisiae ARSs and most K. lactis ARSs, while only half of L. kluyveri ARSs function in S. cerevisiae and fewer than 10% function in K. lactis. Our findings demonstrate a replication initiation system with novel features and underscore its functional diversity within the budding yeasts.
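
The 11-letter ACS consensus quoted in this abstract is written in IUPAC ambiguity codes (W = A or T, Y = C or T, R = A or G). As a minimal sketch of how such a consensus string can be searched for, the Python below expands it into a regular expression and reports match positions. The function names and the toy sequence are illustrative assumptions, not material from the thesis, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex source string."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        # Unambiguous codes stay literal; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


ACS = "WTTTAYRTTTW"                 # consensus quoted in the abstract
toy = "GGCATTTATGTTTAGGC"           # hypothetical sequence with one ACS-like site
print(find_sites(ACS, toy))
```

A consensus string treats every allowed base as equally good, which is one reason the abstract also mentions the position weight matrix representation.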

9

Chen, Zhuo. « Smart Sequence Similarity Search (S⁴) system ». CSUSB ScholarWorks, 2004. https://scholarworks.lib.csusb.edu/etd-project/2458.

Abstract:

Sequence similarity searching is commonly used to help clarify the biochemical and physiological features of newly discovered genes or proteins. An efficient similarity search relies on the choice of tools and their associated subprograms and numerous parameter settings. To assist researchers in selecting optimal programs and parameter settings for efficient sequence similarity searches, the web-based expert system Smart Sequence Similarity Search (S⁴) was developed.

10

Holder, Mark Travis. « Using a complex model of sequence evolution to evaluate and improve phylogenetic methods ». Access restricted to users with UT Austin EID Full text (PDF) from UMI/Dissertation Abstracts International, 2001. http://wwwlib.umi.com/cr/utexas/fullcit?p3037500.

11

Rausch, Tobias. « Dissecting multiple sequence alignment methods : the analysis, design and development of generic multiple sequence alignment components in SeqAn ». Berlin : Freie Universität Berlin, 2010. http://d-nb.info/1024541460/34.

12

Wang, Kai. « Novel computational methods for accurate quantitative and qualitative protein function prediction ». Thesis, Connect to this title online ; UW restricted, 2005. http://hdl.handle.net/1773/11488.

13

Miao, Hanjin. « Intelligent system methods for energy management system and sequence-of-events recorder information analysis ». Thesis, Connect to this title online ; UW restricted, 1996. http://hdl.handle.net/1773/6133.

14

Roth, Christian. « Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts ». Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://nbn-resolving.de/urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2.

15

Gelfond, Jonathan A. L. Ibrahim Joseph George. « Bayesian model-based methods for the analysis of DNA microarrays with survival, genetic, and sequence data ». Chapel Hill, N.C. : University of North Carolina at Chapel Hill, 2007. http://dc.lib.unc.edu/u?/etd,972.

Abstract:

Thesis (Ph. D.)--University of North Carolina at Chapel Hill, 2007.
Title from electronic title page (viewed Dec. 18, 2007). "... in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Biostatistics, School of Public Health." Discipline: Biostatistics; Department/School: Public Health.

16

De Groeve, Johannes. « A wildlife journey in space and time : methodological advancements in the assessment and analysis of spatio-temporal patterns of animal movement across European landscapes ». Doctoral thesis, country:BE, 2018. http://hdl.handle.net/10449/52251.

Abstract:

Movement is one of the most fundamental processes for living entities on earth at the core of scientific disciplines such as ecology and geography. In animal ecology, ongoing progress in tracking and remote sensing technologies has spurred an explosion of movement and environmental data collected at high spatial and temporal resolution, at a large scale, so that the interaction between animal movement and habitat features can now be investigated in much more detail. As a result, in recent years the field of animal ecology has produced a growing body of studies on movement-based patterns leading to habitat use and selection. In this regard, GIScience has contributed with several visual analytical approaches to study animals in relation to their environment and habitat. However, the patterns behind the sequential use of different habitat classes have remained largely unexplored. Sequential habitat use is defined as the consecutive use of habitat features along the trajectory of an animal, extracted from the context of its spatial movement. By accounting for the sequence of use, it is possible to distinguish fundamentally different behavioural habitat use strategies that are important for the survival and fitness of an animal, such as habitat alternation versus random sequential use. Such distinctions would remain undetected by only considering the proportion of use. Sequential habitat use patterns occur in a spatial context, meaning sequential patterns are affected by what is actually available to the animal. In this dissertation we merge knowledge from different fields to present an innovative method to study the relation between animals and their environment by accounting for the sequential use of habitats, and animal movement rules. We developed a visually effective method to analyse and visualise sequential habitat use patterns of animals at multiple spatio-temporal scales by combining real and simulated sequences of habitat use.
To study sequential habitat use patterns we use Sequence Analysis Methods (SAM), an approach widely applied in molecular biology, as well as many applications in different fields, to measure dissimilarity between sequences of characters. In brief, we use dissimilarity algorithms to measure the distance between all pairs of sequences, and then apply a clustering algorithm to investigate how these sequences group together, which are visualised as dissimilarity trees. We propose a procedure consisting of three steps, including exploration, simulation and classification. In the exploration phase, we build exploratory trees, which visualise real sequential habitat use patterns. Second, by applying animal movement models we simulate expected sequential habitat use patterns, and assess how spatial context, and especially habitat availability, affects the clustering of sequential patterns. Third, we combine real and simulated sequences to identify which simulated pattern is most parsimonious with the real sequences. The research progress has been presented in three main chapters. In Chapter 3 we present seminal methodological development where SAM was applied to animal movement data. In Chapter 4 we introduce further methodological advancements to extend the applicability of SAM to animal ecology. In Chapter 5 we present a large-scale multi-population ecological application. All research was performed using GPS movement data of roe deer and environmental data provided by the Euroungulates database project. Chapter 3 presents the first application of SAM to identify ecologically relevant sequential patterns in animal habitat use. We exemplify the method using ecological data consisting of simulated and real trajectories from a roe deer population (Capreolus capreolus) in the Italian Alps, expressed as ordered sequences of four habitat use classes, i.e. high/open, high/closed, low/open, low/closed.
In essence, the SAM framework identifies relevant sequential patterns in real trajectories by measuring their similarity to spatially-explicit simulated trajectories with known sequential patterns. Simulation trajectories were generated in arenas resembling the landscape structure of the roe deer population. Chapter 4 extends SAM to an individual-based approach (i.e. IM-SAM, Individual Movement – Sequence Analysis Methods), that is applicable over multiple populations. Specifically, instead of performing simulations in landscape-like arenas, we use real individual home ranges, thus accounting for individual spatial context, and landscape composition and structure. To assess usability of our advanced framework we investigate the sequential use of open and forest habitats for nine roe deer populations ranging in landscapes with different geographic contexts and anthropogenic disturbance. We also discuss implications for conservation and management. Chapter 5 addresses the functional role of landscapes throughout seasons by identifying both population level and individual level variability in the sequential habitat use patterns of roe deer, identified in the former nine roe deer populations. We show how identified sequential habitat use patterns can be treated as variables, and analysed with standard and well-accepted statistical methods. While the (IM-)SAM framework was developed specifically for studying sequential habitat use, we highlight that its methodological steps and study design can easily be generalised. Indeed, its dissimilarity and clustering algorithms, temporal resolution, sampling units, and number of classes for which sequential patterns are investigated can all be customised for the specific research questions in mind. (IM-)SAM is easily applicable to different types of sequential data that describe aspects of an animal's internal (e.g. heart rate) or external state (e.g. temperature).
Through improvements in technology, including the growing amount of information that can be collected through sensors (GPS trackers, biologgers and satellites), improving database infrastructures and the instant availability of advanced R packages dedicated to animal movement, (IM-)SAM could be easily integrated in a wide range of both local and broad-scale behavioural spatio-temporal studies.
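
The abstract describes the core of SAM as two steps: measure a dissimilarity between all pairs of character sequences, then cluster the sequences. The Python below is a minimal sketch of that idea under stated assumptions: it uses Levenshtein edit distance and average-linkage agglomerative clustering, which may differ from the dissimilarity and clustering algorithms used in the thesis, and the two-letter habitat coding (O = open, F = forest) and the animal names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster(seqs, n_clusters):
    """Average-linkage agglomerative clustering on pairwise edit distances."""
    names = list(seqs)
    dist = {frozenset(pair): edit_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters; i < j, so deleting j is safe.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical habitat-use sequences: two alternating, two blocked.
sequences = {
    "deer_a": "OFOFOFOF",
    "deer_b": "OFOFOFFO",
    "deer_c": "OOOOFFFF",
    "deer_d": "OOOFFFFF",
}
print(cluster(sequences, 2))
```

All four toy sequences spend about half their time in each habitat, so proportion of use alone would not separate them; the order-aware distance does, which is the distinction the abstract draws between alternation and other sequential strategies. Stopping the merges at a chosen number of clusters stands in for cutting the dissimilarity tree.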

17

Pratas, Diogo. « Compression and analysis of genomic data ». Doctoral thesis, Universidade de Aveiro, 2016. http://hdl.handle.net/10773/16286.

Abstract:

Doctorate in Informatics
Genomic sequences are large codified messages describing most of the structure of all known living organisms. Since the presentation of the first genomic sequence, a huge amount of genomics data has been generated, with diversified characteristics, rendering the data deluge phenomenon a serious problem in most genomics centers. As such, most of the data are discarded (when possible), while others are compressed using general purpose algorithms, often attaining modest data reduction results. Several specific algorithms have been proposed for the compression of genomic data, but unfortunately only a few of them have been made available as usable and reliable compression tools. Of those, most have been developed for some specific purpose. In this thesis, we propose a compressor for genomic sequences of multiple natures, able to function in a reference or reference-free mode. Besides, it is very flexible and can cope with diverse hardware specifications. It uses a mixture of finite-context models (FCMs) and eXtended FCMs. The results show improvements over state-of-the-art compressors. Since the compressor can be seen as an unsupervised alignment-free method to estimate algorithmic complexity of genomic sequences, it is the ideal candidate to perform analysis of and between sequences. Accordingly, we define a way to approximate directly the Normalized Information Distance, aiming to identify evolutionary similarities within and between species. Moreover, we introduce a new concept, the Normalized Relative Compression, that is able to quantify and infer new characteristics of the data, previously undetected by other methods. We also investigate local measures, being able to locate specific events, using complexity profiles. Furthermore, we present and explore a method based on complexity profiles to detect and visualize genomic rearrangements between sequences, yielding several insights into the genomic evolution of humans.
Finally, we introduce the concept of relative uniqueness and apply it to the Ebolavirus, identifying three regions that appear in all the virus sequences from the outbreak but nowhere in the human genome. In fact, we show that these sequences are sufficient to classify different sub-species. Also, we identify regions in human chromosomes that are absent from the DNA of closely related primates, specifying novel traits in human uniqueness.
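
The abstract refers to approximating the Normalized Information Distance with a compressor. A common compressor-based approximation is the Normalized Compression Distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The Python below is a minimal sketch that swaps in the general-purpose `zlib` compressor in place of the thesis's mixture of finite-context models, so the values it gives are only illustrative; the helper names and the random test sequences are assumptions.

```python
import random
import zlib


def compressed_size(data):
    """Size in bytes of the zlib-compressed input (a stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized Compression Distance between two strings."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length, seed):
    """Reproducible pseudo-random DNA string, used here only as toy input."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


a = random_dna(2000, seed=1)
b = random_dna(2000, seed=2)
print("NCD(a, a) =", round(ncd(a, a), 3))  # a sequence against itself
print("NCD(a, b) =", round(ncd(a, b), 3))  # two unrelated sequences
```

A sequence compared with itself scores close to 0 because the second copy compresses almost for free, while unrelated sequences score close to 1. A general-purpose compressor models DNA poorly, which is the limitation the abstract raises when motivating a dedicated genomic compressor.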

18

Bajak, Edyta Zofia. « Genotoxic stress : novel biomarkers and detection methods : uncovering RNAs role in epigenetics of carcinogenesis ». Stockholm, 2005. http://diss.kib.ki.se/2005/91-7140-415-5/.

19

Marshall, Jean-Claude. « Transcriptional and genetic profiling of human uveal melanoma from an immunosuppressed rabbit model ». Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=103272.

Abstract:

Uveal melanoma is the most common primary intraocular malignant tumour in adults. Despite improvements in the diagnosis and treatment of the primary tumour, patients continue to have the same mortality rate as several decades ago, reflecting our poor understanding of the mechanisms behind the formation of metastases in this disease. The purpose of this study was therefore to characterize an animal model of uveal melanoma and use this model to study the transcriptional changes that cells undergo from culture to intraocular tumour, to circulation and finally to the formation of a metastatic nodule.
Using microarrays we identified 314 changes in transcript abundance between the intraocular tumour and metastatic lesions. Principal Components Analysis was used to cluster these transcripts into four distinct groups. A further 61 gene transcripts showed statistically significant changes between re-cultured cells isolated from the model, with the circulating malignant cells representing an intermediate step between cells isolated from intraocular tumours and metastatic lesions. We have produced a detailed analysis of the molecular changes that take place as human uveal melanoma cells evolve from a primary tumour to metastasis in an animal model, including the decrease in expression of specific melanoma markers. These changes were verified using quantitative real time polymerase chain reaction and three different functional assays.
In addition we sought to describe the genetic changes that are present in these cells. Using comparative genomic hybridization arrays we were able to successfully describe the deletions and amplifications that are present in genomic DNA extracted from paraffin embedded sections of the primary tumour. This represents the first time that archival tissue has successfully been used for this sort of analysis in uveal melanoma. We identified several genomic amplifications and deletions including an area of amplification of Wnt2, which is involved in beta-catenin regulation and C-Met, which plays a role in tumour cell homing to the liver in patients.
To the best of our knowledge, this is the first time that a detailed genetic analysis has been carried out on the progression of uveal melanoma from intraocular tumour, to circulation, to the formation of metastases.

20

Morozov, Vyacheslav. « Computational Methods for Inferring Transcription Factor Binding Sites ». Thesis, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23382.

Abstract:

Position weight matrices (PWMs) have become a tool of choice for identifying transcription factor binding sites in DNA sequences. PWMs are compiled from experimentally verified and aligned binding sequences and are then used to computationally discover novel putative binding sites for a given protein. DNA-binding proteins often show degeneracy in their binding requirements, and the overall binding specificity of many proteins is unknown and remains an active area of research. Although PWMs are more reliable predictors than consensus string matching, they generally produce a high number of false positive hits. A previous study introduced a novel approach to PWM training that uses the known motifs to sample additional putative binding sites from a proximal promoter area. In this thesis the core idea was further developed, implemented and tested in a large-scale application. Improved mono- and dinucleotide PWMs were computed for Drosophila melanogaster. The Matthews correlation coefficient was used as the optimization criterion in the PWM refinement algorithm. The new PWMs account for non-uniform background nucleotide distributions on the promoters and consider a larger number of new binding sites during the refinement steps. The optimization covered the PWM motif length, the position on the promoter, the threshold value and the binding site location. The predictions obtained with the mono- and dinucleotide PWM versions were compared with those of the initial matrices and of conventional tools. The optimized PWMs predicted new binding sites with better accuracy than conventional PWMs.
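The ingredients named in this abstract (a log-odds PWM built against a non-uniform background, a score threshold, and the Matthews correlation coefficient as the quality measure) can be illustrated with a minimal mononucleotide sketch. It is not the refinement algorithm of the thesis: the function names, the pseudocount, the example sites and the background frequencies are all assumptions chosen for illustration.

```python
import math

def build_pwm(sites, background, pseudocount=1.0):
    """Build a log-odds PWM (one dict per column) from aligned, equal-length sites."""
    total = len(sites) + pseudocount * len(background)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        # Log-odds of the smoothed column frequency against the background frequency.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / bg)
            for base, bg in background.items()
        })
    return pwm

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window whose summed score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical aligned sites and a hypothetical non-uniform background.
SITES = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
BACKGROUND = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
PWM = build_pwm(SITES, BACKGROUND)
HITS = scan(PWM, "GGGTATAATCCC", threshold=0.0)
```

In a refinement loop of the kind the abstract describes, the threshold and motif length would be varied and the setting with the highest `matthews_cc` on labelled sites retained; that loop is omitted here.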

21

Lu, Yang 1972. « High throughput study of the translational effect of human single nucleotide polymorphisms ». Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=116089.

Abstract:

Introduction: As a part of the Gene Regulators in Disease project (GRID), this study aims to create a novel high throughput method to discover the genetic effect on gene translation, taking advantage of the rationale that efficiently translated mRNAs associate with multiple ribosomes, while less active ones with fewer or none.
Methods: Lymphoblastoid cell lines (LCLs) from 44 HapMap European individuals were used for polyribosomal fractionation and establishing the sample bank for the future study. The fractionated mRNA samples of 10 out of the 44 individuals were run on an Illumina GoldenGate Beadarray to detect allelic imbalance (developed by the group of T.J. Hudson and T.M. Pastinen).
Results: This study established a high-quality RNA bank comprising 1,100 RNA fraction samples. Using the Illumina chip, translational imbalance was detected in 75 out of 1,483 (5.06%) assays and in 63 out of 269 (23.4%) genes. The translational effect was replicated well by the resequencing method.
Conclusion: This study found that the genetic effect on gene translation is a common mechanism of expression regulation. Our best hit, found in the integrin beta 1 binding protein 1 gene (ITGB1BP1), highlights the role of mRNA 3'UTR secondary structure in gene translation.
Keywords: Gene translation, High throughput genotyping, Human genetics, Polyribosome, RNA, Single nucleotide polymorphism

22

Yu, Xuesong. « Statistical methods for analyzing genomic data with consideration of spatial structures / ». Thesis, Connect to this title online ; UW restricted, 2007. http://hdl.handle.net/1773/9553.

23

Wagner, Brandie D. « Permutation based microarray gene selection methods with covarience adjustment applicable to complex diseases / ». Connect to full text via ProQuest. Limited to UCD Anschutz Medical Campus, 2007.

Abstract:

Thesis (Ph.D. in Analytic Health Sciences) -- University of Colorado Denver, 2007.
Typescript. Includes bibliographical references (leaves 57-60). Free to UCD affiliates. Online version available via ProQuest Digital Dissertations;

24

Buttriss, Gary John. « An analysis of the process of evolution and impact of internet technologies on firm behaviour and performance using narrative sequence methods ». University of New South Wales, Marketing, 2009. http://handle.unsw.edu.au/1959.4/43561.

Abstract:

This research suggests that modelling the complex dynamics of organisational change in a firm evolving as it implements internet technologies requires capturing diverse independent and interdependent processes across multiple temporal and spatial contexts, both within and external to the firm. This presents both an ontological and an epistemological challenge, as dominant research methods are either atemporal in nature and attribute action to disembodied variables, or are simply storytelling. Providing explanatory legitimacy requires going deeper to capture the action of actors 'acting' within multiple levels of context and to pinpoint the 'rock-bottom' causal mechanisms that drive the higher-order processes giving rise to the 'organisational life' we observe. To accomplish explanatory legitimacy I develop an analytical method that makes processuality fundamental and allows for the examination of, and theorising about, mechanisms. The first essential element of this method is a framework that guides the researcher in systematically gathering together what we already know from the multidisciplinary and eclectic research in e-business, and in the intensive work of gathering empirical evidence. I apply a new methodology that I call narrative sequence analysis, which combines process tracing and sequence analyses to make processes intelligible and to help illustrate how mechanisms drive these processes. I use this method to develop an explanatory account of the process of e-business development covering three episodes of change within the Commonwealth Bank of Australia from 1995 to 2006. The research finds that the firm evolves over time as it develops new capabilities and identifies and pursues development opportunities by assembling and committing resources to e-business through both technology development and business application.
It draws on past experience and gradually learns to develop, integrate and implement technology into existing business operations, discovers new innovative opportunities in which to apply the technology, or is drawn into new areas by others who identify opportunities in which to apply the firm's knowledge, resources and technology. The path to development depends on the firm's starting position and the timing of the sequence of events encountered along the way. It is a coevolutionary process in which the firm interacts, cooperates, adapts and responds to the actions and interactions of other actors, balanced by the uncertainty of e-business and business operation risk.

25

Rossiny, Vanessa Delphine. « Expression analysis of the 3p25.3-ptelomere genes in epithelial ovarian cancer ». Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=112355.

Abstract:

Microarray expression analysis was carried out to identify genes with a role in epithelial ovarian cancer (EOC). The U133A Affymetrix GeneChip® was used to determine the expression patterns of the 3p25.3-ptel genes represented on the microarray in 14 primary cultures of normal ovarian surface epithelial (NOSE) samples, 25 frozen malignant ovarian tumor samples and four EOC cell lines. Seven genes with differential expression patterns in the tumor samples compared to the NOSE samples were identified as candidates for further analysis, starting with ARPC4, SRGAP3 and ATP2B2. Although none of the candidates had been previously studied in ovarian cancer, several had either family or pathway members that had. Expression patterns seemed unaffected by either tumor histopathological subtype or the allelic imbalances observed with loss of heterozygosity (LOH) analysis. The absence of association with genomic context suggested that differential expression was the result of transcriptional regulation rather than direct targeting.

26

Friedrich, Torben. « New statistical Methods of Genome-Scale Data Analysis in Life Science - Applications to enterobacterial Diagnostics, Meta-Analysis of Arabidopsis thaliana Gene Expression and functional Sequence Annotation ». kostenfrei, 2009. http://www.opus-bayern.de/uni-wuerzburg/volltexte/2009/3985/.

27

Ellis, Stephen James. « An exploration of James Dreier’s Standard Tune Learning Sequence in a self-directed learning environment : an interpretative phenomenological analysis ». Thesis, Rhodes University, 2014. http://hdl.handle.net/10962/d1011312.

Abstract:

This qualitative case study was undertaken in order to explore the experiences of drum set students who apply themselves to James Dreier’s Standard Tune Learning Sequence (STLS) in a self-directed learning environment. These experiences ultimately shed light on how best to apply Differentiated Instruction to the STLS. The study draws on the experience of three adult drum students under the instruction of the author. The students were provided with the STLS and left to proceed with it on their own. They were asked to keep a record of their progress in the form of a learning journal. These learning journals were used, in conjunction with transcribed interviews and learner profiles, as data for this study and as such were subjected to Interpretative Phenomenological Analysis. The study recognizes three factors which affect the student’s successful progression through the STLS: readiness, interest and meaning. Each factor is discussed in relation to the literature on Differentiated Instruction. Recommendations are made regarding the application of Differentiated Instruction to the STLS.

28

Wessel, Jennifer. « Human genetic-epidemiologic association analysis via allelic composition and DNA sequence similarity methods applications to blood-based gene expression biomarkers of disease / ». Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2006. http://wwwlib.umi.com/cr/ucsd/fullcit?p3237548.

Abstract:

Thesis (Ph. D.)--University of California, San Diego and San Diego State University, 2006.
Title from first page of PDF file (viewed December 12, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references.

29

Wang, Bo. « Novel statistical methods for evaluation of metabolic biomarkers applied to human cancer cell lines ». Miami University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=miami1399046331.

30

Sacan, Ahmet. « Similarity Search And Analysis Of Protein Sequences And Structures : A Residue Contacts Based Approach ». Phd thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/12609754/index.pdf.

Abstract:

The advent of high-throughput sequencing and structure determination techniques has had a tremendous impact on our quest to crack the language of life. Genomic and protein data are now being accumulated at a phenomenal rate, with the motivation of deriving insights into the function, mechanism, and evolution of biomolecules through analysis of their similarities, differences, and interactions. The rapid increase in the size of biomolecular databases, however, calls for the development of new computational methods for sensitive and efficient management and analysis of this information. In this thesis, we propose and implement several approaches for accurate and highly efficient comparison and retrieval of protein sequences and structures. The observation that corresponding residues in related proteins share similar inter-residue contacts is exploited to derive a new set of biologically sensitive metric amino acid substitution matrices, yielding accurate alignment and comparison of proteins. The metricity of these matrices has allowed efficient indexing and retrieval of both protein sequences and structures. A landmark-guided embedding of protein sequences is developed to represent subsequences in a vector space for approximate but extremely fast spatial indexing and similarity search. Whereas protein structure comparison and search tasks were hitherto handled separately, we propose an integrated approach that serves both of these tasks and performs comparably to or better than other available methods. Our approach hinges on the identification of similar residue contacts using distance-based indexing and provides the best of both worlds: the accuracy of detailed structure alignment algorithms at a speed comparable to that of structure retrieval algorithms.
We expect that the methods and tools developed in this study will find use in a wide range of application areas including annotation of new proteins, discovery of functional motifs, discerning evolutionary relationships among genes and species, and drug design and targeting.

31

Coker, Jeffrey Scott. « The systemic response to fire damage in tomato plants a case study in the development of methods for gene expression analysis using sequence data / ». NCSU, 2004. http://www.lib.ncsu.edu/theses/available/etd-05072004-132534/.

Abstract:

Fire is a natural component of most terrestrial ecosystems and can act as a local wound stimulus to plants. The ultimate goal of this work was to characterize the array of transcripts which systemically accumulate in plants after fire damage. Before this could be accomplished, substantial development of methods for gene expression analysis using sequence data was necessary. This involved developing methods for identifying contamination in DNA sequence data (Chapter 2), identifying over 78,000 false sequences in GenBank and several thousand more in the indica rice genome (Chapter 2), developing a novel method for identifying housekeeping controls using sequence data (Chapter 3), performing relative expression analyses for 127 potential housekeeping control transcripts (Chapter 3), and characterizing 23 transcripts which encode all 13 subunits of vacuolar H+-ATPases in tomato plants (Chapter 4). A subtractive cDNA library served as a starting point to identify and characterize 9 novel tomato transcripts systemically up-regulated in leaves in the first hour after a distant leaf is flame wounded (Chapter 5). Real-time RT-PCR using leaf RNA isolated at different times after flaming showed that the most common pattern of transcript accumulation was an increase within 30 to 60 minutes, followed by a return to basal levels within 3 hours. Expression analyses also showed that most up-regulated transcripts were already present in unwounded tissues. A total of 46 different transcripts were identified from the subtractive cDNA library (Chapter 6). Compared with the entire tomato transcriptome, these 46 transcripts are very highly conserved in plants. The vast majority fell into 5 classes: enzymes of general metabolism; protein synthesis, modification, and transport; transcription; membrane transport; and photosynthesis and respiration.
At least half of the transcripts have been previously associated with wounding or stress, suggesting that the systemic response to fire damage has components similar to those of other wound and stress responses. On the other hand, 30% of transcripts were associated with photosynthesis and respiration, suggesting that part of the response to fire damage is notably different from other wound and stress responses. Conclusions and future directions are included in Chapter 7.

32

Thummadi, B. Veeresh. « SOFTWARE DESIGN METHODOLOGIES, ROUTINES AND ITERATIONS : A MULTIPLE-CASE STUDY OF AGILE AND WATERFALL PROCESSES ». Case Western Reserve University School of Graduate Studies / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=case1396363465.

33

Chrysostomou, Charalambos. « Characterisation and classification of protein sequences by using enhanced amino acid indices and signal processing-based methods ». Thesis, De Montfort University, 2013. http://hdl.handle.net/2086/9895.

Abstract:

Protein sequencing has produced an overwhelming number of protein sequences, especially in the last decade. Nevertheless, the functional and structural classes of the majority of proteins are still unknown, and the experimental methods currently used to determine these properties are very expensive, laborious and time-consuming. Automated computational methods are therefore urgently required to predict the functional and structural classes of proteins accurately and reliably. Several bioinformatics methods have been developed to determine such properties directly from sequence information. Methods that involve signal processing have recently become popular in bioinformatics; they have been investigated for the analysis of DNA and protein sequences and shown to be useful and generally to help characterise the sequences better. However, various technical issues need to be addressed in order to overcome problems associated with signal processing methods for the analysis of protein sequences. Amino acid indices, which are used to transform protein sequences into signals, have various applications and can represent diverse features of protein sequences and amino acids. As the majority of indices have similar features, this project proposes a new set of computationally derived indices that better represent the original group of indices. A study is also carried out that resulted in a unique and universal set of best-discriminating amino acid indices for the characterisation of allergenic proteins. This analysis extracts features directly from the protein sequences using the Discrete Fourier Transform (DFT) to build a classification model based on Support Vector Machines (SVM) for the allergenic proteins. The proposed predictive model yields a higher and more reliable accuracy than the existing methods. A new method is proposed for performing multiple sequence alignment.
For this method, a DFT-based approach is used to construct a new distance matrix in combination with multiple amino acid indices that were used to encode protein sequences as numerical sequences. Additionally, a new type of substitution matrix is proposed in which the physicochemical similarities between any given amino acids are calculated. These similarities were calculated from the 25 selected amino acid indices, each of which represents a unique biological protein feature. The proposed multiple sequence alignment method yields a better and more reliable alignment than the existing methods. In order to evaluate the complex information generated by the DFT, Complex Informational Spectrum Analysis (CISA) is developed and presented. As the results show, when protein classes present similarities or differences according to the Common Frequency Peak (CFP) in specific amino acid indices, it is probable that these classes are related to the protein feature that the specific amino acid index represents. Using only the absolute spectrum in informational spectrum analysis of protein sequences is shown to be insufficient, as biologically related features can appear individually in either the real or the imaginary spectrum. This is successfully demonstrated through the analysis of influenza neuraminidase protein sequences. Upon identification of a new protein, it is important to single out the amino acids responsible for the structural and functional classification of the protein, as well as the amino acids contributing to the protein's specific biological characterisation. In this work, a novel approach is presented to identify and quantify the relationship between individual amino acids and the protein. This is successfully demonstrated through the analysis of influenza neuraminidase protein sequences.
The problem of characterising and identifying Influenza A virus protein sequences is tackled with a Subgroup Discovery (SD) algorithm, which can provide ancillary knowledge to experts. The main objective of the case study was to derive interpretable knowledge for the influenza A virus problem and consequently to better describe the relationships between subtypes of this virus. Finally, using DFT-based sequence-driven features, a Support Vector Machine (SVM)-based classification model was built and tested; it yields higher predictive accuracy than SD. The methods developed and presented in this study yield promising results and can be easily applied to proteomic fields.
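The encoding step described in this abstract, mapping each residue to a numerical amino acid index value and taking the DFT of the resulting signal, can be sketched as follows. This is a generic illustration rather than the CISA method itself: `TOY_INDEX` holds arbitrary placeholder values, not a published amino acid index, and the function names are assumptions. The real, imaginary and absolute parts are returned separately because the abstract notes that the absolute spectrum alone can hide features.

```python
import cmath

# Arbitrary placeholder values for illustration only; NOT a published amino acid index.
TOY_INDEX = {"A": 1.0, "K": -1.0, "L": 2.0, "E": -2.0}

def encode(sequence, index):
    """Map each residue to its numerical index value."""
    return [index[residue] for residue in sequence]

def dft(signal):
    """Plain O(n^2) discrete Fourier transform of a real-valued signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def spectra(sequence, index):
    """Return (real, imaginary, absolute) values for frequencies k = 1 .. n // 2."""
    signal = encode(sequence, index)
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]  # remove the zero-frequency offset
    coeffs = dft(centred)
    half = coeffs[1:len(coeffs) // 2 + 1]  # a real signal has a symmetric spectrum
    return [(c.real, c.imag, abs(c)) for c in half]

# A strictly alternating sequence concentrates its energy at the highest frequency.
SPEC = spectra("AKAKAKAK", TOY_INDEX)
```

For features to feed a classifier such as an SVM, the spectra of sequences of different lengths would first need to be brought to a common length (for example by zero-padding); that step is left out of this sketch.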

34

Lacroix, Céline. « Nrg1p and Rfg1p in Candida albicans yeast-to-hyphae transition ». Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=112528.

Abstract:

The ability of Candida albicans to change morphology plays several roles in its virulence and as a human commensal. The yeast-to-hyphae transition is tightly regulated by several sets of activating and repressing pathways. The DNA-binding proteins Rfg1p and Nrg1p and the global repressor Tup1p are among the repressors found to regulate this morphogenesis. Knowledge of these repressors is based on extrapolations from homology to S. cerevisiae and from expression studies of mutants in inducing conditions, all of which are indirect means of determining a protein's function. We proposed a genome-wide location study of the Nrg1 and Rfg1 transcription factors to obtain direct data identifying their in vivo targets. Our results suggest different avenues for Nrg1p function and a regulation behaviour diverging from the previously suggested model: Nrg1p acts not only as a repressor but also as a transcription activator. Furthermore, it regulates its target genes by binding in their coding regions instead of binding to the expected regulatory elements on promoters.

35

Maskos, Uwe. « A novel method of nucleic acid sequence analysis ». Thesis, University of Oxford, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.306792.

36

Milani, Cintia. « Expressão gênica diferencial das células estromais obtidas de medula óssea na presença ou ausência de célula tumoral oculta em pacientes com câncer de mama ». Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/5/5155/tde-19032010-115152/.

Abstract:

Stromal cells may influence tumor development at primary and secondary sites; however, little is known about the molecular characteristics of bone marrow stromal cells from breast cancer patients. Our aim was to evaluate differential gene expression in bone marrow stromal cells from breast cancer patients in the presence or absence of occult tumor cells. Bone marrow (BM) aspirates were obtained from 10 breast cancer patients. Occult disseminated tumor cells in the bone marrow were detected by CK19 expression, assessed by nested reverse transcriptase polymerase chain reaction (nested RT-PCR); tumor cells were detected in four of the ten BM samples. Primary stromal cell cultures were established from all samples. We then selected samples from two patients with positive lymph nodes and occult tumor cells in the bone marrow, and from two patients with negative lymph nodes and no occult tumor cells in the bone marrow. All included patients were postmenopausal, with invasive ductal carcinoma and positive estrogen and progesterone receptors detected by immunohistochemical analysis. Gene expression profiling on cDNA microarray slides containing 4,608 spotted genes revealed 21 differentially expressed genes in stromal cells derived from bone marrow containing occult tumor cells: nine upregulated (PTHLH, TLOC1, NCOA6, C17orf57, ANAPC11, MAST4, POLR3E, CPNE1 and B4GALT5) and twelve downregulated (MRPL2, NAT10, DAP, RNF2, FLOT2, FKBP10, SLIT3, EBNA1BP2, SLC35B2, MICAL2, GPR3, TSPAN17). Our data suggest that, although gene expression in bone marrow-derived stromal cells is similar in the presence or absence of occult tumor cells, some differences can be identified.

37

Lamzin, Sergey. « Computational methods for the analysis of next generation viral sequences ». Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/59666/.

Abstract:

Recent advances in sequencing technologies have brought a renewed impetus to the development of the bioinformatics tools necessary for sequence processing and analysis. Along with the constant requirement to assemble more complex genomes from ever-evolving sequencing experiments and technologies, there is also a lack of visually accessible representations of the information generated by analysis tools. Most of the novel algorithms, specifically those for de novo genome assembly of next generation sequencing (NGS) data, are not able to efficiently handle data generated on large populations. We have assessed the common methods for genome assembly used today, both from a theoretical point of view and in their practical implementations. In this dissertation we present StarK (which stands for k*), a novel assembly algorithm with a new data structure designed to overcome some of the limitations that we observed in established methods, enabling higher quality NGS data processing. The StarK approach structurally combines de Bruijn graphs for all possible dimensions in one supergraph. Although the technique for joining reads remains the same in concept, the dimension k is no longer fixed. StarK is designed in such a way that it allows the assembler to adjust the de Bruijn graph dimension k dynamically, on the fly and at any given nucleotide position, without losing connections between graph vertices or doing complicated calculations. The new graph uses localised coverage difference evaluation to create connected subgraphs, which allows higher resolution of genomic differences and helps differentiate errors from potential variants within the sequencing sample. In addition, we present a bioinformatics analysis pipeline for high-variation viral population analysis (including transmission studies), which, using both new and established methods, creates easily interpretable visual representations of the underlying data analysis.
Together we provide a solid framework for biologists for extracting more information from sequencing data with less effort and faster than before.
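For readers unfamiliar with the structure that StarK generalises, the following minimal sketch builds a conventional fixed-k de Bruijn graph from reads, with edge counts standing in for coverage. It is not the StarK supergraph: the function name and the example reads are assumptions, and the graph is simply rebuilt for each k to show the dependence on that parameter that the abstract says StarK avoids.

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers.

    Returns {prefix: {suffix: count}}, where count is the number of times the
    corresponding k-mer was observed across all reads (a simple coverage proxy).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so that lookups of missing nodes do not create them.
    return {node: dict(successors) for node, successors in graph.items()}

# Hypothetical overlapping reads.
READS = ["ACGTAC", "CGTACG"]
GRAPH_K3 = de_bruijn(READS, 3)
GRAPH_K4 = de_bruijn(READS, 4)
```

With a small k the two reads collapse onto the same four edges, while a larger k keeps more context per node and the coverage at the ends of the reads drops; choosing k is therefore a trade-off in any fixed-k assembler.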

38

Zhu, Jun. « Analysis of transmission system faults in the phase domain ». Texas A&M University, 2004. http://hdl.handle.net/1969.1/1061.

Abstract:

In order to maintain a continuous power supply, relays in transmission systems are nowadays required to deal with complicated faults involving non-conventional connections, which poses a challenge to the short circuit analysis performed to determine the relay settings. The traditional sequence domain method is inherently ill-suited to such cases, which has led to a trend towards using the actual phase domain method in fault calculation. Although the phase domain method is slower and memory-intensive, it performs well when handling complicated faults. Today more and more commercial software includes phase domain calculation in its short circuit analysis to treat complicated cases. With the continuing development of computers, it may become possible to dispense with the sequence method entirely. In this thesis, a short circuit analysis method based on the phase domain is developed. After the three sequence admittance matrices of the system are built, all the data are transformed into the phase domain to obtain the phase domain admittance matrix. The subsequent fault calculations are performed purely in the phase domain. The test results for different types of faults in 3-bus, 14-bus and 30-bus transmission systems are presented and compared with the results of a commercial fault analysis software package. The validation of this program is also presented.
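The transformation from sequence to phase quantities mentioned in this abstract is, for a single three-phase element, the standard symmetrical components relation Y_abc = A · Y_012 · A⁻¹ with a = e^(j2π/3). The sketch below applies it to one element only; the function names and numerical values are assumptions, and the thesis's system-wide matrix assembly is not reproduced.

```python
import cmath

# Rotation operator of the symmetrical components transformation: a = 1 at 120 degrees.
A_OP = cmath.exp(2j * cmath.pi / 3)

# V_abc = A * V_012 and V_012 = A_INV * V_abc
A = [[1, 1, 1],
     [1, A_OP ** 2, A_OP],
     [1, A_OP, A_OP ** 2]]
A_INV = [[1 / 3, 1 / 3, 1 / 3],
         [1 / 3, A_OP / 3, A_OP ** 2 / 3],
         [1 / 3, A_OP ** 2 / 3, A_OP / 3]]

def matmul(x, y):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def sequence_to_phase(y0, y1, y2):
    """Convert zero-, positive- and negative-sequence admittances to a 3x3 phase matrix."""
    y_seq = [[y0, 0, 0], [0, y1, 0], [0, 0, y2]]
    return matmul(matmul(A, y_seq), A_INV)

# Hypothetical element with equal positive- and negative-sequence admittance.
Y_ABC = sequence_to_phase(1.0, 4.0, 4.0)
```

When the positive- and negative-sequence admittances are equal, the result has self terms (y0 + 2·y1)/3 and mutual terms (y0 − y1)/3, which gives a quick sanity check on the transformation.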

39

Atalar, Deniz. « Functional failure sequences in traffic accidents ». Thesis, Loughborough University, 2018. https://dspace.lboro.ac.uk/2134/32727.

Abstract:

This thesis examines the interactions between road users and the factors that contribute to the occurrence of traffic accidents, and discusses the implications of these interactions with regard to driver behaviour and accident prevention measures. Traffic accident data are collected on a macroscopic level by local police authorities throughout the UK. These data describe accident-related factors at a macroscopic level, which does not allow for a complete understanding of the interaction between the various road users or of the influence of errors made by active road users. Traffic accident data collected at a microscopic level, through analysis of real-world accidents that explains why and how an accident occurred, can further contribute to a data-driven approach to providing safety measures. Such data allow for an understanding of the interaction of factors for all road users within an accident that is not possible with other data collection methods. In the first part of the thesis, a literature review presents relevant research in traffic accident analysis and accident causation; three accident causation models used to understand the behaviour and factors leading to traffic accidents are then introduced. A comparison study of these accident causation coding models, which classify road user error, was carried out to determine the model best suited to coding the accident data according to the thesis aims. Latent class cluster analyses were made of two separate datasets, the UK On the Spot (OTS) in-depth accident investigation study and the STATS19 national accident database. A comparison between microscopic (in-depth) accident data and macroscopic (national) accident data was carried out. This analysis allowed the interactions between all relevant factors for the road users involved in the accident to be grouped into specific accident segmentations based on the cluster analysis results.
First, all of the cases that were collected by the OTS team between the years 2000 to 2003 were analysed. Results suggested that for single vehicle accidents males and females typically made failures related to detection and execution issues, whereas male road users made diagnosis failures with speed as a particularly important factor. In terms of the multiple vehicle accidents the interactions between the first two road users and the subsequent accident sequence were demonstrated. A cluster analysis of all two vehicle accidents in Great Britain in the year 2005 and recorded within the STATS19 accident database was carried out as a comparison to the multiple vehicle accident OTS data. This analysis demonstrated the necessity of in-depth accident causation data in interpreting accident scenarios, as the resulting accident clusters did not provide significant differences between the groups to usefully segment the crash population. Relevant human factors were not coded for these cases and the level of detail in the accident cases did not allow for a discussion of countermeasure implications. An analysis of 428 Powered Two Wheeler accidents that were collected by the OTS team between the years 2000 to 2010 was carried out. Results identified 7 specific scenarios, the main types of which identified two particular looked but did not see accidents and two types of single vehicle PTW accidents. In cases where the PTW lost control, diagnosis failures were more common, for road users other than the PTW rider, detection issues were of particular relevance. In these cases the interaction between all relevant road users was interpreted in relation to one another. The subsequent study analysed 248 Pedestrian accidents that were collected by the OTS team between the years 2000 to 2010. Results identified scenarios related to pedestrians as being in a hurry and making detection errors, impairment due to alcohol, and young children playing in the roadside. 
For accidents that were initiated by the other road user's behaviour, pedestrians were struck either after an accident had already occurred or because of the manoeuvre that a road user was making; older pedestrians were over-represented in this accident type. This thesis concludes by discussing how (1) microscopic in-depth accident data is needed to understand accident mechanisms, (2) a data mining approach using latent class clustering can benefit the understanding of failure mechanisms, (3) accident causation analysis is necessary to understand the types of failures that road users make and (4) accident scenario development helps quantify accidents and allows for discussion of accident countermeasure implications. The original contribution to knowledge is the demonstration that, when relevant data is available, it is possible to understand the interactions occurring between road users before the crash, which is not possible otherwise. This contribution has been demonstrated by highlighting how latent class cluster analysis combined with accident causation data allows relevant interactions between road users to be observed. Finally, implications for this work and future considerations are outlined.

40

Qin, Li-Xuan. « The clustering of regression models method with applications in gene expression data ». Thesis, University of Washington, 2005. http://hdl.handle.net/1773/9591.

41

Tsang, Yee-man Vivien. « Development of a multilocus sequence typing method for analysis of Laribacter hongkongensis ». Thesis, The University of Hong Kong, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31972238.

42

Tsang, Yee-man Vivien, et 曾綺雯. « Development of a multilocus sequence typing method for analysis of Laribacter hongkongensis ». Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31972238.

43

Figueiredo, Dulce Sachiko Yamamoto de. « Identificação fenotípica e molecular, perfil de suscetibilidade aos antifúngicos e detecção de glucuronoxilomanana em isolados clínicos de Trichosporon ». Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/5/5133/tde-24022014-161059/.

Abstract:

Invasive Trichosporon spp. infections occur more frequently in neutropenic patients, especially those with hematologic malignancies, and are associated with high mortality rates due to difficulties in identifying the pathogen and in treating patients with the drugs most commonly employed in antifungal therapy. Trichosporon species identification is important for epidemiological studies and to better define possible species-specific clinical associations. Additionally, antifungal susceptibility may vary according to the species. Furthermore, glucuronoxylomannan (GXM) is a cell wall-associated polysaccharide produced by the genus Trichosporon, which may be involved in the virulence mechanisms of this pathogen. This study aimed (i) to identify Trichosporon species isolated from hospitalized patients by both phenotypic and molecular methods, comparing the results; (ii) to verify the distribution of these species in invasive and non-invasive infection episodes; (iii) to determine the in vitro activities of various antifungal agents against the Trichosporon spp. isolates, employing a reference micro-dilution method and a commercial system; and (iv) to analyze the surface expression of GXM. Seventy-four Trichosporon spp. isolates obtained from clinical specimens of patients admitted to the Hospital das Clínicas-FMUSP and to other hospitals in the state of São Paulo, from 2003 to 2011, were included in the study. Nineteen samples were isolated from sterile deep sites (invasive infections) and 55 were isolated from catheter and urine samples (non-invasive isolates). All isolates were submitted to phenotypic analysis, which consisted of observation of morphological features, physiological tests and determination of the biochemical profile by the VITEK 2 system. Molecular identification was performed by sequencing the IGS1 and D1/D2 regions of the ribosomal DNA. 
The antifungal susceptibility profiles of the 74 isolates were analyzed by both the EUCAST micro-dilution method (reference), employing fluconazole (FCZ), itraconazole (ITZ), voriconazole (VCZ), ketoconazole (CTZ), amphotericin B (AMB) and 5-flucytosine (5FC), and the commercial micro-dilution test Sensititre YeastOne, with the same drugs employed in EUCAST plus posaconazole (POS) and caspofungin (CAS). The minimum inhibitory concentration (MIC) values, categorical and essential errors, as well as other susceptibility parameters, were compared between the two methods. The cell wall expression of GXM of all isolates was measured by flow cytometry employing an anti-GXM monoclonal antibody. The morphological and physiological features of the Trichosporon spp. isolates were insufficient to define species. The carbohydrate assimilation analysis, performed by the VITEK 2 system, resulted in 71 isolates identified as T. asahii (17 from invasive infections and 54 non-invasive isolates) and one isolate as T. mucoides (invasive). The species identification for the two remaining isolates (one invasive and one non-invasive) was inconclusive. For this reason, a manual auxanogram was performed with these isolates, again resulting in inconclusive species identification. By the automated sequencing method, 62 of the 74 isolates were identified as T. asahii, showing 82.4% agreement with the VITEK 2 identification. Eleven isolates were identified by sequencing as T. inkin (8), T. faecale (2) and T. dermatis (1), in disagreement with the VITEK 2 identification. Regarding the two isolates with inconclusive results by carbohydrate assimilation, the molecular technique identified one as T. asahii, whereas for the other isolate the sequencing was also unable to define the species. Therefore, among the 74 studied isolates, 62 were identified as T. asahii, eight as T. inkin, two as T. faecale and one as T. dermatis; and two isolates remained with inconclusive identification. 
Almost all Trichosporon spp. isolates displayed susceptibility to VCZ with both methods. By the EUCAST method, high MIC values were observed for AMB, while by the commercial test the isolates, especially the invasive ones, showed susceptibility to this drug. Additionally, the Sensititre kit provided elevated MIC values for FCZ, POS and CAS. With regard to 5FC, the MIC 90% values were consistently high (16 mg/L) in both methodologies. The MIC values obtained by the EUCAST and commercial methods were compared, resulting in significant differences in MIC values for all tested antifungal drugs; major categorical errors occurred at a relatively high percentage with the commercial method. No statistically significant differences in MIC values were found when invasive and non-invasive isolates were compared, except for ITZ and 5FC. Around 30% of both invasive and non-invasive isolates showed cross-resistance to FCZ and VCZ, while a small number of isolates were multiresistant. The GXM analysis by cytometry demonstrated no significant differences between invasive and non-invasive isolates. This study demonstrated that T. asahii was the species most frequently isolated from both deep and non-sterile sites of the patients. The classical phenotypic methodology was not able to define Trichosporon species, and the VITEK 2 system identification showed disagreement with the sequencing technique for the non-T. asahii species. Regarding the in vitro susceptibility tests, VCZ was the most effective drug against the isolates, whereas most of them appear to be less susceptible to AMB, FCZ and POS. The discrepancies in the Trichosporon spp. susceptibility results between the reference and commercial methods suggest that the latter requires further evaluation before it can be used in routine laboratory practice. Although GXM expression seemed to be equal in invasive and non-invasive Trichosporon spp. isolates, phagocytic assays should be performed in order to determine the protective effect of the polysaccharide against phagocytosis.
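
The abstract does not show how the 82.4% agreement figure is obtained. One reading that reproduces it, offered here as an assumption rather than the author's stated calculation, is to count as concordant every isolate for which VITEK 2 gave a conclusive species that sequencing did not contradict, and to divide by all 74 isolates. The function name below is illustrative only.

```python
# Counts are taken from the abstract; the way they are combined is an assumption.
TOTAL_ISOLATES = 74
VITEK_CONCLUSIVE = 71 + 1   # 71 T. asahii + 1 T. mucoides by VITEK 2
DISCORDANT = 8 + 2 + 1      # T. inkin, T. faecale, T. dermatis by sequencing


def percent_agreement(total, conclusive, discordant):
    """Percentage of all isolates on which both methods named the same species."""
    concordant = conclusive - discordant
    return 100.0 * concordant / total


if __name__ == "__main__":
    print(round(percent_agreement(TOTAL_ISOLATES, VITEK_CONCLUSIVE, DISCORDANT), 1))
```

Under this reading, 61 of 74 isolates are concordant, which rounds to 82.4%. The 62nd T. asahii isolate is the one that VITEK 2 left inconclusive, so it does not count as an agreement.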

44

Khouri, Raoul-Emil Roger. « Two-photon calcium imaging sequence Analysis Pipeline : a method for analyzing neuronal network activity ». Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119748.

Abstract:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (page 73).
Investigating the development of neuronal networks can help us to identify new therapies and treatments for conditions that affect the brain, such as autism and Alzheimer's disease. Two-photon calcium imaging has been a powerful tool for the investigation of the development of neuronal networks. However, one of the major challenges of working with two-photon calcium images is processing the large data sets, which often requires manual analysis by a skilled researcher. Here, we introduce a machine learning (ML) pipeline for the analysis of two-photon calcium image sequences. This semi-autonomous ML pipeline includes proposed methods for automatic neuron identification, signal extraction, signal processing, event detection, feature extraction, and analysis. We run our ML pipeline on a dataset of two-photon calcium image sequences extracted by our team. This dataset includes two-photon calcium image sequences of spontaneous network activity from primary cortical cultures of Mecp2-deficient and wild-type mice. Loss-of-function mutations in the MECP2 gene cause 95% of Rett syndrome cases and some cases of autism. We evaluate our ML pipeline using this dataset. Our ML pipeline reduces the time required to analyze two-photon calcium images from over 10 minutes to about 30 seconds per sample. Our goal is to accelerate the analysis of neuronal network function to aid in our understanding of neurological disorders and the identification of novel therapeutic targets.

45

Stebbing, Richard. « Model-based segmentation methods for analysis of 2D and 3D ultrasound images and sequences ». Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:f0e855ca-5ed9-4e40-994c-9b470d5594bf.

Abstract:

This thesis describes extensions to 2D and 3D model-based segmentation algorithms for the analysis of ultrasound images and sequences. Starting from a common 2D+t "track-to-last" algorithm, it is shown that the typical method of searching for boundary candidates perpendicular to the model contour is unnecessary if, for each boundary candidate, its corresponding position on the model contour is optimised jointly with the model contour geometry. With this observation, two 2D+t segmentation algorithms, which accurately recover boundary displacements and are capable of segmenting arbitrarily long sequences, are formulated and validated. Generalising to 3D, subdivision surfaces are shown to be natural choices for continuous model surfaces, and the algorithms necessary for joint optimisation of the correspondences and model surface geometry are described. Three applications of 3D model-based segmentation for ultrasound image analysis are subsequently presented and assessed: skull segmentation for fetal brain image analysis; face segmentation for shape analysis, and single-frame left ventricle (LV) segmentation from echocardiography images for volume measurement. A framework to perform model-based segmentation of multiple 3D sequences - while jointly optimising an underlying linear basis shape model - is subsequently presented for the challenging application of right ventricle (RV) segmentation from 3D+t echocardiography sequences. Finally, an algorithm to automatically select boundary candidates independent of a model surface estimate is described and presented for the task of LV segmentation. Although motivated by challenges in ultrasound image analysis, the conceptual contributions of this thesis are general and applicable to model-based segmentation problems in many domains. Moreover, the components are modular, enabling straightforward construction of application-specific formulations for new clinical problems as they arise in the future.

46

Tan, Angela Y. C. « The development of an efficient method of mitochondrial DNA analysis ». Monash University, Dept. of Forensic Medicine, 2003. http://arrow.monash.edu.au/hdl/1959.1/9525.

47

Lan, Yang. « Computational Approaches for Time Series Analysis and Prediction. Data-Driven Methods for Pseudo-Periodical Sequences ». Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/4317.

Abstract:

Time series data mining is one branch of data mining. Time series analysis and prediction have always played an important role in human activities and natural sciences. A pseudo-periodical time series has a complex structure, with the fluctuations and frequencies of the time series changing over time. Pseudo-periodicity therefore brings new properties and challenges to time series analysis and prediction. This thesis proposes two original computational approaches for time series analysis and prediction: Moving Average of nth-order Difference (MANoD) and Series Features Extraction (SFE). Based on data-driven methods, the two approaches offer new insights into time series analysis and prediction and contribute new feature detection techniques. The proposed algorithms can reveal hidden patterns based on the characteristics of time series, and they can be applied to predicting forthcoming events. This thesis also presents the evaluation results of the proposed algorithms on various pseudo-periodical time series, and compares the prediction results with those of classical time series prediction methods. The results of the original approaches applied to real-world and synthetic time series are very good and show that the contributions open promising research directions.
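
The abstract names MANoD but does not define it. The sketch below is only a literal reading of the name, assuming that the series is differenced n times and the result is then smoothed with a simple moving average; the thesis's actual definition may differ, and the function names and the `window` parameter are hypothetical.

```python
def nth_order_difference(series, n):
    """Apply the first-difference operator n times to a list of numbers."""
    diffs = list(series)
    for _ in range(n):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return diffs


def moving_average(values, window):
    """Simple (unweighted) moving average over a sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def manod_sketch(series, n, window):
    """Assumed reading of MANoD: moving average of the nth-order difference."""
    return moving_average(nth_order_difference(series, n), window)
```

For the quadratic series `[1, 4, 9, 16, 25]`, the second-order difference is constant, so `manod_sketch(series, 2, 2)` yields a flat output. A pseudo-periodic series would instead produce an oscillating one.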

48

Carls, Stefan. « Optimization of pyrosequencing method for copy number analysis of CYP2D6 ». Thesis, Uppsala universitet, Institutionen för kvinnors och barns hälsa, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-324758.

Abstract:

CYP2D6, a member of the cytochrome P450 enzyme system, has a central role in drug metabolism: it metabolizes 25% of clinically used drugs. The gene that codes for the enzyme displays a high degree of polymorphism, which affects enzyme function to various degrees. Aside from smaller mutations like SNPs, alleles may also feature duplications or deletions of the whole gene. Due to the clinical relevance of these mutations, a simple and precise method for genotyping is needed. In this study, a method based on pyrosequencing for copy number analysis was evaluated, wherein the copy number was determined by relative quantification against the reference gene CYP2D8P. During evaluation of the method, several adjustments were tried for optimization, including adjustments of annealing temperature and primer concentration. The results showed a difficulty in distinguishing between copy numbers using the method, as well as a high coefficient of variation. Therefore, further optimization is required before the method can be implemented in clinical practice.
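
Relative quantification against a reference gene can be illustrated with a small sketch. It assumes, beyond what the abstract states, that the reference gene is present in two copies per genome and that the target and reference signals are proportional to their template amounts; the function names and signal values are hypothetical.

```python
from statistics import mean, stdev

REFERENCE_COPIES = 2  # assumption: the reference gene is diploid


def estimate_copy_number(target_signal, reference_signal,
                         reference_copies=REFERENCE_COPIES):
    """Scale the target/reference signal ratio by the reference copy number."""
    if reference_signal <= 0:
        raise ValueError("reference_signal must be positive")
    return reference_copies * target_signal / reference_signal


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)
```

With equal signals the estimate is two copies, and a 60:40 ratio gives three. A high coefficient of variation across replicate estimates makes such neighbouring values hard to tell apart, which is the difficulty the abstract reports.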

49

Liu, Kwong Ip. « Digital net experimental designs, function interpolations using low discrepancy sequence and goodness of fit tests by discrepancy ». HKBU Institutional Repository, 2007. http://repository.hkbu.edu.hk/etd_ra/807.

50

Paci, Giulia. « Statistical methods for the analysis of DNA sequences : application to dinucleotide distribution in the human genome ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/7615/.

Abstract:

This thesis falls within the field of statistical analysis and stochastic methods applied to the analysis of DNA sequences. Specifically, our work focuses on the study of the CG dinucleotide (CpG) within the human genome, where it is found grouped in specific regions called CpG islands. These are linked to DNA methylation, a process that plays a fundamental role in gene regulation. The first part of the study is devoted to a global characterization of the content and distribution of the 16 different dinucleotides within the human genome: in particular, we study the distribution of the distances between successive occurrences of the same dinucleotide along the sequence. The results are compared with several null models: random sequences generated with Markov chains of order zero (based on the relative frequencies of the nucleotides) and of order one (based on the transition probabilities between nucleotides), and the geometric distribution for the distances. From this analysis the characteristic properties of the CpG dinucleotide emerge clearly, both in comparison with the other dinucleotides and with the random models. Following this first part, we chose to concentrate the subsequent analyses on regions of biological interest, studying the abundance and distribution of CpG within them (CpG islands, promoters and Lamina Associated Domains). In the first two cases a strong enrichment in CpG content is observed, and the distribution of distances is shifted towards lower values, indicating that this dinucleotide is clustered. Within LADs there are on average fewer CpGs, and these show larger distances. Finally, we adopted a random-walk representation of DNA, built on the positioning of the dinucleotides: the resulting walk shows drastically different characteristics inside and outside regions annotated as CpG islands. We therefore believe that methods based on this approach could be exploited to improve the identification of these regions of interest in the human genome and in other organisms.
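
The distance analysis described above can be sketched in a few lines. This is a minimal illustration, not the thesis's code: it collects the distances between successive occurrences of a dinucleotide and sets their empirical frequencies beside a geometric distribution whose parameter is estimated as the reciprocal of the mean distance. All function names are hypothetical, and the geometric null is only approximate for a dinucleotide such as CG that cannot overlap itself.

```python
from collections import Counter


def occurrence_positions(sequence, dinucleotide="CG"):
    """Start indices of every occurrence of the dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


def inter_occurrence_distances(sequence, dinucleotide="CG"):
    """Distances between successive occurrences along the sequence."""
    positions = occurrence_positions(sequence, dinucleotide)
    return [b - a for a, b in zip(positions, positions[1:])]


def geometric_pmf(distance, p):
    """P(D = distance) for a geometric distribution on 1, 2, 3, ..."""
    return p * (1.0 - p) ** (distance - 1)


def compare_with_geometric(sequence, dinucleotide="CG"):
    """Map each observed distance to (empirical frequency, geometric probability)."""
    distances = inter_occurrence_distances(sequence, dinucleotide)
    if not distances:
        return {}
    p = len(distances) / sum(distances)  # maximum-likelihood estimate: 1 / mean
    counts = Counter(distances)
    total = len(distances)
    return {d: (counts[d] / total, geometric_pmf(d, p)) for d in sorted(counts)}
```

On a real chromosome, an excess of short distances relative to the geometric column would be the clustering signal that the abstract attributes to CpG islands and promoters.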
