Dissertations / Theses on the topic 'Genomic classification'

1

Pfaff, Florian [Verfasser]. "Expanding the virosphere : advanced genomic classification / Florian Pfaff." Greifswald : Universitätsbibliothek Greifswald, 2017. http://d-nb.info/114441251X/34.

2

Sonnhammer, Erik Leonard Laage. "Classification of protein domain families for genomic sequence analysis." Thesis, Open University, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.336799.

3

Stone, Thomas John. "Genomic classification and analysis of epilepsy-associated glioneuronal tumours." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/10037593/.

Abstract:
INTRODUCTION: Glioneuronal tumours are a group of low-grade epilepsy-associated tumours with marked variability in their histological features, resulting in a lack of diagnostic consensus between institutions. This is confounded by a dearth of knowledge regarding their underlying biology, and subsequent lack of robust biologically informed diagnostic tools. This lack of understanding also impedes the development of novel and targeted treatment strategies. METHODS: I have undertaken a comprehensive molecular analysis of the most prevalent glioneuronal tumours: ganglioglioma and dysembryoplastic neuroepithelial tumours. I have used RNA sequencing and Illumina 450K methylation arrays to classify tumours in an unsupervised manner according to their genomic profiles. I then carried out in silico analyses on these datasets to identify genes, gene networks, and pathways that are differentially regulated between groups. Additionally, I have undertaken molecular assays to identify mutations that are specific to each group. Finally, I have used immunohistochemistry to assess a number of potential diagnostic markers revealed by expression profiling. RESULTS: Unsupervised clustering revealed glioneuronal tumours classify into two molecular groups (termed Group 1 and Group 2), which are only partially consistent with histological classification. Group 1 is defined by an astrocytic expression phenotype and an enrichment for BRAF-V600E mutations. Group 2 is defined by an oligodendrocyte precursor phenotype and an enrichment for FGFR1 mutations. A number of disease relevant networks and pathways are differentially regulated between these groups. Additionally, immunohistochemistry against Cyclin-D1 and PDGFRα can be used to distinguish tumour groups from one another. CONCLUSION: This is the first comprehensive genomic investigation of a large cohort of glioneuronal tumours without prior histological bias. 
I present data suggesting the current histological classification of these lesions is insufficient, and recommend a novel biologically informed strategy. My results also provide insight into the pathways underlying the development of these tumours. This information may assist in the development of novel treatment strategies.
4

Hua, Jianping. "Topics in genomic image processing." Texas A&M University, 2004. http://hdl.handle.net/1969.1/3244.

Abstract:
Image processing methodologies now play a significant role in biotechnology research. This work studies, develops and implements several image processing techniques for M-FISH and cDNA microarray images. In particular, we focus on three areas: M-FISH image compression, microarray image processing and expression-based classification. Two schemes are introduced: embedded M-FISH image coding (EMIC) for M-FISH image compression, and Microarray BASICA (Background Adjustment, Segmentation, Image Compression and Analysis) for microarray image processing. In the expression-based classification area, we investigate the relationship between the optimal number of features and sample size, either analytically or through simulation, for various classifiers.
5

Saluja, Sunil K. (Sunil Kumar) 1968. "A computational framework for the identification, cataloging, and classification of evolutionary conserved genomic DNA." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28590.

Abstract:
Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2004.
Includes bibliographical references (leaves 27-29).
Evolutionarily conserved genomic regions (ecores) are understudied, yet comprise a very large percentage of the human genome. Highly conserved human-mouse non-coding ecores, for example, are more abundant within the human genome than the regions currently estimated to encode proteins. Subsets of these ecores also exhibit conservation that extends across several species. These genomic regions have survived millions of years of evolution even though they do not appear to directly encode proteins. Their survival compels us to investigate their potential function. Development of a computational framework for the classification and clustering of these regions may be the first step in understanding their function. The need for a standardized framework is underscored by the explosive growth in the number of publicly available, fully sequenced genomes, and by the diverse set of methodologies used to generate cross-species alignments. This project describes the design and implementation of a system for the identification, classification and cataloguing of ecores across multiple species. A key feature of this system is its ability to quickly incorporate new genomes and assemblies as they become available. Additionally, the system provides investigators with a feature-rich user interface, which facilitates the retrieval of ecores based on a wide range of parameters. The system returns a dynamically annotated list of evolutionarily conserved regions, which is used as input to several classification schemes aimed at identifying families of ecores that share similar features, including depth of evolutionary conservation, position relative to known genes, sequence similarity, and content of transcription factor binding sites. Families of ecores have already been retrieved by the system and clustered using this feature space, and are currently awaiting biological validation.
by Sunil K. Saluja.
S.M.
6

Sharma, Jason P. (Jason Poonam) 1979. "Classification performance of support vector machines on genomic data utilizing feature space selection techniques." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87830.

7

Stagni, Camilla. "Genomic analysis in cutaneous melanoma: a tool for predictive biomarker identification and molecular classification." Doctoral thesis, Università degli studi di Padova, 2017. http://hdl.handle.net/11577/3426683.

Abstract:
Project 1. Identification of molecular signatures associated with response to MAPK inhibitors. BRAF V600-mutated melanoma benefits from MAPK inhibitor-based therapy. Yet the onset of resistance limits long-term efficacy and can even be immediate. In this study, we examined the genetic alterations characterizing melanoma progression to identify predictive factors of response to MAPK inhibitors (MAPKi). Specifically, we evaluated BRAF copy number variation (CNV), BRAF mutant (BRAFmut) allele frequency, PTEN loss or mutations and TERT promoter mutations in pre-treatment melanoma specimens from MAPKi-treated patients (pts), and we analyzed their association with progression-free survival (PFS). We also applied a comprehensive unbiased approach, using genome-wide CNV analysis, to identify additional genomic aberrations potentially associated with response to therapy. We found that 65% of pts displayed BRAF gains, often supported by chromosome 7 polysomy. In addition, 64% of pts had a balanced BRAF mutant/wild-type allele ratio (mutant allele between 35% and 65%), while 14% and 23% of pts had low (below 35%) and high (above 65%) BRAFmut allele frequency, respectively. Notably, a significantly higher risk of progression was observed in pts with a diploid BRAF status vs. those with BRAF gains (HR = 2.86; 95% CI 1.29‒6.35; p = 0.01) and in pts with a low vs. a balanced BRAFmut allele percentage (HR = 4.54; 95% CI 1.33‒15.53; p = 0.016). We identified PTEN gene mutations affecting the catalytic and C2 domains in 27% of pts. Moreover, we observed complete PTEN loss in 42% of pts, partial loss in 35% and no loss in 23%. Of note, we also found PTEN loss in pre-treatment samples from pts with long PFS. Sequencing of the TERT promoter disclosed mutations in 78% of pts. The -124C>T and the -146C>T mutations were equally frequent (36%), while the -138-139CC>TT mutation was present in only 5% of pts. Fifty-one percent of pts also carried the neighboring polymorphism rs2853669, which reportedly counteracts the activating effect of the above-mentioned mutations on TERT expression. Upon stratification of the TERT promoter mutant cohort by presence or absence of the polymorphism, TERTmutant/SNPcarrier pts showed a trend toward better PFS (median PFS 11.5 mo., 95% CI 3.12‒19.88) compared to TERTmutant/SNPnon-carrier pts (median PFS 7 mo., 95% CI 4.27‒9.72). When stratifying by mutation type, the -146C>T mutation correlated with shorter PFS (median PFS 5.45 mo., 95% CI 2.80‒9.20) compared to the -124C>T one (median PFS 15.2 mo., 95% CI 5.57‒). Genome-wide CNV analysis pointed at chr3p24, chr3p21.2 and chr17p13.1, which are differently altered between pts with long and short time to disease progression, as regions of potential interest for identifying new genes involved in therapeutic resistance. Our data suggest that quantitative analysis of the BRAF gene and sequencing of the TERT promoter could be useful to select the melanoma pts who are most likely to benefit from MAPKi therapy. In addition, chromosomes 3 and 17 could be regions that warrant further investigation. Conversely, because PTEN loss was present in pre-treatment samples from pts with both short and long PFS, the assessment of PTEN gene status does not seem to provide information about patient responsiveness to treatment.
Project 2. Search for molecular biomarkers to classify acral melanoma. Acral lentiginous melanoma (ALM) is a rare subtype of cutaneous melanoma with specific morphological, epidemiological, and genetic features. Since the genomic landscape of ALM is still incompletely described, we used whole-genome CNV analysis to characterize ALM and detail the genomic signatures that differentiate ALM from non-acral melanoma (NAM). The most strikingly different copy number aberrations were a higher frequency of losses of chromosome 16q24.2-16q24.3 in ALM than in NAM (64.7% vs. 10%) and a lower frequency of gains of chromosome 7q21.2-7q33 in ALM than in NAM (26.5% vs. 79.5%). We also observed that ALM harbored clusters of breakpoints and isochromosomes more often than NAM. Moreover, in ALM we identified focal amplification of TERT, CCND1, MDM2 and MITF. In NAM, instead, we found only two focal amplifications, involving BRAF and MITF. Focal homozygous copy losses especially affected the CDKN2A and PTEN genes, both in ALM and in NAM, although they were more frequent in the latter group. In keeping with previous observations that led to the classification of ALM as a distinct molecular subtype of melanoma, we observed a peculiar genomic landscape in ALM (vs. NAM). Our study provides insights into the molecular characteristics of ALM, which are key to the full elucidation of its pathogenesis.
8

McConechy, Melissa. "PPP2R1A mutations in gynaecologic cancers: functional characterization and use in the genomic classification of tumours." Thesis, University of British Columbia, 2015. http://hdl.handle.net/2429/52829.

Abstract:
Endometrial carcinoma is the most common gynaecological cancer in developed countries. The current endometrial pathologic classification system lacks reproducibility, which has hampered the development of new treatments for these cancers. The PP2A phosphatase complexes are responsible for regulating many cellular pathways, and may play a role in the deregulation of endometrial cancer-associated pathways. In this thesis, the role of PPP2R1A mutations in the subtype-specific classification of gynaecological tumours was investigated. Additionally, mutational profiles were used to improve the classification of the subtypes of endometrial carcinomas. Lastly, the functional effect of mutant PPP2R1A on PP2A-subunit protein interactions was determined in the context of endometrial cancer cell lines. Next-generation and Sanger sequencing were used to determine the presence of mutations in endometrial and ovarian carcinomas. PPP2R1A isogenic endometrial-specific cell lines were generated using somatic cell gene knockout by homologous recombination. Co-immunoprecipitation and mass spectrometry were used to determine the effects of the PPP2R1A W257L mutation on its ability to interact with PP2A subunits. Subtype-specific somatic PPP2R1A mutations were identified in endometrial serous carcinomas. Low-grade endometrial endometrioid carcinomas were defined by mutations in the genes ARID1A, PTEN, PIK3CA, CTNNB1, and KRAS, whereas high-grade endometrioid carcinomas also harbor TP53 mutations. Endometrial serous carcinomas harbor mutations in PPP2R1A, FBXW7, PIK3CA and TP53. Consequently, the molecular profiles proved useful in assisting classification of tumours with overlapping morphological features that cause irreproducibility in diagnoses. Proteomic analysis of isogenic cell lines determined that the PPP2R1A W257L mutation disrupts interaction with the PPP2R5C and PPP2R5D subunits. In addition, the mutated PPP2R1A protein showed an increased interaction with the endogenous PP2A inhibitor SET/I2PP2A. The integration of mutational profiles and other genomic features will be used to improve clinical and pathological classification in endometrial tumours that are difficult to diagnose. PPP2R1A mutations likely play a role in the transformation of gynaecological carcinoma by disrupting PP2A subunit interactions with tumour suppressor functions. Increased interaction of mutant PPP2R1A with SET/I2PP2A adds another layer of complexity to the tumour suppressive role of PP2A. In the future, targeting the PP2A complex with novel therapeutics could provide an alternative method for treating gynaecological cancers with poor outcomes.
Medicine, Faculty of
Pathology and Laboratory Medicine, Department of
Graduate
9

Marisa, Laetitia. "Classification et caractérisation des cancers colorectaux par approches omiques." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066235/document.

Abstract:
Colon cancer (CC) is one of the most frequent and most deadly cancers in France and worldwide. Nearly half of patients die within 5 years of diagnosis. Clinical staging based on histological features and molecular classification based on forms of genomic instability (microsatellite instability (MSI), chromosomal instability (CIN) and promoter hypermethylation (CIMP)) are not sufficient to define molecularly homogeneous entities or to predict recurrence effectively. To improve patient care, it is essential to better understand the diversity of the disease so that effective prognostic and predictive markers can be found. My PhD work focused on studying the diversity of CC at the molecular level through the use of omics approaches on a large cohort of tumor samples. It led to the establishment of a robust transcriptomic classification of these cancers, validated on independent data sets, and to a detailed characterization of each of the subtypes. Six subtypes were defined and were associated with distinct clinicopathological characteristics and molecular alterations, specific enrichments of supervised gene expression signatures related to cells and lesions of origin, specific deregulated signaling pathways and distinct survival. The results of this work have been strengthened by a consensus classification defined by an international consortium working group in which I have been involved. These results confirm that colorectal cancer is a heterogeneous disease. They provide a renewed framework to develop prognostic signatures, discover new treatment targets, identify new therapeutic strategies and assess response to treatment in clinical trials.
10

Pages, Mélanie. "Integrative genomic, epigenetic, radiologic and histological characterization of pediatric glioneuronal tumors." Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCB217.

Abstract:
Recent large-scale genomic studies have enabled the objective identification of numerous novel genomic alterations and highlighted that pediatric brain tumors often harbor quiet cancer genomes, with a single driver genomic alteration. This characteristic is of special interest in the current context of precision medicine development. The low-grade glioneuronal tumor group is highly heterogeneous and remains particularly challenging, since it includes a broad spectrum of tumors that are often poorly discriminated by their histopathological features and not completely characterized molecularly. We used targeted methods (IHC, FISH, targeted sequencing) and large-scale genomic and epigenetic methodologies to perform an integrative analysis to further characterize papillary glioneuronal tumors (PGNT), midline gangliogliomas and dysembryoplastic neuroepithelial tumors (DNT). We demonstrated that PGNT is a distinct entity characterized by a PRKCA fusion. We highlighted that the H3 K27M mutation can occur in association with the BRAF V600E mutation in midline grade I glioneuronal tumors, showing that despite the presence of H3 K27M mutations, these cases should not be graded and treated as grade IV tumors, because they have a better spontaneous outcome than classic diffuse midline H3 K27M-mutant glioma. The DNT study enabled us 1) to specify that non-specific DNT corresponds to a clinico-histological tumor group encompassing diverse molecularly distinct entities and 2) to demonstrate that specific DNTs can be progressive tumors and harbor a distinct DNA methylation profile. Diagnosis and genomic profiling that can guide precision medicine require tissue acquisition by neurosurgical procedures that are often difficult or not possible. We validated a sample collection procedure and developed methodologies to detect circulating tumor DNA (ctDNA) in CSF, plasma and urine to identify clinically relevant genomic alterations in a cohort of 235 pediatric patients with brain tumors. We optimized a method to process ctDNA and performed ultra-low-pass whole genome sequencing (ULP-WGS) using unique molecular identifiers, confirming that we can reliably construct sequencing libraries from CSF-, plasma- and urine-derived ctDNA. ULP-WGS was also used to assess sequencing library quality, copy number variations (CNVs) and tumor fraction. The vast majority of samples undergoing ULP-WGS exhibited no CNVs, consistent with either their absence in the tumor or low levels of tumor-derived cfDNA. To distinguish between these possibilities, we developed a hybrid capture sequencing panel allowing identification of specific mutations and fusions more common in pediatric brain tumors.
11

Zhao, Haitao. "Analyzing TCGA Genomic and Expression Data Using SVM with Embedded Parameter Tuning." University of Akron / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=akron1415629295.

12

Peng, Liang. "Neighborhood-Oriented feature selection and classification of Duke’s stages on colorectal Cancer using high density genomic data." Kansas State University, 2011. http://hdl.handle.net/2097/10751.

Abstract:
Master of Science
Department of Statistics
Haiyan Wang
The selection of relevant genes for classifying disease phenotypes from gene expression data has been extensively studied. Previously, most relevant-gene selection was conducted on individual genes with limited sample size. Modern technology makes it possible to obtain microarray data with higher resolution along the chromosomes. Considering gene sets on an entire block of a chromosome, rather than individual genes, could help to reveal important connections between relevant genes and disease phenotypes. In this report, we consider feature selection and classification that take into account the spatial location of probe sets when classifying Duke’s stages B and C using DNA copy number data or gene expression data from colorectal cancers. A novel method for feature selection is presented. A chromosome was first partitioned into blocks after the probe sets were aligned along their chromosome locations. Then a test of interaction between Duke’s stage and probe sets was conducted on each block of probe sets to select significant blocks. For each significant block, a new multiple comparison procedure was carried out to identify truly relevant probe sets while preserving the neighborhood location information of the probe sets. Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classification using the final selected probe sets was conducted for all samples. The Leave-One-Out Cross Validation (LOOCV) estimate of accuracy is reported as an evaluation of the selected features. We applied the method to two large data sets, each containing more than 50,000 features. Excellent classification accuracy was achieved by the proposed procedure with SVM or KNN for both data sets, even though classification of prognosis stages (Duke’s stages B and C) is much more difficult than classification of normal versus tumor types.
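The evaluation step named in this abstract (KNN classification scored by LOOCV) can be illustrated with a minimal sketch. It is not the author's implementation: the block-based feature selection is omitted, the function names `knn_predict` and `loocv_accuracy` are hypothetical, and the two-feature toy data set is invented purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k=3):
    """Leave-one-out cross-validation: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy data: two well-separated groups standing in for stages B and C.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
y = ["B", "B", "B", "C", "C", "C"]
accuracy = loocv_accuracy(X, y, k=1)
```

In this sketch each sample is classified by a model that never saw it, which is what makes the LOOCV accuracy an out-of-sample estimate. With a real feature-selection step, that step would normally be repeated inside each fold to keep the estimate unbiased; the abstract does not state whether this was done.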
13

Deng, Mario [Verfasser]. "Predicting Rules for Cancer Subtype Classification using Grammar-Based Genetic Programming on various Genomic Data Types / Mario Deng." Bonn : Universitäts- und Landesbibliothek Bonn, 2018. http://d-nb.info/115667946X/34.

14

Kittler, Ralf. "Functional genomic analysis of cell cycle progression in human tissue culture cells." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2006. http://nbn-resolving.de/urn:nbn:de:swb:14-1161253856455-48321.

Abstract:
The eukaryotic cell cycle orchestrates the precise duplication and distribution of the genetic material, cytoplasm and membranes to daughter cells. In multicellular eukaryotes, cell cycle regulation also governs various organisational processes, ranging from gametogenesis through multicellular development to tissue formation and repair. Consequently, defects in cell cycle regulation provoke a variety of human cancers. A global view of the genes and pathways governing the human cell cycle would advance many research areas and may also deliver novel cancer targets. This work therefore aimed at the genome-wide identification and systematic characterisation of genes required for cell cycle progression in human cells. I developed a highly specific and efficient RNA interference (RNAi) technology to realize the potential of RNAi for genome-wide screening of the genes essential for cell cycle progression in human tissue culture cells. This approach is based on the large-scale enzymatic digestion of long dsRNAs for the rapid and cost-efficient generation of libraries of highly complex pools of endoribonuclease-prepared siRNAs (esiRNAs). Analysis of the silencing efficiency and specificity of esiRNAs and siRNAs revealed that esiRNAs are as efficient for mRNA degradation as chemically synthesized siRNAs designed with state-of-the-art design algorithms, while exhibiting a markedly reduced number of off-target effects. After demonstrating the effectiveness of this approach in a proof-of-concept study, I screened a genome-wide esiRNA library and used three assays to generate a quantitative and reproducible multi-parameter profile for the 1389 identified genes. The resulting phenotypic signatures were used to assign novel cell cycle functions to genes by combining hierarchical clustering, bioinformatics and proteomic data mining. 
This global perspective on gene functions in the human cell cycle presents a framework for the systematic documentation necessary for the understanding of cell cycle progression and its misregulation in diseases. The identification of novel genes with a role in human cell cycle progression is a starting point for an in-depth analysis of their specific functions, which requires the validation of the observed RNAi phenotype by genetic rescue, the study of the subcellular localisation and the identification of interaction partners of the expressed protein. One strategy to achieve these experimental goals is the expression of RNAi resistant and/or tagged transgenes. A major obstacle for transgenesis in mammalian tissue culture cells is the lack of efficient homologous recombination limiting the use of cultured mammalian cells as a real genetic system like yeast. I developed a technology circumventing this problem by expressing an orthologous gene from a closely related species including its regulatory sequences carried on a bacterial artificial chromosome (BAC). This technology allows physiological expression of the transgene, which cannot be achieved with conventional cDNA expression constructs. The use of the orthologous gene from a closely related species confers RNAi resistance to the transgene allowing the depletion of the endogenous gene by RNAi. Thus, this technology mimics homologous recombination by replacing an endogenous gene with a transgene while maintaining normal gene expression. In combination with recombineering strategies this technology is useful for RNAi rescue experiments, protein localisation and the identification of protein interaction partners in mammalian tissue culture cells. In summary, this thesis presents a major technical advance for large-scale functional genomic studies in mammalian tissue culture cells and provides novel insights into various aspects of cell cycle progression. 
(The printed copies each include a CD-ROM as a supplement: 217 MB: movies, raw data. Access: Information Services Department of the SLUB)
APA, Harvard, Vancouver, ISO, and other styles
15

Sedlář, Karel. "Komprese genomických signálů pro klasifikaci a identifikaci organismů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220018.

Full text
Abstract:
Modern classification of organisms is performed on molecular data. These methods rely on multiple alignment of character sequences, which makes them computationally demanding, so only small parts of genomes can be compared in reasonable time. In this paper, a novel algorithm based on the conversion of whole-genome sequences to cumulative phase signals is presented. The dyadic wavelet transform is used for lossy compression of the signals by eliminating redundant frequency bands. Signal classification is then performed as a cluster analysis using a Euclidean metric, with multiple alignment replaced by dynamic time warping.
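The two core steps of this abstract, converting a sequence to a cumulative phase signal and comparing signals by dynamic time warping, can be illustrated with a minimal sketch. The nucleotide-to-complex mapping below is an assumption chosen for illustration (the thesis may use a different one), the wavelet compression step is omitted, and the function names are hypothetical.

```python
import cmath

# Assumed mapping of nucleotides to points in the complex plane (illustrative only).
NUCLEOTIDE_TO_COMPLEX = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def cumulative_phase(sequence):
    """Return the running sum of the phase angles of the mapped nucleotides."""
    total = 0.0
    signal = []
    for base in sequence.upper():
        total += cmath.phase(NUCLEOTIDE_TO_COMPLEX[base])
        signal.append(total)
    return signal


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A pairwise matrix of `dtw_distance` values over several signals could then be passed to any standard clustering routine; that step is left out here.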
APA, Harvard, Vancouver, ISO, and other styles
16

Mahfouz, Norhan, Serena Caucci, Eric Achatz, Torsten Semmler, Sebastian Guenther, Thomas U. Berendonk, and Michael Schroeder. "High genomic diversity of multi-drug resistant wastewater Escherichia coli." Nature Publishing Group, 2018. https://tud.qucosa.de/id/qucosa%3A32482.

Full text
Abstract:
Wastewater treatment plants play an important role in the emergence of antibiotic resistance. They provide a hot spot for exchange of resistance within and between species. Here, we analyse and quantify the genomic diversity of the indicator Escherichia coli in a German wastewater treatment plant and relate it to the isolates’ antibiotic resistance. Our results show a surprisingly large pan-genome, which mirrors how rich an environment a treatment plant is. We link the genomic analysis to a phenotypic resistance screen and pinpoint genomic hot spots that correlate with a resistance phenotype. Besides well-known resistance genes, this forward genomics approach identifies many novel genes that correlate with resistance, some of which are completely unknown. A surprising overall finding of our analyses is that we do not see any difference in resistance and pan-genome size between isolates taken from the inflow of the treatment plant and from the outflow. This means that while treatment plants reduce the number of bacteria released into the environment, they do not reduce the potential for antibiotic resistance of these bacteria.
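As a rough illustration of what "pan-genome size" measures, the sketch below computes the pan-genome (genes present in any isolate) and the core genome (genes present in every isolate) from per-isolate gene sets. The isolate and gene identifiers are hypothetical placeholders, and real analyses first cluster genes into orthologous families, which is not shown.

```python
def pan_and_core(genomes):
    """Return (pan_genome, core_genome) for a dict mapping isolate -> iterable of gene IDs."""
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)          # genes seen in at least one isolate
    core = set.intersection(*gene_sets)    # genes seen in every isolate
    return pan, core


# Hypothetical example data: three isolates with placeholder gene IDs.
isolates = {
    "inflow_1": {"geneA", "geneB", "geneC"},
    "inflow_2": {"geneA", "geneB", "geneD"},
    "outflow_1": {"geneA", "geneE"},
}
pan, core = pan_and_core(isolates)
```

Comparing `len(pan)` between inflow and outflow subsets would be one simple way to express the inflow/outflow comparison the abstract describes.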
APA, Harvard, Vancouver, ISO, and other styles
17

Bermudez, Santana Clara Isabel. "tRNomics: Genomic Organization and Processing Patterns of tRNAs." Doctoral thesis, Universitätsbibliothek Leipzig, 2010. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-61063.

Full text
Abstract:
Surprisingly little is known about the organization and distribution of tRNAs and tRNA-related sequences on a genome-wide scale. While tRNA complements are usually reported in passing as part of genome annotation efforts, and peculiar features such as the tandem arrangements of tRNAs in Entamoeba histolytica have been described in some detail, comparative studies are rare. We therefore set out to systematically survey the genomic arrangement of tRNAs in a wide range of eukaryotes to identify common patterns and taxon-specific peculiarities. We found that tRNA complements evolve rapidly and that tRNA locations are subject to rapid turnover. At the phylum level, distributions of tRNA numbers are very broad, with standard deviations on the order of the mean. Even within fairly closely related species, we observe dramatic changes in local organization. Consistent with this variability, syntenic conservation of tRNAs is also poor in general, with turnover rates comparable to those of unconstrained sequence elements. We conclude that the genomic organization of tRNAs shows complex, lineage-specific patterns characterized by extensive variability, and that this variability is in striking contrast to the extreme levels of sequence conservation of the tRNA genes themselves. Our comprehensive analysis of eukaryotic tRNA distributions provides a basis for further studies into the interplay between tRNA gene arrangements and genome organization in general. Secondly, we focused on the investigation of small non-coding RNAs (ncRNAs) from whole transcriptome data. Since ncRNAs constitute a significant part of the transcriptome, we explore this data to detect and classify patterns derived from transcriptome-associated loci. We selected three distinct ncRNA classes: microRNAs, snoRNAs and tRNAs, all of which undergo maturation processes that lead to the production of shorter RNAs. After mapping the sequences to the reference genome, specific patterns of short reads were observed.
These read patterns appeared to reflect RNA processing and, if so, should specify the RNA transcripts from which they are derived. In order to investigate whether the short read patterns carry information on the particular ncRNA class from which they originate, we performed a random forest classification on the three distinct ncRNA classes listed above. Then, after exploring the potential classification of general groups of ncRNAs, we focused on the identification of small RNA fragments derived from tRNAs. After mapping transcriptome sequence data to reference genomes, we searched for specific short read patterns reflecting tRNA processing. In this context, we devised a common tRNA coordinate system based on conservation and secondary structure information that allows vector representation of processing products and thus comparison of different tRNAs by anticodon and amino acid. We report patterns of tRNA processing that seem to be conserved across species. Though the mechanisms and functional implications underlying these patterns remain to be clarified, our analysis suggests that each type of tRNA exhibits a specific pattern and thus appears to undergo a characteristic maturation process.
APA, Harvard, Vancouver, ISO, and other styles
18

Hou, Jiayi. "Regularization Methods for Predicting an Ordinal Response using Longitudinal High-dimensional Genomic Data." VCU Scholars Compass, 2013. http://scholarscompass.vcu.edu/etd/3242.

Full text
Abstract:
Ordinal scales are commonly used to measure health status and disease related outcomes in hospital settings as well as in translational medical research. Notable examples include cancer staging, which is a five-category ordinal scale indicating tumor size, node involvement, and likelihood of metastasizing. Glasgow Coma Scale (GCS), which gives a reliable and objective assessment of conscious status of a patient, is an ordinal scaled measure. In addition, repeated measurements are common in clinical practice for tracking and monitoring the progression of complex diseases. Classical ordinal modeling methods based on the likelihood approach have contributed to the analysis of data in which the response categories are ordered and the number of covariates (p) is smaller than the sample size (n). With the emergence of genomic technologies being increasingly applied for obtaining a more accurate diagnosis and prognosis, a novel type of data, known as high-dimensional data where the number of covariates (p) is much larger than the number of samples (n), are generated. However, corresponding statistical methodologies as well as computational software are lacking for analyzing high-dimensional data with an ordinal or a longitudinal ordinal response. In this thesis, we develop a regularization algorithm to build a parsimonious model for predicting an ordinal response. In addition, we utilize the classical ordinal model with longitudinal measurements to incorporate the cutting-edge data mining tool for a comprehensive understanding of the causes of complex disease on both the molecular level and environmental level. Moreover, we develop the corresponding R package for general utilization. The algorithm was applied to several real datasets as well as to simulated data to demonstrate the efficiency in variable selection and precision in prediction and classification. 
The four real datasets are from: 1) the National Institute of Mental Health Schizophrenia Collaborative Study; 2) the San Diego Health Services Research Example; 3) a gene expression experiment to understand "Decreased Expression of Intelectin 1 in The Human Airway Epithelium of Smokers Compared to Nonsmokers" by Weill Cornell Medical College; and 4) the National Institute of General Medical Sciences Inflammation and the Host Response to Burn Injury Collaborative Study.
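The abstract does not say which ordinal model or penalty the thesis uses. As a sketch under stated assumptions, the code below uses the cumulative logit (proportional odds) model, a common likelihood-based ordinal model, together with an L1 (lasso) penalty, a common regularizer when p is much larger than n. All function and parameter names are hypothetical, and no optimizer is shown.

```python
import math


def category_probabilities(x, beta, cutpoints):
    """P(Y = k | x) under a cumulative logit model with increasing cutpoints.

    P(Y <= k) = 1 / (1 + exp(-(cutpoints[k] - x . beta))); the last category
    takes the remaining probability mass.
    """
    eta = sum(b * v for b, v in zip(beta, x))
    cumulative = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


def penalized_neg_log_lik(X, y, beta, cutpoints, lam):
    """Negative log-likelihood plus an L1 penalty lam * sum(|beta_j|)."""
    nll = 0.0
    for x, k in zip(X, y):
        nll -= math.log(category_probabilities(x, beta, cutpoints)[k])
    return nll + lam * sum(abs(b) for b in beta)
```

Minimizing `penalized_neg_log_lik` over `beta` and `cutpoints` drives many coefficients to exactly zero as `lam` grows, which is what makes the resulting model parsimonious.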
APA, Harvard, Vancouver, ISO, and other styles
19

Schnorrer, Frank, Pavel Tomancak, Cornelia Schönbauer, Radoslaw K. Ejsmont, and Christoph C. H. Langer. "In Vivo RNAi Rescue in Drosophila melanogaster with Genomic Transgenes from Drosophila pseudoobscura." PloS, 2010. https://tud.qucosa.de/id/qucosa%3A29009.

Full text
Abstract:
Background: Large-scale RNA interference (RNAi) approaches are very valuable for systematically investigating biological processes in cell culture or in tissues of organisms such as Drosophila. A notorious pitfall of all RNAi technologies is the potential for false positives caused by unspecific knock-down of genes other than the intended target gene. The ultimate proof of RNAi specificity is a rescue by a construct immune to RNAi, typically originating from a related species. Methodology/Principal Findings: We show that primary sequence divergence in areas targeted by Drosophila melanogaster RNAi hairpins in five non-melanogaster species is sufficient to identify orthologs for 81% of the genes that are predicted to be RNAi refractory. We use clones from a genomic fosmid library of Drosophila pseudoobscura to demonstrate the rescue of RNAi phenotypes in Drosophila melanogaster muscles. Four out of five fosmid clones we tested harbour cross-species functionality for the gene assayed, and three out of the four rescue an RNAi phenotype in Drosophila melanogaster. Conclusions/Significance: The Drosophila pseudoobscura fosmid library is designed for seamless cross-species transgenesis and can be readily used to demonstrate specificity of RNAi phenotypes in a systematic manner.
APA, Harvard, Vancouver, ISO, and other styles
20

Montemurro, Marilisa. "Algorithms for cancer genome data analysis - Learning techniques for ITH modeling and gene fusion classification." Doctoral thesis, Politecnico di Torino, 2022. http://hdl.handle.net/11583/2970978.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Ullrich, Sophie. "Genomic and transcriptomic characterization of novel iron oxidizing bacteria of the genus “Ferrovum“." Doctoral thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2016. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-205981.

Full text
Abstract:
Acidophilic iron oxidizing bacteria of the betaproteobacterial genus “Ferrovum” are ubiquitously distributed in acid mine drainage (AMD) habitats worldwide. Since their isolation and maintenance in the laboratory has proved to be extremely difficult, members of this genus are not accessible to a “classical” microbiological characterization, with the exception of the designated type strain “Ferrovum myxofaciens” P3G. The present study reports the characterization of “Ferrovum” strains at genome and transcriptome level. “Ferrovum” sp. JA12, “Ferrovum” sp. PN-J185 and “F. myxofaciens” Z-31 represent the iron oxidizers of the mixed cultures JA12, PN-J185 and Z-31. The mixed cultures were derived from the mine water treatment plant Tzschelln close to the lignite mining site in Nochten (Lusatia, Germany). The mixed cultures also contain a heterotrophic strain of the genus Acidiphilium. The genome analysis of Acidiphilium sp. JA12-A1, the heterotrophic contamination of the mixed culture JA12, indicates an interspecies carbon and phosphate transfer between Acidiphilium and “Ferrovum” in the mixed culture, and possibly also in their natural habitat. The comparison of the inferred metabolic potentials of four “Ferrovum” strains and the analysis of their phylogenetic relationships suggest the existence of two subgroups within the genus “Ferrovum” (i.e. the operational taxonomic units OTU-1 and OTU-2) harboring characteristic metabolic profiles. OTU-1 includes the “F. myxofaciens” strains P3G and Z-31, which are predicted to be motile and diazotrophic, and to have a higher acid tolerance than OTU-2. The latter includes two closely related proposed species represented by the strains JA12 and PN-J185, which appear to lack the abilities of motility, chemotaxis and molecular nitrogen fixation. Instead, both OTU-2 strains harbor the potential to use urea as an alternative nitrogen source to ammonium, and even nitrate in the case of the JA12-like species.
The analysis of the genome architectures of the four “Ferrovum” strains suggests that horizontal gene transfer and loss of metabolic genes, accompanied by genome reduction, have contributed to the evolution of the OTUs. A trial transcriptome study of “Ferrovum” sp. JA12 supports the ferrous iron oxidation model inferred from its genome sequence, and reveals the potential relevance of several hypothetical proteins in ferrous iron oxidation. Although the inferred model in “Ferrovum” spp. shares common features with that of the acidophilic iron oxidizers of the Acidithiobacillia, it appears to be more similar to that of the neutrophilic iron oxidizers Mariprofundus ferrooxydans (“Zetaproteobacteria”) and Sideroxydans lithotrophicus (Betaproteobacteria). These findings suggest a common origin of ferrous iron oxidation in the Beta- and “Zetaproteobacteria”, while the acidophilic lifestyle of “Ferrovum” spp. may have been acquired later, allowing them to also colonize acid mine drainage habitats.
APA, Harvard, Vancouver, ISO, and other styles
22

Hennart, Mélanie. "Taxonomie génomique des souches bactériennes et émergence de l'antibiorésistance." Electronic Thesis or Diss., Sorbonne université, 2022. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2022SORUS547.pdf.

Full text
Abstract:
Les maladies infectieuses font partie des préoccupations mondiales en santé publique, en particulier en raison de la résistance aux antimicrobiens chez certaines bactéries pathogènes. L'espèce Klebsiella pneumoniae est identifiée comme l'une des bactéries multirésistantes les plus préoccupantes. Corynebacterium diphtheriae, responsable de la diphtérie, reste largement sensible aux antibiotiques de première intention dont la pénicilline et peut être contrôlée par les vaccins, mais ré-émerge lorsque la couverture vaccinale est insuffisante. Parmi les moyens de contrôle des maladies infectieuses, la détection et l'identification précise de ces agents pathogènes, ainsi que leur suivi épidémiologique, jouent un rôle primordial. La mise en œuvre du séquençage génomique a révolutionné le génotypage bactérien grâce à son haut pouvoir discriminant, qui permet la distinction des agents pathogènes à l'échelle des souches. Le séquençage génomique permet également la détection de variants et de leurs caractéristiques importantes, telles que leur virulence ou leur résistance. Les travaux de recherche de cette thèse s'articulent sur deux axes principaux. Le premier axe apporte des analyses bio-informatiques de la structure populationnelle de la résistance aux antibiotiques chez C. diphtheriae. Une étude d'association génomique (GWAS) a été réalisée pour définir les bases moléculaires de la résistance, ainsi que les associations avec la production de toxine diphtérique et d'autres caractéristiques des souches. Un nouveau gène de résistance à la pénicilline a été découvert sur un élément mobile chez C. diphtheriae. Un outil de génotypage a été développé spécifiquement pour C. diphtheriae, pour laquelle les liens entre génotypes et phénotypes cliniques sont mal connus. Cet outil consolide et facilite la détection et le génotypage des principaux facteurs de virulence et des gènes de résistance, ainsi que l'usage des nomenclatures des souches à partir de génomes assemblés. 
Il permet également de prédire les biovars et la toxicité des souches. Le second axe est consacré à la taxonomie génomique infra-spécifique. Une nouvelle approche de classification et de nomenclature génomiques est proposée en utilisant comme modèle l'espèce K. pneumoniae. Ces travaux détaillent la conception et l'implémentation d'un système de codes-barres qui combine le regroupement par Single Linkage MultiLevel (MLSL) et les LIN (Life Identification Number) codes, tous deux basés sur le même schéma de typage core-genome MLST (cgMLST). Cette approche taxonomique innovante résulte en une nomenclature infra-spécifique précise et stable, qui est de plus largement déployable chez les autres espèces bactériennes. Une étude de la structure phylogénétique de C. diphtheriae a également été réalisée, avec la mise en œuvre d'un système cgMLST sur la base duquel une taxonomie génomique des souches a été proposée. Sur la base des nouveaux apports et concepts précédemment exposés, plusieurs études de cas ont été réalisées : mise en évidence et caractérisation d'une nouvelle espèce bactérienne (C. rouxii), précédemment confondue avec C. diphtheriae ; et épidémiologie génomique de la diphtérie dans différentes régions du monde, ou à partir de sources cliniques humaines et animales. Ces applications de la taxonomie génomique associée à la détection des gènes de résistance aux antibiotiques illustrent le potentiel des méthodes et des outils développés durant cette thèse afin de contribuer à la recherche et à la surveillance génomique des bactéries pathogènes
Infectious diseases are a global public health concern, particularly due to antimicrobial resistance in some pathogenic bacteria. Klebsiella pneumoniae is one of the most worrying multiresistant bacteria. Corynebacterium diphtheriae, which causes diphtheria, remains largely susceptible to first-line antibiotics, including penicillin, and can be controlled through vaccination, but re-emerges when vaccination coverage is insufficient. Among the effective infection control measures, the accurate detection and identification of these pathogens, as well as their epidemiological monitoring, play a key role. In recent years, the implementation of whole-genome sequencing (WGS) has revolutionised bacterial genotyping by providing discrimination at the strain level. Genomic sequencing also enables the detection of variants and their important characteristics, such as virulence or antimicrobial resistance. The research work of this thesis is structured around two main axes. The first axis provides bioinformatic analyses of the population structure of antimicrobial resistance in C. diphtheriae. A genome-wide association study (GWAS) was performed to determine the genetic basis of the resistance phenotypes, as well as the associations with diphtheria toxin production and other strain characteristics. A new penicillin resistance gene was discovered on a mobile element in C. diphtheriae. A genotyping tool was developed specifically for C. diphtheriae, for which the links between genotypes and clinical phenotypes are poorly known. This tool consolidates and facilitates the detection and genotyping of the main virulence factors and resistance genes, as well as the use of strain nomenclatures from assembled genomes. It also enables the prediction of biovars and toxicity of strains. The second axis relates to infra-species genomic taxonomy. A new approach to genome-based classification and nomenclature of strains was developed using K. pneumoniae as a model.
This work describes the design and implementation of a barcoding system that combines Single Linkage MultiLevel (MLSL) clustering and Life Identification Number (LIN) codes, both based on the same core-genome MLST (cgMLST) typing scheme. This innovative taxonomic approach, widely applicable to other bacterial species, yields precise and stable nomenclatures. A study of the phylogenetic structure of C. diphtheriae was also carried out, with the implementation of a cgMLST scheme on the basis of which a genomic taxonomy of strains was proposed. Based on the contributions and concepts presented above, several case studies were carried out: identification and characterisation of a new species (C. rouxii), previously misidentified as C. diphtheriae; and genomic epidemiology of diphtheria in different world regions or from human and animal clinical sources. These applications of genomic taxonomy in combination with antimicrobial resistance gene detection illustrate the potential of the methods and tools developed during this thesis to support genomic research and surveillance of pathogenic bacteria.
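To make the idea of LIN codes on top of cgMLST profiles more concrete, here is a simplified sketch of how a multi-position code could be assigned to a new allelic profile. It is an illustration under stated assumptions, not the thesis's implementation: the thresholds, the tie-breaking, and the function names are hypothetical, and missing alleles are ignored.

```python
def mismatch_fraction(p, q):
    """Fraction of loci whose alleles differ between two equal-length profiles."""
    return sum(a != b for a, b in zip(p, q)) / len(p)


def assign_lin_code(new_profile, coded, thresholds):
    """Assign a LIN-style code to new_profile.

    coded: list of (profile, code) pairs already in the scheme.
    thresholds: decreasing mismatch fractions, one per code position
                (assumed values; the last is typically 0 for identical profiles).
    """
    if not coded:
        return (0,) * len(thresholds)
    # The closest already-coded profile acts as the reference.
    ref_profile, ref_code = min(coded, key=lambda pc: mismatch_fraction(new_profile, pc[0]))
    d = mismatch_fraction(new_profile, ref_profile)
    # Count how many leading positions are shared with the reference.
    shared = 0
    for t in thresholds:
        if d <= t:
            shared += 1
        else:
            break
    if shared == len(thresholds):
        return ref_code
    prefix = ref_code[:shared]
    # Open a new bin at the first unshared position, then pad with zeros.
    used = [code[shared] for _, code in coded if code[:shared] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (len(thresholds) - shared - 1)


# Hypothetical 4-locus profiles and thresholds.
thresholds = (0.5, 0.25, 0.0)
scheme = []
for profile in [(1, 1, 1, 1), (1, 1, 1, 2), (2, 2, 2, 2), (1, 1, 1, 1)]:
    scheme.append((profile, assign_lin_code(profile, scheme, thresholds)))
codes = [code for _, code in scheme]
```

Because an existing code never changes when new profiles arrive, a scheme of this shape gives the kind of stable nomenclature the abstract emphasises.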
APA, Harvard, Vancouver, ISO, and other styles
23

Tanaka, Erica Akemi. "Uma adaptação do método Binary Relevance utilizando árvores de decisão para problemas de classificação multirrótulo aplicado à genômica funcional." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/95/95131/tde-22102013-145119/.

Full text
Abstract:
Muitos problemas de classificação descritos na literatura de aprendizado de máquina e mineração de dados dizem respeito à classificação em que cada exemplo pertence a um único rótulo. Porém, vários problemas de classificação, principalmente no campo de Bioinformática são associados a mais de um rótulo; esses problemas são conhecidos como problemas de classificação multirrótulo. O princípio básico da classificação multirrótulo é similar ao da classificação tradicional (que possui um único rótulo), sendo diferenciada no número de rótulos a serem preditos, na qual há dois ou mais rótulos. Na área da Bioinformática muitos problemas são compostos por uma grande quantidade de rótulos em que cada exemplo pode estar associado. Porém, algoritmos de classificação tradicionais são incapazes de lidar com um conjunto de exemplos mutirrótulo, uma vez que esses algoritmos foram projetados para predizer um único rótulo. Uma solução mais simples é utilizar o método conhecido como método Binary Relevance. Porém, estudos mostraram que tal abordagem não constitui uma boa solução para o problema da classificação multirrótulo, pois cada classe é tratada individualmente, ignorando as possíveis relações entre elas. Dessa maneira, o objetivo dessa pesquisa foi propor uma nova adaptação do método Binary Relevance que leva em consideração relações entre os rótulos para tentar minimizar sua desvantagem, além de também considerar a capacidade de interpretabilidade do modelo gerado, não só o desempenho. Os resultados experimentais mostraram que esse novo método é capaz de gerar árvores que relacionam os rótulos correlacionados e também possui um desempenho comparável ao de outros métodos, obtendo bons resultados usando a medida-F.
Many classification problems described in the Machine Learning and Data Mining literature concern classification in which each example belongs to a single class. However, many classification problems, especially in the field of Bioinformatics, involve examples associated with more than one class; these are known as multi-label classification problems. The basic principle of multi-label classification is similar to that of traditional (single-label) classification; it differs in the number of labels to be predicted, which is two or more. In Bioinformatics, many problems involve a large number of labels that can be associated with each example. However, traditional classification algorithms are unable to cope with a set of multi-label examples, since these algorithms are designed to predict a single label. The simplest solution is to use the method known as Binary Relevance. However, studies have shown that this approach is not a good solution to the problem of multi-label classification because each class is treated individually, ignoring possible relations between classes. Thus, the objective of this research was to propose a new adaptation of the Binary Relevance method that takes relations between labels into account to mitigate this disadvantage, and that also considers the interpretability of the generated model, not just its performance. The experimental results show that this new method is capable of generating trees that relate correlated labels and that its performance is comparable to that of other methods, obtaining good results on the F-measure.
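For orientation, the plain Binary Relevance baseline that this abstract criticises can be sketched as follows: the multi-label problem is split into one independent binary problem per label. The base learner here is a depth-1 decision tree (a stump) over binary features, chosen only to keep the example short; it is not the thesis's adapted method, which additionally models relations between labels. All names are hypothetical.

```python
def train_stump(X, y):
    """Fit a depth-1 tree on binary features: pick the feature (and polarity)
    whose value agrees with the binary target y most often."""
    best = None
    for f in range(len(X[0])):
        for polarity in (0, 1):  # polarity 1 means "predict 1 when the feature is 0"
            preds = [x[f] ^ polarity for x in X]
            correct = sum(p == t for p, t in zip(preds, y))
            if best is None or correct > best[0]:
                best = (correct, f, polarity)
    _, f, polarity = best
    return lambda x: x[f] ^ polarity


def train_binary_relevance(X, Y, labels):
    """Train one independent binary classifier per label (Binary Relevance)."""
    return {
        label: train_stump(X, [int(label in label_set) for label_set in Y])
        for label in labels
    }


def predict(models, x):
    """Return the set of labels whose binary classifier fires on x."""
    return {label for label, model in models.items() if model(x) == 1}


# Hypothetical toy data: two binary features, two labels.
X = [(1, 0), (0, 1), (1, 1), (0, 0)]
Y = [{"a"}, {"b"}, {"a", "b"}, set()]
models = train_binary_relevance(X, Y, ["a", "b"])
```

Since each stump is trained in isolation, nothing learned about label "a" can inform the prediction of label "b", which is exactly the limitation the proposed adaptation targets.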
APA, Harvard, Vancouver, ISO, and other styles
24

Niciura, Simone Cristina Méo. "Interação núcleo-citoplasmática em embriões e expressão de genes "imprinted" em fetos bovinos produzidos in vivo, in vitro e partenogenéticos /." Jaboticabal : [s.n.], 2005. http://hdl.handle.net/11449/105949.

Full text
Abstract:
Advisor: Joaquim Mansano Garcia
Committee: Flávio Vieira Meirelles
Committee: Claudia Lima Verde Leal
Committee: Vera Fernanda Martins Hossepian de Lima
Committee: Gisele Zoccal Mingoti
Summary: Oocyte maturation is marked by the resumption of the first meiotic division, progressing from the Germinal Vesicle (GV) stage of Prophase I to Metaphase II (MII), and includes all the events required for the oocyte to express its full developmental potential after fertilization. To evaluate the efficiency of in vitro maturation (IVM), we used oocytes classified as viable (grades I, II and III) or non-viable (atretic and denuded), and followed nuclear progression and cortical granule (CG) distribution as an indicator of cytoplasmic maturation, after IVM in TCM 199 with fetal bovine serum, hormones, antibiotic and pyruvate for 24h in 5% CO2 in air. Nuclear maturation (78.4-87.8%) and cytoplasmic maturation (peripheral CG; 67.2-79.3%) were similar among the different oocyte classes and proved to be independent events. To follow the events triggered by the spermatozoon, we evaluated nuclear and microtubule dynamics at 2h intervals after in vitro fertilization (IVF) in TALP medium with heparin, PHE and semen prepared on a Percoll gradient. We observed that the MII stage predominated from 2 to 8h; MII and Anaphase/Telophase (A/T) predominated at 10h; MII, A/T and the pronuclear (PN) stage from 14 to 16h; and PN from 18h onward. Sperm penetration occurred 4h after insemination of the oocytes; male and female PN could be distinguished by size from 14 to 18h, and syngamy occurred from 24h onward. A period of 10h may be sufficient for effective IVF in bovine oocytes under the conditions described here.
Abstract: We aimed to evaluate events involved in in vitro maturation, fertilization and development, and parthenogenetic activation of bovine oocytes assessed by nuclear-cytoplasmic interaction and gene expression. Oocyte morphological selection did not affect nuclear maturation (78.4-87.8%) and cytoplasmic cortical granule distribution (67.2-79.3%). Following nuclear and microtubular dynamics after fertilization (IVF), we observed sperm penetration 4h after insemination; male and female pronuclei differentiation by size from 14 to 18h; syngamy after 24h; and sufficient co-incubation of spermatozoa and oocytes for 10h. Pronuclear transfer to study the interaction between nucleus (N) and cytoplasm (C) in parthenogenetic embryos produced by ionomycin followed by strontium (S) or 6-DMAP (D) was assessed by cleavage, eight-cell, and blastocyst development rates: CSND (76.5, 36.4, and 6.8%) and CDNS (69.5, 25.0, and 4.9%). S cytoplasm had a dominant effect on D nucleus. Higher rates of developmental arrest up to the eight-cell stage were observed with the combination of cytoplasm and nucleus produced by the two different activation treatments. We recovered parthenogenetic D fetuses on Day 35, which were small but normal in formation and in appearance of chorioallantoic membranes. Genomic imprinting of IGF2 was observed, but XIST was maternally expressed in extra-embryonic tissues. In vitro culture promoted higher expression of IGF2 and H19 genes and also increased IGF2/IGF2r ratio in IVF embryos compared to in vivo produced ones.
Degree: Doctorate
APA, Harvard, Vancouver, ISO, and other styles
25

Niciura, Simone Cristina Méo [UNESP]. "Interação núcleo-citoplasmática em embriões e expressão de genes imprinted em fetos bovinos produzidos in vivo, in vitro e partenogenéticos." Universidade Estadual Paulista (UNESP), 2005. http://hdl.handle.net/11449/105949.

Full text
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
A maturação oocitária é marcada pela retomada da primeira divisão da meiose, com progressão do estádio de Vesícula Germinativa (GV) da Prófase I até a Metáfase II (MII), e inclui todos os eventos necessários para que o oócito expresse seu potencial máximo de desenvolvimento após a fecundação. Para avaliarmos a eficiência da maturação in vitro (MIV), utilizamos oócitos classificados em viáveis (graus I, II e III) e inviáveis (atrésico e desnudo), e acompanhamos a progressão nuclear e a distribuição dos grânulos corticais (GC) como indício de maturação citoplasmática, após MIV em TCM 199 com soro fetal bovino, hormônios, antibiótico e piruvato, por 24h em 5% de CO2 em ar. Maturação nuclear (78,4-87,8%) e citoplasmática (GC periféricos; 67,2-79,3%) foram semelhantes entre as diferentes classes de oócitos e apresentaramse como eventos independentes. Para o acompanhamento dos eventos desencadeados pelo espermatozóide, avaliamos a dinâmica nuclear e de microtúbulos, em intervalos de 2h, após fecundação in vitro (FIV), em meio TALP com heparina, PHE e sêmen preparado em gradiente de Percoll. Observamos que o estádio de MII foi predominante de 2 a 8h; MII e Anáfase/Telófase (A/T) predominaram às 10h; MII, A/T e estádio pronuclear (PN) de 14 a 16h; e PN a partir de 18h. A penetração do espermatozóide ocorreu após 4h da inseminação dos oócitos; a diferenciação dos PN 14 masculino e feminino pelo tamanho foi possível de 14 a 18h e a singamia ocorreu a partir de 24h. O período de 10h pode ser suficiente para que a FIV seja efetiva em oócitos bovinos, nas condições aqui descritas.
We aimed to evaluate events involved in in vitro maturation, fertilization and development, and parthenogenetic activation of bovine oocytes, assessed by nuclear-cytoplasmic interaction and gene expression. Oocyte morphological selection did not affect nuclear maturation (78.4-87.8%) or cytoplasmic cortical granule distribution (67.2-79.3%). Following nuclear and microtubular dynamics after in vitro fertilization (IVF), we observed sperm penetration 4h after insemination; male and female pronuclei differentiation by size from 14 to 18h; syngamy after 24h; and that 10h of co-incubation of spermatozoa and oocytes was sufficient. Pronuclear transfer was used to study the interaction between nucleus (N) and cytoplasm (C) in parthenogenetic embryos produced by ionomycin followed by strontium (S) or 6-DMAP (D), and was assessed by cleavage, eight-cell, and blastocyst development rates: CSND (76.5, 36.4, and 6.8%) and CDNS (69.5, 25.0, and 4.9%). S cytoplasm exerted a dominant effect on D nucleus. Higher rates of developmental arrest up to the eight-cell stage were observed when cytoplasm and nucleus produced by the two different activation treatments were combined. We recovered parthenogenetic D fetuses on Day 35, which were small but normal in formation and in the appearance of chorioallantoic membranes. Genomic imprinting of IGF2 was observed, but XIST was maternally expressed in extra-embryonic tissues. In vitro culture promoted higher expression of the IGF2 and H19 genes and also increased the IGF2/IGF2r ratio in IVF embryos compared to in vivo produced ones.
APA, Harvard, Vancouver, ISO, and other styles
26

LOVINO, MARTA. "Algorithms for complex systems in the life sciences." Doctoral thesis, Politecnico di Torino, 2021. http://hdl.handle.net/11583/2910082.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Ebenhöh, Oliver, and Thomas Handorf. "Functional classification of genome-scale metabolic networks." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2010/4497/.

Full text
Abstract:
We propose two strategies to characterize organisms with respect to their metabolic capabilities. The first, investigative, strategy describes metabolic networks in terms of their capability to utilize different carbon sources, resulting in the concept of carbon utilization spectra. In the second, predictive, approach, minimal nutrient combinations are predicted from the structure of the metabolic networks, resulting in a characteristic nutrient profile. Both strategies quantify functional properties of metabolic networks, making it possible to identify groups of organisms with similar functions. We investigate whether the functional description reflects the typical environments of the corresponding organisms by dividing all species into disjoint groups based on whether they are aerotolerant and/or photosynthetic. Despite differences in the underlying concepts, both measures display some common features. Closely related organisms often display similar functional behavior, and in both cases the functional measures appear to correlate with the considered classes of environments. Carbon utilization spectra and nutrient profiles are complementary approaches toward a functional classification of organism-wide metabolic networks. The two approaches contain different information and thus yield different clusterings, both of which differ from the classical taxonomy of organisms. Our results indicate that a sophisticated combination of our approaches will allow for a quantitative description reflecting the lifestyles of organisms.
APA, Harvard, Vancouver, ISO, and other styles
28

Eldfjell, Yrin. "Identifying Mitochondrial Genomes in Draft Whole-Genome Shotgun Assemblies of Six Gymnosperm Species." Thesis, Stockholms universitet, Matematiska institutionen, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-175410.

Full text
Abstract:
Sequencing efforts for gymnosperm genomes typically focus on nuclear and chloroplast DNA, with only three complete mitochondrial genomes published as of 2017. The availability of additional mitochondrial genomes would aid biological and evolutionary understanding of gymnosperms. Identifying mtDNA from existing whole genome sequencing (WGS) data (i.e. contigs) negates the need for additional experimental work, but previous classification methods show limitations in sensitivity or accuracy, particularly in difficult cases. In this thesis I present a classification pipeline based on (1) k-mer probability scoring and (2) SVM classification applied to the available contigs. Using this pipeline, the mitochondrial genomes of six gymnosperm species were obtained: Abies sibirica, Gnetum gnemon, Juniperus communis, Picea abies, Pinus sylvestris and Taxus baccata. Cross-validation experiments showed a satisfactory and, for some species, excellent degree of accuracy.
When sequencing the genomes of gymnosperms, the focus has mostly been on nuclear and chloroplast DNA. Only three complete mitochondrial genomes have been published so far (2017). More mitochondrial genomes could lead to new knowledge about the biology and evolution of gymnosperms. When the mitochondrial genome is identified from available sequences for the whole organism (so-called “contigs”), no additional laboratory work is needed, but this procedure has been shown to suffer from insufficient sensitivity and accuracy, especially in difficult cases. In this thesis I present a method based on (1) k-mer probabilities and (2) SVM classification applied to the available contigs. With this method, the mitochondrial genomes of six gymnosperms were obtained: Abies sibirica, Gnetum gnemon, Juniperus communis, Picea abies, Pinus sylvestris and Taxus baccata. Cross-validation experiments showed a satisfactory and, for some species, excellent accuracy.
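The first stage of the pipeline described in this abstract, k-mer probability scoring, could be sketched roughly as follows. This is a minimal illustration that assumes a log-odds score between two k-mer frequency models with add-one smoothing; the function names, the smoothing, and the idea of training on separate mitochondrial and nuclear reference sets are assumptions made for the example and do not reproduce the thesis implementation. The SVM stage is not shown.

```python
import math
from collections import Counter


def kmer_counts(sequences, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_score(contig, mito_counts, nuclear_counts, k):
    """Mean log-odds per k-mer; positive values favour the mitochondrial model."""
    n_possible = 4 ** k  # number of distinct DNA k-mers, used for add-one smoothing
    mito_total = sum(mito_counts.values()) + n_possible
    nuc_total = sum(nuclear_counts.values()) + n_possible
    contig = contig.upper()
    n_windows = len(contig) - k + 1
    if n_windows <= 0:
        return 0.0  # contig shorter than k: no evidence either way
    score = 0.0
    for i in range(n_windows):
        kmer = contig[i:i + k]
        # Counter returns 0 for unseen k-mers, so smoothing keeps both terms > 0.
        p_mito = (mito_counts[kmer] + 1) / mito_total
        p_nuc = (nuclear_counts[kmer] + 1) / nuc_total
        score += math.log(p_mito / p_nuc)
    return score / n_windows
```

In such a setup, the per-contig score (possibly together with other features such as contig length or coverage) could be passed to a separate classifier; which features the thesis actually uses is not stated in the abstract.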
APA, Harvard, Vancouver, ISO, and other styles
29

Nedvěd, Jiří. "Zpracování genomických signálů fraktály." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2012. http://www.nusl.cz/ntk/nusl-219634.

Full text
Abstract:
This diploma project shows the possibilities for classifying genomic sequences using images produced by the CGR and FCGR methods. A classifier is computed from these images using BCM. The thesis then describes the programme and the options it offers for classification. Finally, many sequences computed under different programme settings are compared.
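As a rough illustration of the FCGR (frequency chaos game representation) idea mentioned in this abstract, the sketch below counts the k-mers of a DNA sequence into a 2^k x 2^k grid that can be displayed as an image. The corner assignment for A, C, G, and T and the function name `fcgr` are assumptions made for this example and are not taken from the thesis; the BCM classification step is not shown.

```python
# Assumed corner layout for the chaos game square (not from the thesis):
# each nucleotide contributes one (x, y) bit per position in the k-mer.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, k):
    """Return a 2^k x 2^k grid of k-mer counts, indexed as grid[y][x]."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip windows containing N or other ambiguous symbols
        x = y = 0
        # In the chaos game the most recent nucleotide decides the coarsest
        # subdivision, so the last base of the k-mer becomes the top bit.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
    return grid
```

Dividing every cell by the total number of counted k-mers would turn the grid into relative frequencies, which makes images of sequences with different lengths comparable.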
APA, Harvard, Vancouver, ISO, and other styles
30

Fonseca, Flávio Luiz Engelke. "Utilização de métodos de comparação de sequências para a detecção de genes taxonomicamente restritos." Universidade do Estado do Rio de Janeiro, 2011. http://www.bdtd.uerj.br/tde_busca/arquivo.php?codArquivo=4712.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Since the 1990s, international efforts to obtain complete genomes have led to the determination of the genomes of numerous organisms. This, combined with major advances in computing, has enabled innovative approaches to studying the structure, organization, and evolution of genomes and to the prediction and functional classification of genes. Among the methods most commonly used in these analyses is the search for similarities between biological sequences. Comparative analyses of completely sequenced genomes indicate that each taxonomic group studied so far contains 10 to 20% of genes with no recognizable homologs in other species. These taxonomically restricted genes (TRGs) are believed to play an important role in adaptation to particular ecological niches and may be involved in important evolutionary processes. Recognizing them is not simple, however, since they must be distinguished from spurious non-functional ORFs and/or artifacts of gene annotation. In addition, species- or genus-specific genes may offer an opportunity to develop identification and/or typing methods, a relatively complicated task for prokaryotes, where the current gold-standard method involves analyzing a group of several genes (MultiLocus Sequence Typing, MLST). In this work we used data produced by comparative analyses of genomes and sequences to identify and characterize species- and genus-specific genes that may aid the development of new identification and/or typing methods and may shed light on important evolutionary processes (such as the loss and/or origin of genes in particular lineages, and the expansion of gene families in specific lineages) in the organisms studied.
Since the 1990s, international efforts to obtain complete genomes have led to the determination of the genomes of many organisms. This, coupled with great advances in computing, has allowed the use of innovative approaches in the study of the structure, organization and evolution of genomes and in the prediction and functional classification of genes. Among the methods most commonly employed in such analyses is the search for similarities between biological sequences. Comparative analysis of whole genome sequences indicates that each taxonomic group studied so far contains 10 to 20% of genes with no recognizable homologues in other species. It is believed that these taxonomically restricted genes (TRGs) have an important role in adaptation to particular ecological niches and may be involved in important evolutionary processes. However, recognizing such genes is not simple, as they must be distinguished from spurious nonfunctional ORFs and/or artifacts of the gene annotation process. Furthermore, species- or genus-specific genes may be an opportunity for the development of methods for identification and/or typing, a relatively complicated task in the case of prokaryotes, where the gold standard at present involves the analysis of a group of several genes (Multilocus Sequence Typing - MLST). This study used data generated through comparative analysis of genomes and sequences to identify and characterize species- and genus-specific genes, which may help in the development of new methods for identification and/or typing, and may shed light on important evolutionary processes (such as loss and/or origin of genes in particular lineages, as well as expansion of gene families in specific lineages) involving the studied organisms.
APA, Harvard, Vancouver, ISO, and other styles
31

Osterud, Erin Lee. "Gibbon classification : the issue of species and subspecies." PDXScholar, 1988. https://pdxscholar.library.pdx.edu/open_access_etds/3925.

Full text
Abstract:
Gibbon classification at the species and subspecies levels has been hotly debated for the last 200 years. This thesis explores the reasons for this debate. Authorities agree that siamang, concolor, kloss and hoolock are species, while there is complete lack of agreement on lar, agile, moloch, Mueller's and pileated. The disagreement results from the use and emphasis of different character traits, and from debate on the occurrence and importance of gene flow.
APA, Harvard, Vancouver, ISO, and other styles
32

Roger, Frédéric. "Mode d’évolution et taxonomie au sein du genre Aeromonas : que nous apprend l'étude de la diversité génétique et génomique ?" Thesis, Montpellier 1, 2012. http://www.theses.fr/2012MON13504/document.

Full text
Abstract:
The study of opportunistic pathogenic bacteria of environmental origin with varied lifestyles, either free-living and autonomous or restricted to a specific niche represented by the host, is of interest for understanding how bacteria adapt to their hosts and how new pathogens emerge. The genus Aeromonas comprises bacteria common in aquatic environments, mainly fresh water. They can maintain different types of relationships with their hosts (parasitism/symbiosis) and can be harbored by a wide range of organisms. In humans, they cause a wide variety of infections (gastroenteritis, bacteremia, skin and soft tissue infection, etc.), but difficulties in identifying strains and a confused taxonomy lead to poor knowledge of the actual pathogenicity of the different described species. The aim of this work was to study the mechanisms of genomic and genetic evolution underlying the remarkable ability of Aeromonas to adapt to its hosts, particularly humans. A comparative analysis of the genetic and genomic diversity of a large collection of 195 strains, representative of the different species of the genus and of varied origins (human, animal, and environmental), was carried out. Genetic diversity was assessed with a multilocus approach that included the sequences of 7 housekeeping genes (dnaK, gltA, gyrB, radA, rpoB, tsf, zipA).
In parallel, we studied the variability of the multiple copies of the rrs gene by exploring their genetic diversity with a denaturing electrophoresis method (PCR-TTGE), and the variability in the number and distribution of rrn operons in the chromosome of these bacteria by pulsed-field gel electrophoresis. These approaches revealed: i) very high diversity in the 7 housekeeping genes analyzed, together with lateral transfers, ii) subgroups of strains adapted to a host or to a particular anatomical location, iii) a large number of rrn operons (8 to 11), iv) chromosomal distribution patterns of rrn operons specific to a species or to groups of closely related species, v) a high proportion (41.5%) of strains with sequence heterogeneity among the copies of the rrs gene. Our results also show the taxonomic value of studying genetic and genomic diversity within the genus Aeromonas with the proposed approaches. We show that: i) 16S ribosomal RNA is an informative marker for studying modes of evolution and for conducting mixed and consensus taxonomy studies in the genus Aeromonas, provided that the diversity of its multiple copies is studied, ii) A. caviae has particular genetic characteristics indicating an ongoing process of adaptation to an ecological niche that we presume to be the human gut. Our results also support a mode of evolution of Aeromonas bacteria in species complexes, accompanied by speciation events, which may partly explain the difficulties encountered in establishing a clear taxonomy of the genus Aeromonas.
Studying opportunistic pathogenic bacteria with an environmental origin and a wide variety of lifestyles, either free-living or host-adapted, helps improve the understanding of bacterial adaptation to hosts and the emergence of novel pathogens. The genus Aeromonas groups water-living bacteria, mainly found in freshwater. They can maintain several types of relationships with their hosts (parasitism/symbiosis) and are harbored by a large spectrum of hosts. In humans, they are involved in a wide range of infections (gastroenteritis, bacteraemia, wound and soft tissue infection, etc.), but difficulties in identifying strains and a confused taxonomy result in incomplete knowledge of the real pathogenicity of each described species. The aim of this work was to study the mechanisms of genomic and genetic evolution related to the outstanding ability of Aeromonas to adapt to its hosts, including humans. We conducted a comparative analysis of genetic and genomic diversity on a large collection of 195 strains representative of the species of the genus and from various sources (human, animal, environmental). We studied genetic diversity using a multilocus analysis of 7 housekeeping genes (dnaK, gltA, gyrB, radA, rpoB, tsf, zipA). We also described the variability in i) the multiple rrs gene copies, using a PCR-TTGE method, and ii) the number and distribution of rrn operons within the chromosome, using pulsed-field gel electrophoresis.
Our results also showed the taxonomic value of studying genetic and genomic diversity in the genus Aeromonas with the proposed approaches. These approaches enabled us to highlight: i) high genetic diversity in the housekeeping genes together with horizontal gene transfer events, ii) clusters that were either host-adapted or adapted to particular anatomical locations, iii) a high number of rrn operons (from 8 to 11), iv) rrn operon patterns that were either species-specific or specific to groups of closely related species, v) a high frequency (41.5%) of strains harboring sequence heterogeneity between rrs copies. We showed that: i) 16S rRNA is a valuable marker for studying the modes of evolution of aeromonads and the taxonomy within the genus Aeromonas, provided that multiple-copy diversity is taken into account, ii) A. caviae displays particular genetic characteristics that suggest an ongoing process of adaptation to a niche that we suppose to be the human digestive tract. Our results also support a mode of evolution in species complexes, with some speciation processes that could at least in part explain the difficulties in establishing a clear taxonomy within the genus Aeromonas.
APA, Harvard, Vancouver, ISO, and other styles
33

Nasser, Sara. "Fuzzy methods for meta-genome sequence classification and assembly." abstract and full text PDF (free order & download UNR users only), 2008. http://0-gateway.proquest.com.innopac.library.unr.edu/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3307706.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Gormley, Michael P. Tozeren Aydin. "Classification of tissues and disease subtypes using whole-genome signatures /." Philadelphia, Pa. : Drexel University, 2008. http://hdl.handle.net/1860/2922.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Sassi, Mohamed. "La diversité des espèces du groupe Mycobacterium abscessus et leurs mycobactériophages." Thesis, Aix-Marseille, 2013. http://www.theses.fr/2013AIXM5041/document.

Full text
Abstract:
First, we analyzed 14 published M. abscessus genomes, showing that this taxon comprises at least five different taxa defined by microbiological characteristics of medical interest. In a second study, we developed an identification and genotyping technique for M. abscessus that unambiguously distinguished M. massiliense from M. bolletii and M. abscessus. We then analyzed the bacteriophage of M. bolletii, which we named Araucaria. Resolving its 3D structure showed a capsid and a connector similar to those of several bacteriophages of Gram-negative and Gram-positive bacteria, and a helical tail decorated with radial spikes. The basal part (baseplate) of phage Araucaria has features observed in phages that bind to protein receptors. Araucaria binds to its host in two steps: first the tail binds to host saccharides, then the baseplate binds to cell wall proteins. We analyzed the presence of phage sequences in 48 available M. abscessus genomes. Our phylogenetic analysis suggests that M. abscessus species were infected by different mycobacteriophages that have an evolutionary history different from that of their mycobacterial hosts and that also contain proteins acquired by horizontal transfer. Finally, we sequenced and analyzed two non-tuberculous mycobacteria responsible for opportunistic infections, Mycobacterium simiae and Mycobacterium septicum.
In a first step, we reviewed the published genomes of 14 M. abscessus strains, showing that M. abscessus sensu lato comprises five different taxa defined by particular characteristics of microbiological and medical interest. In a second step, based on sequencing of eight intergenic spacers, we developed a Multispacer Sequence Typing (MST) technique for M. abscessus group subspecies identification and strain genotyping. MST clearly differentiates formerly “M. massiliense” organisms from other M. abscessus subsp. bolletii organisms. We also analyzed a bacteriophage from M. bolletii that we named Araucaria. We resolved the 3D structure of Araucaria; its capsid and connector closely resemble those of several phages from Gram-negative or Gram-positive bacteria. The helical tail is decorated by radial spikes, possibly host adhesion devices. Its host adsorption device, at the tail tip, combines features observed in phages that bind to protein receptors. Altogether, these results suggest that Araucaria may infect its mycobacterial host using a mechanism involving adhesion to cell wall saccharides and proteins, a feature that remains to be explored further. We also analysed 48 sequenced M. abscessus genomes for encoded prophages. Our phylogenetic analyses suggested that M. abscessus species were infected by different mycobacteriophages, which have a different evolutionary history from their bacterial hosts, and that some proteins were acquired by horizontal gene transfer, mostly mycobacteriophage proteins and hypothetical proteins. Finally, we sequenced and analyzed two non-tuberculous mycobacteria causing human infections, Mycobacterium simiae and Mycobacterium septicum.
APA, Harvard, Vancouver, ISO, and other styles
36

Loveless, Ian. "Binary Classification With First Phase Feature Selection for Gene Expression Survival Data." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555444873531262.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Fulop, Lynda Dorothy. "Molecular analysis of flavivirus genome sequences : implications for virus classification." Thesis, University of Surrey, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308496.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Liu, Xinan. "NOVEL COMPUTATIONAL METHODS FOR SEQUENCING DATA ANALYSIS: MAPPING, QUERY, AND CLASSIFICATION." UKnowledge, 2018. https://uknowledge.uky.edu/cs_etds/63.

Full text
Abstract:
Over the past decade, the evolution of next-generation sequencing technology has considerably advanced genomics research. As a consequence, fast and accurate computational methods are needed for analyzing the large datasets in different applications. The research presented in this dissertation focuses on three areas: RNA-seq read mapping, large-scale data query, and metagenomics sequence classification. A critical step of RNA-seq data analysis is to map the RNA-seq reads onto a reference genome. This dissertation presents a novel splice alignment tool, MapSplice3. It achieves high read alignment and base mapping yields and is able to detect splice junctions, gene fusions, and circular RNAs comprehensively at the same time. Based on MapSplice3, we further extend a novel lightweight approach called iMapSplice that enables personalized mRNA transcriptional profiling. As a huge amount of RNA-seq data has been shared through public datasets, it provides an invaluable resource for researchers to test hypotheses by reusing existing datasets. To meet the need for efficiently querying large-scale sequencing data, a novel method called SeqOthello has been developed. It efficiently queries sequence k-mers against large-scale datasets and determines whether the given sequence is present. Metagenomics studies often generate tens of millions of reads to capture the presence of microbial organisms. Thus, efficient and accurate algorithms are in high demand. In this dissertation, we introduce MetaOthello, a probabilistic hashing classifier for metagenomic sequences. It supports efficient query of a taxon using its k-mer signatures.
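The idea of classifying a read by taxon-specific k-mer signatures can be illustrated with a small sketch. It uses an ordinary Python dictionary in place of the Othello hashing structure, which is a deliberate simplification; the function names, the majority-vote rule, and the treatment of shared k-mers are assumptions for illustration only and do not describe MetaOthello's actual implementation.

```python
from collections import Counter


def build_signature_index(references, k):
    """Map each k-mer that occurs in exactly one taxon to that taxon."""
    owners = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            owners.setdefault(seq[i:i + k], set()).add(taxon)
    # Keep only k-mers that are unique to a single taxon (the "signatures").
    return {kmer: next(iter(taxa)) for kmer, taxa in owners.items() if len(taxa) == 1}


def classify_read(read, index, k):
    """Return the taxon with the most signature k-mer hits, or None if there are none."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxon = index.get(read[i:i + k])
        if taxon is not None:
            votes[taxon] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

A dictionary stores every k-mer explicitly, so this sketch trades memory for simplicity; a hashing-based structure such as the one named in the abstract is presumably chosen to avoid that cost at metagenomic scale.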
APA, Harvard, Vancouver, ISO, and other styles
39

Bing, Nan. "Statistical Analysis of Gene Expression Profile: Transcription Network Inference and Sample Classification." Diss., Virginia Tech, 2004. http://hdl.handle.net/10919/11139.

Full text
Abstract:
The copious information generated from transcriptomes gives us an opportunity to study biological processes as integrated systems; however, due to numerous sources of variation, the high dimensionality of the data, varying levels of data quality, and different input formats, dissecting and interpreting such data presents daunting challenges to scientists. The goal of this research is to provide new and improved statistical tools for analyzing transcriptome data: to identify gene expression patterns for classifying samples, to discover regulatory gene networks using natural genetic perturbations, to develop statistical methods for model fitting and comparison of biochemical networks, and eventually to advance our capability to understand the principles of biological processes at the system level.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
40

Findeiß, Sven. "Expanding the repertoire of bacterial (non-)coding RNAs." Doctoral thesis, Universitätsbibliothek Leipzig, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-67816.

Full text
Abstract:
The detection of non-protein-coding RNA (ncRNA) genes in bacteria and their diverse regulatory modes of action moved the experimental and bio-computational analysis of ncRNAs into the focus of attention. Regulatory ncRNA transcripts are not translated to proteins but function directly on the RNA level. These typically small RNAs have been found to be involved in diverse processes such as (post-)transcriptional regulation and modification, translation, protein translocation, protein degradation and sequestration. Bacterial ncRNAs either arise from independent primary transcripts or their mature sequence is generated via processing from a precursor. Besides these autonomous transcripts, RNA regulators (e.g. riboswitches and RNA thermometers) also form chimeras with protein-coding sequences. These structured regulatory elements are encoded within the messenger RNA and directly regulate the expression of their “host” gene. The quality and completeness of genome annotation is essential for all subsequent analyses. In contrast to protein-coding genes, ncRNAs lack clear statistical signals on the sequence level. Thus, sophisticated tools have been developed to automatically identify ncRNA genes. Unfortunately, these tools are not part of generic genome annotation pipelines and therefore computational searches for known ncRNA genes are the starting point of each study. Moreover, prokaryotic genome annotation lacks essential features of protein-coding genes. Many known ncRNAs regulate translation via base-pairing to the 5’ UTR (untranslated region) of mRNA transcripts. Eukaryotic 5’ UTRs have been routinely annotated by sequencing of ESTs (expressed sequence tags) for more than a decade. Only recently, experimental setups have been developed to systematically identify these elements on a genome-wide scale in prokaryotes. The first part of this thesis describes three experimental surveys of exploratory field studies to analyze transcript organization in pathogenic bacteria.
To identify ncRNAs in Pseudomonas aeruginosa we used a combination of an experimental RNomics approach and ncRNA prediction. Besides already known ncRNAs we identified and validated the expression of six novel RNA genes. Global detection of transcripts by next generation RNA sequencing techniques unraveled an unexpectedly complex transcript organization in many bacteria. These ultra high-throughput methods give us the appealing opportunity to analyze the complete RNA output of any species at once. The development of the differential RNA sequencing (dRNA-seq) approach enabled us to analyze the primary transcriptome of Helicobacter pylori and Xanthomonas campestris. For the first time we generated a comprehensive and precise transcription start site (TSS) map for both species and provide a general framework for the analysis of dRNA-seq data. Focusing on computer-aided analysis we developed new tools to annotate TSS, detect small protein-coding genes and to infer homology of newly detected transcripts. We discovered hundreds of TSS in intergenic regions, upstream of protein-coding genes, within operons and antisense to annotated genes. Analysis of 5’ UTRs (spanning from the TSS to the start codon of the adjacent protein-coding gene) revealed an unexpected size diversity ranging from zero to several hundred nucleotides. We identified and validated the expression of about 60 and about 20 ncRNA candidates in Helicobacter and Xanthomonas, respectively. Among these ncRNA candidates we found several small protein-coding genes that have previously evaded annotation in both species. We showed that the combination of dRNA-seq and computational analysis is a powerful method to examine prokaryotic transcriptomes. Experimental setups are time consuming and often combined with huge costs. Another limitation of experimental approaches is that genes which are expressed in specific developmental stages or stress conditions are likely to be missed. 
Bioinformatic tools offer an alternative to overcome such restraints. General approaches usually depend on comparative genomic data, and evolutionary signatures are used to analyze the (non-)coding potential of multiple sequence alignments. In the second part of my thesis we present our major update of the widely used ncRNA gene finder RNAz and introduce RNAcode, an efficient tool to assess the local protein-coding potential of genomic regions. RNAz has been successfully used to identify structured RNA elements in all domains of life. However, our own experience and the user feedback not only demonstrated the applicability of the RNAz approach, but also helped us to identify limitations of the current implementation. Using a much larger training set and a new classification model we significantly improved the prediction accuracy of RNAz. During transcriptome analysis we repeatedly identified small protein-coding genes that have not been annotated so far. Only a few of those genes are known to date and standard protein-coding gene finding tools suffer from the lack of training data. To avoid an excess of false positive predictions, gene finding software is usually run with an arbitrary cutoff of 40-50 amino acids and therefore misses the small protein-coding genes. We have implemented RNAcode, which is optimized for emerging applications not covered by standard protein-coding gene annotation software. In addition to complementing classical protein gene annotation, a major field of application of RNAcode is the functional classification of transcribed regions. RNA sequencing analyses are likely to falsely report transcript fragments (e.g. mRNA degradation products) as non-coding. Hence, an evaluation of the protein-coding potential of these fragments is an essential task. RNAcode reports local regions of high coding potential instead of complete protein-coding genes.
A training on known protein-coding sequences is not necessary and RNAcode can therefore be applied to any species. We showed this with our analysis of the Escherichia coli genome where the current annotation could be accurately reproduced. We furthermore identified novel small protein-coding genes with RNAcode in this extensively studied genome. Using transcriptome and proteome data we found compelling evidence that several of the identified candidates are bona fide proteins. In summary, this thesis clearly demonstrates that bioinformatic methods are mandatory to analyze the huge amount of transcriptome data and to identify novel (non-)coding RNA genes. With the major update of RNAz and the implementation of RNAcode we contributed to complete the repertoire of gene finding software which will help to unearth hidden treasures of the RNA World.
APA, Harvard, Vancouver, ISO, and other styles
41

Langer, Björn. "Phenotype-related regulatory element and transcription factor identification via phylogeny-aware discriminative sequence motif scoring." Doctoral thesis, Center for Systems Biology Dresden, 2017. https://tud.qucosa.de/id/qucosa%3A31172.

Abstract:
Understanding the connection between an organism’s genotype and its phenotype is a key question in evolutionary biology and genetics. It has been shown that many changes of morphological or other complex phenotypic traits result from changes in the expression pattern of key developmental genes rather than from changes in the genes themselves. Such altered gene expression often arises from changes in the gene regulatory regions. This usually means the loss of important transcription factor (TF) binding sites within these regulatory regions, because the interaction between TFs and specific sites on the DNA is a key element of gene regulation. An established approach for the genome-wide mapping of genomic regions to phenotypes is the Forward Genomics framework. This approach compares the genomic sequences of species with and without the phenotype of interest based upon two ideas. First, the initial loss of a phenotype relaxes selection on all phenotypically related genomic regions and, second, this can happen independently in multiple species. Of interest are those regions that diverged specifically in phenotype-loss species. Although this principle is general, the current implementation is only well suited to the identification of phenotype-related gene-coding regions and has limited applicability to regulatory regions. The reason is its reliance on sequence conservation as a divergence measure, which does not accurately measure functional divergence of regulatory elements. In this thesis, I developed REforge, a novel implementation of the Forward Genomics principle that takes functional information of regulatory elements into account in the form of known phenotype-related TFs. Considering the flexible organization of TF binding sites within a regulatory region, both in terms of strength and order, allows the abstraction from the region’s sequence level to its functional level.
Thus, functional divergence of regulatory regions is directly compared to phenotypic divergence, which substantially improves performance compared to Forward Genomics, as I demonstrated on synthetic and real data. Additionally, I developed TFforge, which follows the same approach but aims at identifying the TFs relevant for the given phenotype. Given a multi-species alignment with a phenotype annotation and a set of regulatory regions, TFforge systematically searches for TFs whose changes in binding affinity between species fit the phenotype signature. The reported output is a ranking of the TFs according to their level of correspondence. I demonstrate the concept of this approach on both biological data and artificially generated regions. TFforge can be used as a standalone analysis tool and also to generate the input set of TFs for a subsequent REforge analysis. I demonstrate that REforge in combination with TFforge is able to substantially outperform standard Forward Genomics, i.e. even without foreknowledge of relevant TFs. Overall, the methods introduced in this thesis are examples of the power of computational tools in comparative genomics to catalyze biological insights. I not only described the methods in detail but also validated them in a real data analysis. REforge and TFforge are applicable to a wide range of phenotypes, each on its own for associating TFs and regulatory regions with a phenotype. Moreover, their combination in particular constitutes a valuable tool set for evo-devo studies with respect to gene regulatory network analyses.
42

Cattani, Philip Thomas. "Extending Cartesian genetic programming : multi-expression genomes and applications in image processing and classification." Thesis, University of Kent, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.655651.

Abstract:
Genetic Programming (GP) is an Evolutionary Computation technique. Genetic Programming refers to a programming strategy where an artificial population of individuals represents solutions to a problem in the form of programs, and where an iterative process of selection and reproduction is used in order to evolve increasingly better solutions. This strategy is inspired by Charles Darwin's theory of evolution through the mechanism of natural selection. Genetic Programming makes use of computational procedures analogous to some of the same biological processes which occur in natural evolution, namely, crossover, mutation, selection, and reproduction. Cartesian Genetic Programming (CGP) is a form of Genetic Programming that uses directed graphs to represent programs. It is called 'Cartesian' because this representation uses a grid of nodes that are addressed using a Cartesian co-ordinate system. This stands in contrast to GP systems which typically use a tree-based system to represent programs. In this thesis, we will show how it is possible to enhance and extend Cartesian Genetic Programming in two ways. Firstly, we show how CGP can be made to evolve programs which make use of image manipulation functions in order to create image manipulation programs. These programs can then be applied to image classification tasks as well as other image manipulation tasks such as segmentation, the creation of image filters, and transforming an input image into a target image. Secondly, we show how the efficiency - the time it takes to solve a problem - of a CGP program can sometimes be increased by reinterpreting the semantics of a CGP genome string. We do this by applying Multi-Expression Programming to CGP.
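The graph representation described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function set, the `(function_index, a, b)` node encoding, and the name `evaluate_cgp` are assumptions chosen for illustration, and simple arithmetic stands in for the image manipulation functions the author uses.

```python
import operator

# Hypothetical function set; the thesis uses image manipulation functions instead.
FUNCTIONS = [operator.add, operator.sub, operator.mul, max]

def evaluate_cgp(genome, outputs, inputs):
    """Evaluate a CGP genome given as (function_index, a, b) triples.

    Addresses 0..len(inputs)-1 refer to program inputs; higher addresses
    refer to earlier nodes, so the graph is feed-forward.
    """
    values = list(inputs)
    for func_index, a, b in genome:
        values.append(FUNCTIONS[func_index](values[a], values[b]))
    return [values[address] for address in outputs]

# Two inputs (addresses 0 and 1) and three nodes (addresses 2, 3, 4).
genome = [(0, 0, 1),   # node 2: x + y
          (2, 2, 2),   # node 3: (x + y) * (x + y)
          (1, 3, 0)]   # node 4: node 3 - x
result = evaluate_cgp(genome, outputs=[4], inputs=[3, 2])
```

For simplicity the sketch evaluates every node; CGP implementations commonly skip nodes that no output depends on.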
43

Froschauer, Alexander, Lisa Kube, Alexandra Kegler, Christiane Rieger, and Herwig O. Gutzeit. "Tunable Protein Stabilization In Vivo Mediated by Shield-1 in Transgenic Medaka: Research Article." Public Library of Science, 2015. https://tud.qucosa.de/id/qucosa%3A29122.

Abstract:
Techniques for conditional gene or protein expression are important tools in developmental biology and in the analysis of physiology and disease. On the protein level, the tunable and reversible expression of proteins can be achieved by the fusion of the protein of interest to a destabilizing domain (DD). In the absence of its specific ligand (Shield-1), the protein is degraded by the proteasome. The DD-Shield system has proven to be an excellent tool to regulate the expression of proteins of interest in mammalian systems but has not been applied in teleosts like the medaka. We present the application of the DD-Shield technique in transgenic medaka and show the ubiquitous conditional expression throughout life. Shield-1 administration to the water leads to concentration-dependent induction of a YFP reporter gene in various organs and in spermatogonia at the cellular level.
44

Rose, Dominic. "The long and the short of computational ncRNA prediction." Doctoral thesis, Universitätsbibliothek Leipzig, 2010. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-62158.

Abstract:
Non-coding RNAs (ncRNAs) are transcripts that function directly as RNA molecules without ever being translated to protein. The transcriptional output of eukaryotic cells is diverse, pervasive, and multi-layered. It consists of spliced as well as unspliced transcripts of both protein-coding messenger RNAs and functional ncRNAs. However, it also contains degradable non-functional by-products and artefacts - certainly a reason why ncRNAs have long been wrongly dismissed as transcriptional noise. Today, RNA-controlled regulatory processes are broadly recognized for a variety of ncRNA classes. The thermoresponsive ROSE ncRNA (repression of heat shock gene expression) is only one example of a regulatory ncRNA acting at the post-transcriptional level via conformational changes of its secondary structure. Bioinformatics helps to identify novel ncRNAs in the bulk of genomic and transcriptomic sequence data which are produced at ever increasing rates. However, ncRNA annotation is unfortunately not part of generic genome annotation pipelines. Dedicated computational searches for particular ncRNAs are veritable research projects in their own right. Despite best efforts, ncRNAs across the animal phylogeny remain to a large extent uncharted territory. This thesis describes a comprehensive collection of exploratory bioinformatic field studies designed to predict ncRNA genes de novo in a series of computational screens and in a multitude of newly sequenced genomes. Non-coding RNAs can be divided into subclasses (families) according to particular functional, structural, or compositional similarities. A simple but suitable and frequently applied criterion to classify RNA species is length. Accordingly, the thesis is structured into two parts: We present a series of pilot studies investigating (1) the short and (2) the long ncRNA repertoire of several model species by means of state-of-the-art bioinformatic techniques.
In the first part of the thesis, we focus on the detection of short ncRNAs exhibiting thermodynamically stable and evolutionarily conserved secondary structures. We provide evidence for the presence of short structured ncRNAs in a variety of different species, ranging from bacteria to insects and higher eukaryotes. In particular, we highlight drawbacks and opportunities of RNAz-based ncRNA prediction in several hitherto scarcely investigated scenarios, for example ncRNA prediction in the light of whole genome duplications. A recent microarray study provides experimental evidence for our approach. Differential expression of at least one-sixth of our drosophilid RNAz predictions has been reported. Beyond the means of RNAz, we moreover manually compile a sophisticated annotation of short ncRNAs in schistosomes. Obviously, accumulating knowledge about the genetic material of malaria-causing parasites which infect millions of humans world-wide is of utmost scientific interest. Since the performance of any comparative genomics approach is limited by the quality of its input alignments, we introduce a novel light-weight and performant genome-wide alignment approach: NcDNAlign. Although the tool is optimized for speed rather than sensitivity and requires only a minor fraction of CPU time compared to existing programs, we demonstrate that it is basically as sensitive and specific as competing approaches when applied to genome-wide ncRNA gene finding and analysis of ultra-conserved regions. By design, however, prediction approaches that search for regions with an excess of mutations that maintain secondary structure motifs will miss ncRNAs that are unstructured or whose structure is not well conserved in evolution.
In the second part of the thesis, we therefore move beyond secondary structure prediction and, based on splice site detection, develop novel strategies specifically designed to identify long ncRNAs in genomic sequences - arguably the open problem in current RNA research. We perform splice-site-anchored gene finding in drosophilid, nematode, and vertebrate genomes and, at least for a subset of the obtained candidate genes, provide experimental evidence for expression and the existence of novel spliced transcripts, confirming our approach. In summary, we found evidence for a large number of previously undescribed RNAs, which consolidates the idea of non-coding RNAs as an abundant class of regulatorily active transcripts. Certainly, ncRNA prediction is a complex task. This thesis, however, rationally advises how to unveil the RNA complement of newly sequenced genomes. Since our results have already led to subsequent computational as well as experimental studies, we believe we have lastingly stimulated the field of RNA research and contributed to an enriched view on the subject.
45

Leonhardt, Sabrina, Enrico Büttner, Anna Maria Gebauer, Martin Hofrichter, and Harald Kellner. "Draft Genome Sequence of the Sordariomycete Lecythophora (Coniochaeta) hoffmannii CBS 245.38." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-235647.

Abstract:
Lecythophora (Coniochaeta) hoffmannii, a soil- and lignocellulose-inhabiting sordariomycete (Ascomycota) that can also live as a facultative tree pathogen causing soft rot, belongs to the family Coniochaetaceae. The strain CBS 245.38 sequenced here was assembled into 869 contigs, has a size of 30.8 Mb, and comprises 10,596 predicted protein-coding genes.
46

Sarov, Mihail. "A recombineering pipeline for functional genomics applied to Caenorhabditis elegans." Doctoral thesis, Technische Universität Dresden, 2006. https://tud.qucosa.de/id/qucosa%3A24870.

Abstract:
Genome sequencing and annotation projects define the complete sets of RNA and protein components for living systems. They also present the challenge of generating functional information for thousands of previously uncharacterized genes. Protein tagging with fluorescent or affinity tags provides a generic way to describe protein expression and localization patterns and protein-protein interactions. The genome-wide application of this approach in Saccharomyces cerevisiae has resulted in a comprehensive picture of the core proteome of a simple, well-studied model system. Extending these studies to more complex, multicellular model organisms would allow us to place protein function onto a four-dimensional space-time map, and will improve our understanding of the complex processes of development and differentiation. This will require efficient protein tagging methods and new high-performance tags. Here we present a generic protein tagging approach for the model nematode Caenorhabditis elegans. The method is based on recombination-mediated DNA engineering of genomic BAC clones into tagged transgenes for integrative transformation. C. elegans offers unique advantages for function discovery through protein tagging: a compact and well-annotated genome, combined with a simple and well-understood anatomy and pattern of development. However, the methods for protein tagging in C. elegans have so far been inefficient and largely dependent on artificial cDNA-based constructs, which can lack important regulatory elements. In contrast, our approach combines the advantages of authentic regulation with a new application of recombineering, which is simple, fast and efficient. For the first time we apply liquid culture cloning for multiple recombineering steps. This is particularly important when high-throughput applications are considered, as it offers significant advantages in scale-up and automation.
We show that the BAC-derived transgenes can be used for stable, integrative transformation in C. elegans. We show that the tagged transgene can take over the function of its endogenous counterpart. Using a fluorescent reporter, we reproduce known and document new expression patterns. The second part of the thesis describes a project that we undertook to develop improved double affinity cassettes for protein purification. We evaluated the performance of 5 new double tag combinations in vitro and in mammalian culture cells. All of the new cassettes performed well and present a valuable tool for protein interaction studies in higher model systems.
47

Campos, Lázara Pereira. "Genome relationships among Lotus species based on random amplified polymorphic DNA (RAPD)." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56888.

Abstract:
The usefulness of RAPDs (Random Amplified Polymorphic DNA) to distinguish among different taxa of Lotus was evaluated. The following species were included: L. corniculatus, L. tenuis, L. alpinus, L. japonicus, and L. uliginosus. Several accessions for each species were studied. Following DNA extraction, amplification reactions were performed in a Hybaid DNA Thermal Cycler, and the products were visualized according to a standard procedure. Twenty primers were used for each species/accession. Clear bands and several polymorphisms were obtained for all primers. A phenogram was drawn based on the genetic distance among the species. L. alpinus appears as the most distant species from L. corniculatus, followed by L. uliginosus, L. tenuis, and L. japonicus. With the exception of L. alpinus, these findings are in agreement with previous experimental studies in the L. corniculatus group. The use of a greater number of primers and an increased number of species may provide a greater resolution of the systematics of these taxa.
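The abstract does not say which genetic distance was used. As a minimal sketch, one common option for band presence/absence data is the Jaccard distance, shown below in Python; the function name `jaccard_distance` and the band profiles are hypothetical and do not come from the thesis.

```python
def jaccard_distance(bands_a, bands_b):
    """Jaccard distance between two band profiles (1 = band present, 0 = absent).

    Bands absent from both profiles are ignored, a frequent choice for
    dominant markers such as RAPDs.
    """
    shared = sum(1 for a, b in zip(bands_a, bands_b) if a and b)
    either = sum(1 for a, b in zip(bands_a, bands_b) if a or b)
    return 1.0 - shared / either if either else 0.0

# Made-up profiles for two unnamed taxa, scored at four band positions.
taxon_a = [1, 1, 0, 1]
taxon_b = [1, 0, 0, 1]
distance = jaccard_distance(taxon_a, taxon_b)
```

A matrix of such pairwise distances is the usual input to a clustering method that draws a phenogram.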
48

Pardini, Amanda T. "Genome evolution and systematics of the Paenungulata (Afrotheria, Mammalia)." Thesis, Stellenbosch : Stellenbosch University, 2006. http://hdl.handle.net/10019.1/21697.

Abstract:
Dissertation (PhD)--University of Stellenbosch, 2006.
ENGLISH ABSTRACT: Increases in taxonomic sampling and the numbers and types of markers used in phylogenetic studies have resulted in a marked improvement in the interpretation of systematic relationships within Eutheria. However, relationships within several clades, including Paenungulata (Hyracoidea, Sirenia, Proboscidea), remain unresolved. Here the combination of i) a rapid radiation and ii) a deep divergence has resulted in limited phylogenetic signal available for analysis. Specifically i) a short internode separating successive branching events reduces the time available for changes to occur, while ii) the longer the time since divergence, the greater the opportunity for signal to be negatively affected by homoplasy. This is evident in both molecular and morphological data where an overall consensus on paenungulate relationships is lacking. Morphological analysis of anatomical and fossil evidence favours the association of Sirenia (S) and Proboscidea (P) (Tethytheria) to the exclusion of Hyracoidea (H); further, support for uniting these three taxa as Paenungulata is contentious. In contrast, molecular data provide strong support for Paenungulata but intra-ordinal relationships are ambiguous. Although results from mitochondrial DNA sequence data favour Tethytheria, there is no consensus of support for this clade from nuclear DNA. Nuclear DNA is typified by node instability but favours H+P in the largest concatenation of sequences. Due to the expected increased effect from homoplasy and consequently the increased likelihood for misleading signal, it is unclear which result is most likely to represent the “true” tree. An analysis of available and added intron sequences to characterise signal heterogeneity among nuclear DNA and mitochondrial DNA partitions indicated that the phylogenetic utility of partitions varies considerably.
Subpartitioning of the data according to similar evolutionary processes/characteristics (e.g., mtDNA vs. nDNA and codon position) revealed new insights into the signal structure of the data set; specifically i) that nuclear DNA first codon positions, and to a lesser degree second codon sites, provide convincing support for H+P, and ii) that support for S+P by faster evolving sites within mtDNA suggests that this may be the result of misleading signal. If H+P represents the “true tree”, then support for this clade indicates that phylogenetic signal has been reduced over time as a result of multiple hits, which explains the presence of (hidden) support in slower evolving sites where homoplasy is less likely to occur, in contrast to faster evolving sites where no support for H+P was observed. In an attempt to provide further resolution from an alternative perspective to that possible with DNA sequence data, chromosomal rearrangements were identified among the three paenungulate lineages. Using comparative chromosome painting, unique changes within each order and specific to Paenungulata were characterised; however, intra-ordinal synapomorphies were not recovered. Although this may suggest a hard polytomy, the slow to moderate rate of evolution estimated from the data is likely not sufficient relative to the rapid radiation associated with the paenungulate node. Further examination of chromosomal rearrangements at a higher level of resolution may yet reveal informative changes.
49

Foroughi, pour Ali. "Linear Approximations for Second Order High Dimensional Model Representation of the Log Likelihood Ratio." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1555419601408423.

50

Dehman, Alia. "Spatial clustering of linkage disequilibrium blocks for genome-wide association studies." Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLE013/document.

Abstract:
With the recent development of high-throughput genotyping technologies, the use of Genome-Wide Association Studies (GWAS) has become widespread in genetic research. By screening large portions of the genome, these studies aim to characterize genetic factors involved in the development of complex genetic diseases. GWAS are also based on the existence of statistical dependencies, called Linkage Disequilibrium (LD), usually observed between nearby loci on DNA. LD is defined as the non-random association of alleles at different loci on the same chromosome or on different chromosomes in a population. This biological feature is of fundamental importance in association studies as it allows fine localization of unobserved causal mutations using adjacent genetic markers. Nevertheless, the complex block structure induced by LD as well as the large volume of genetic data are key issues that have arisen with GWA studies. The contributions presented in this manuscript are twofold, both methodological and algorithmic. On the methodological side, we propose a three-step approach that explicitly takes advantage of the grouping structure induced by LD in order to identify common variants which may have been missed by single-marker analyses. In the first step, we perform a hierarchical clustering of SNPs with an adjacency constraint using LD as a similarity measure. In the second step, we apply a model selection approach to the obtained hierarchy in order to define LD blocks. Finally, we perform Group Lasso regression on the inferred LD blocks. The efficiency of the proposed approach is compared with that of state-of-the-art regression methods on simulated, semi-simulated and real GWAS data. On the algorithmic side, we focus on the spatially-constrained hierarchical clustering algorithm whose quadratic time complexity is not adapted to the high dimensionality of GWAS data.
We then present, in this manuscript, an efficient implementation of such an algorithm in the general context of any similarity measure. By introducing a user parameter $h$ and using the min-heap structure, we obtain a sub-quadratic time complexity of the adjacency-constrained hierarchical clustering algorithm, as well as a linear space complexity in the number of items to be clustered. The value of this novel algorithm is illustrated in GWAS applications.
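The combination of an adjacency constraint and a min-heap can be sketched in Python as follows. This is an illustrative toy, not the thesis implementation: the average-linkage criterion, the function name `adjacency_constrained_clustering`, and the example similarity matrix are assumptions, and the sketch does not use the $h$ parameter or reach the sub-quadratic complexity reported in the abstract.

```python
import heapq

def adjacency_constrained_clustering(similarity):
    """Merge only neighbouring clusters, always taking the closest adjacent pair.

    `similarity` is a symmetric n x n list of lists (e.g. pairwise LD values
    in [0, 1]). Returns the merges as (left_items, right_items, dissimilarity).
    """
    n = len(similarity)
    # Each cluster is a contiguous run of items, keyed by an integer id.
    members = {i: [i] for i in range(n)}
    left = {i: i - 1 if i > 0 else None for i in range(n)}
    right = {i: i + 1 if i < n - 1 else None for i in range(n)}

    def dissimilarity(a, b):
        # Average linkage on 1 - similarity (an illustrative choice).
        total = sum(1.0 - similarity[i][j] for i in members[a] for j in members[b])
        return total / (len(members[a]) * len(members[b]))

    heap = [(dissimilarity(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)
    merges = []
    next_id = n
    while heap:
        d, a, b = heapq.heappop(heap)
        if a not in members or b not in members:
            continue  # stale entry: one side was already merged away
        merges.append((members[a], members[b], d))
        new = next_id
        next_id += 1
        members[new] = members.pop(a) + members.pop(b)
        left[new], right[new] = left[a], right[b]
        if left[new] is not None:
            right[left[new]] = new
            heapq.heappush(heap, (dissimilarity(left[new], new), left[new], new))
        if right[new] is not None:
            left[right[new]] = new
            heapq.heappush(heap, (dissimilarity(new, right[new]), new, right[new]))
    return merges

# Made-up similarities for four ordered markers forming two blocks.
toy_ld = [[1.0, 0.9, 0.1, 0.1],
          [0.9, 1.0, 0.2, 0.1],
          [0.1, 0.2, 1.0, 0.8],
          [0.1, 0.1, 0.8, 1.0]]
merges = adjacency_constrained_clustering(toy_ld)
```

Stale heap entries are skipped rather than deleted, so each merge costs only a few heap operations; recomputing the linkage from scratch is what keeps this toy version from being efficient.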