Dissertations / Theses on the topic 'Analyse RNAseq'
Consult the top 39 dissertations / theses for your research on the topic 'Analyse RNAseq.'
Benoit-Pilven, Clara. "Analyse de l’épissage alternatif dans les données RNAseq : développement et comparaison d’outils bioinformatiques." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1280/document.
Alternative splicing is the biological process that explains the large diversity of the proteome compared to the limited number of genes. This process allows both qualitative regulation (which isoforms are expressed) and quantitative regulation (expression level). The growth of high-throughput sequencing methods has enabled the analysis of these two aspects with a single experiment (RNA-Seq). During my PhD, I developed a new tool to analyse alternative splicing from RNA-Seq data. I also participated in the automation of the complete RNA-Seq analysis pipeline (expression and splicing). This pipeline has been used to analyse various datasets. We then compared our mapping-first tool, FaRLine, with an assembly-first method, KisSplice. We found that the predictions of the two pipelines overlapped (70% of exon skipping events were common), but with noticeable differences. The mapping-first approach found more lowly expressed splicing variants and was better at predicting exons overlapping repeated elements. The assembly-first approach found more novel variants, including novel unannotated exons and splice sites. It also predicted alternative splicing in families of paralogous genes. Our work points out where bioinformatic improvements are still needed. Finally, I participated in the development of bioinformatics methods to help biologists evaluate the functional impact of splicing alterations: at the level of the protein product, by annotating functional domains at the exon level, or at a more global level, by integrating splicing modifications into signaling pathways.
Meunier, Léa. "Analyse de signatures transcriptomiques et épigénétiques des carcinomes hépatocellulaires." Thesis, Université de Paris (2019-....), 2020. http://www.theses.fr/2020UNIP7082.
Elucidating deregulated transcriptional and epigenetic processes in cancers is fundamental to better understand the biological pathways involved and to propose a therapy adapted to the molecular phenotype of each tumor. Classical unsupervised classification approaches define, for each tumor type, the main molecular groups. However, these methods, applied to complex tumors such as hepatocellular carcinoma (HCC), the third leading cause of cancer-associated mortality worldwide, define groups that remain relatively heterogeneous and only imperfectly reflect the diversity of biological mechanisms at work in these tumors. During my PhD, I developed an innovative strategy involving independent component analysis (ICA) to extract signatures of precise biological processes from large transcriptomic and epigenetic tumor data sets. This new approach allowed me to identify groups of co-regulated genes associated with specific phenotypes or molecular alterations. Similarly, independent component analysis of the methylomes of 738 HCC revealed 13 stable epigenetic signatures preferentially active in specific tumors and CpG sites. These include signatures previously associated with ageing and cancer, but also new hyper- and hypomethylation signatures related to specific driver events and molecular subgroups. The work presented in this thesis sheds light on the diversity of molecular processes remodeling liver cancer transcriptomes and methylomes, improves the understanding of the molecular mechanisms involved in hepatic carcinogenesis and provides a statistical framework to unravel the signatures of these processes.
Riquier, Sébastien. "Dans les abysses du transcriptome : découverte de nouveaux biomarqueurs de cellules souches mésenchymateuses par analyse approfondie du RNAseq." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTT004.
The development of RNA sequencing, or RNAseq, has opened the way to intensive biomarker research in many areas of biology. The complete transcriptome information contained in the output data allows a bioinformatician to go beyond current knowledge and, thanks to advanced computational pipelines, to access new signatures of interest. In this thesis, we show that these potential markers, classically used in clinical and pathological conditions, can be used to characterize cell types that lack an extensive marker profile. We studied mesenchymal stem cells (MSCs), a type of adult multipotent stem cell widely used in the clinic but without strictly specific positive markers. Our study mainly focuses on the search for non-annotated long non-coding RNAs. These RNAs, also called "lncRNAs", constitute an emerging class of transcripts that is still little explored. In addition, this category shows high tissue specificity. We developed an optimized RNAseq pipeline for the reconstruction and quantification of non-annotated lncRNAs. Using public RNAseq data from different sources of MSCs and other cell types, we identified new non-annotated lncRNAs clearly and specifically expressed in MSCs. To complete this project, we developed Kmerator.jl, a bioinformatics tool that decomposes a transcript into k-mers and selects specific sub-sequences, in order to search for and quantify the signature of our candidates more quickly in a large number of RNAseq datasets. After validation of these new MSC biomarkers by qPCR, we used several computational tools to predict their potential functions. Finally, we analyzed single-cell RNAseq data to address the heterogeneity of expression within MSC populations.
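The k-mer idea mentioned in the Riquier abstract above (decompose a transcript into k-mers, then keep only the sub-sequences specific to it) can be illustrated with a minimal sketch. This is not Kmerator.jl's actual implementation, which is written in Julia and whose internals are not described here; the function names `kmers` and `specific_kmers`, and the use of a plain list of background sequences as the reference, are assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order of occurrence."""
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def specific_kmers(target, background, k):
    """Keep the k-mers of target that occur in none of the background sequences.

    Hypothetical helper: 'background' stands in for whatever reference
    (other transcripts, a genome) the candidate must be distinguished from.
    """
    seen = set()
    for seq in background:
        seen.update(kmers(seq, k))
    return [km for km in kmers(target, k) if km not in seen]


if __name__ == "__main__":
    # Toy sequences, not real transcripts.
    print(kmers("ACGTAC", 3))
    print(specific_kmers("ACGTAC", ["TTACGTT"], 3))
```

Under these assumptions, only the k-mers that survive the filter would be searched for in the RNAseq reads, which is what makes quantification across many datasets fast compared with full alignment.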
Gonzalez, Claudia. "Étude des mécanismes immunitaires impliqués dans le contrôle de l'infection par le virus Nipah." Electronic Thesis or Diss., Lyon, École normale supérieure, 2024. http://www.theses.fr/2024ENSL0035.
Nipah virus (NiV) is a paramyxovirus that is highly pathogenic for humans and is listed as a priority for research and development by the WHO. Pteropus bats are the natural asymptomatic reservoir of NiV, and we investigated the mechanisms allowing them to control the infection. To this end, we conducted a comparative transcriptomic analysis between bat and human cells. We observed distinct immune profiles at the basal state. Bat cells show high levels of receptors such as TLR-3 and TLR-8, which may explain the rapid viral detection. Additionally, the kinetics of gene expression differed between the two species: we detected early gene activation in bats, while the response in humans was delayed. Early activation of the NF-kB pathway was observed in bats, and among these factors, c-Rel was one of the most highly expressed genes. Functional analysis revealed that both human and bat c-Rel proteins induce NF-kB pathway activation and are inhibited by the non-structural protein NiV-W. We also demonstrated the ability of bat c-Rel, unlike human c-Rel, to modulate IFN response promoter (ISRE) activity after IFN stimulation. This study suggests that the rapid and transient response of Pteropus may promote better regulation of pro-inflammatory responses and contribute to their ability to control NiV infection. Since no treatment or vaccine is available for this virus, the work also focused on evaluating a fusion inhibitor acting on virus entry and a vaccine. The latter, combining the CD40 receptor with the ectodomain of the G protein, was validated in vivo, demonstrating complete protection in immunized monkeys. These results open new perspectives for innovative antiviral approaches.
Gössringer, Markus. "In-vivo-Analysen zur Funktion bakterieller RNase-P-Proteine in Bacillus subtilis." [S.l. : s.n.], 2004. http://archiv.ub.uni-marburg.de/diss/z2004/0529/.
Ahmed, Firdous. "Identification of potential biomarkers in lung cancer as possible diagnostic agents using bioinformatics and molecular approaches." University of the Western Cape, 2015. http://hdl.handle.net/11394/4862.
Lung cancer remains the leading cause of cancer deaths worldwide, with the majority of cases attributed to non-small cell lung carcinomas. At the time of diagnosis, a large percentage of patients present with an advanced stage of disease, ultimately resulting in a poor prognosis. The identification of circulatory markers overexpressed by the tumour tissue could facilitate the discovery of an early, specific, non-invasive diagnostic tool as well as improve prognosis and treatment protocols. The aim was to analyse gene expression data from both microarray and RNA sequencing platforms, using bioinformatics and statistical analysis tools. Enrichment analysis sought to identify genes that were differentially expressed (p < 0.05, FC > 2) and had the potential to be secreted into the extracellular circulation, using Gene Ontology Cellular Component terms. Results identified 1,657 statistically significant genes between normal and early lung cancer tissue, with only 1 gene differentially expressed (DE) between early and late stage disease. Following statistical analysis, 171 DE genes were selected as potential early stage biomarkers. The overall sensitivity of RNAseq, in comparison to arrays, enabled the identification of 57 potential serum markers. These genes of interest were all downregulated in the tumour tissue, and while they did not facilitate the discovery of an ideal diagnostic marker based on the criteria set in this study, their roles in disease initiation and progression require further analysis.
Mary, Catherine. "Utilisation séquentielle des sites accepteurs d'épissage lors de l'expression du provirus HIV-1 : analyse par cartographie à la RNAse." Lyon 1, 1994. http://www.theses.fr/1994LYO1T236.
Ahmed, Fathima Zuba. "Unravelling genes responsible for successful anthocyanin production in Nicotiana benthamiana." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/230763/1/Fathima%20Zuba_Ahmed_Thesis.pdf.
Marchant, Axelle. "Le processus de domiciliation des punaises hématophages vectrices de la maladie de Chagas : apport de l’étude du transcriptome chimiosensoriel." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS008/document.
In Latin America, the bloodsucking bugs (Triatominae, Hemiptera, Reduviidae) are vectors of the parasite Trypanosoma cruzi, which causes Chagas disease. More than five million people are infected. Even if chemical control campaigns are effective against vectors, the disease persists due to the recolonization of human habitations by vectors from natural habitats. Some species have the capacity to adapt to anthroposystems (domiciliation process), while other related species do not. Understanding this capacity to adapt is crucial from an epidemiological perspective to target species at risk to humans. The capacity to adapt to a new habitat could be linked to changes in the repertoire of chemosensory system genes, particularly for odorant binding proteins (OBP) and chemosensory proteins (CSP), which are important proteins for detecting various odor stimuli. This study is based on the chemosensory system of Triatominae to document the adaptation process and then the domiciliation of the vectors. Transcriptomic data obtained by high-throughput sequencing were used to annotate and list the chemosensory genes and also to compare their expression in bloodsucking bugs from different habitats. The relationship between changes in these genes in different Triatominae species and their ability to adapt to a new habitat was evaluated. The species T. brasiliensis, which is in the process of domiciliation in Brazil with sylvatic, peridomiciliary and domiciliary populations, and various species of the genus Rhodnius from diverse habitats were studied, especially the two sibling species R. robustus, sylvatic in Amazonia, and R. prolixus, mostly domiciliary throughout its geographical range. In the absence of a reference genome for T. brasiliensis, a reference transcriptome was produced by de novo assembly (454 and Illumina data). The reference transcriptomes for 10 Rhodnius species were also established using the de novo assembly method. A reference-genome-based method using R. prolixus was also used to assemble the transcriptomes of the two species R. prolixus and R. robustus. In the different Triatominae species studied, the chemosensory gene repertoire showed high diversity and gene expansions compared to that of other Paraneoptera, which could reflect an adaptive process. Furthermore, a positive correlation was shown between the number of OBP genes in Rhodnius species and their domiciliation ability, suggesting that this gene family is involved in adaptation to anthropogenic environments. The differential expression analyses on the T. brasiliensis populations and the R. prolixus / R. robustus species showed that some transcripts are differentially expressed according to the environment in which the bugs have evolved, especially the chemosensory genes (OBP, CSP) and also genes involved in the circadian rhythm and foraging behavior (Takeout), in the response to environmental stress such as detoxification genes (P450, glutathione S-transferase), in resistance to climatic changes (heat-shock proteins) and in protection from the external environment (cuticular proteins). This work has helped make available to the scientific community powerful tools for studying the process of domiciliation of Chagas disease vectors (transcriptome, gene repertoire). It also revealed genes that could be involved in adaptation and/or phenotypic plasticity in response to a change in habitat. Understanding the molecular basis of vector adaptation to human dwellings opens the potential to develop new tools to control the disease vectors, for example by disrupting chemical communication.
Loe-mie, Yann. "Contribution bioinformatique à l' analyse du transcriptome humain." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4002/document.
In the first part of this thesis, I analysed small RNA-seq transcriptome data. I noticed that a large fraction of reads cannot be aligned perfectly to the reference genome, and that many reads are very short (15-18 nt) and do not match previously known functional small RNAs. These experiments are designed for miRNA discovery, and bioinformatics analysis of such data relies on alignment to the genome or to known small RNA precursor sequences. I set the alignment step aside and clustered the sequences instead. This clustering let me view the data from a new perspective in which genomic location is not central, opening the door to the discovery of unconventional events. The second part is the analysis of genes deregulated by silencing of the gene REST/NRSF in the mouse N18 cell line. This gene encodes a transcription factor that acts as a repressor of neuronal genes in non-neuronal cells. The repertoire of deregulated genes potentially contains key genes in neuron biology. In this repertoire we found a network of genes centered on the SWI/SNF complex, including SMARCA2. This gene has been associated with schizophrenia (SZ) in association studies and structural variation studies. In this network we found other genes associated with SZ. We show that these genes exhibit positive evolution in primates compared to rodents.
Bosch, Andreas [Verfasser], and Michael [Akademischer Betreuer] Thomm. "Analyse von kleinen RNAs mit RNA-bindenden Proteinen in Pyrococcus furiosus / Andreas Bosch. Betreuer: Michael Thomm." Regensburg : Universitätsbibliothek Regensburg, 2014. http://d-nb.info/1054191646/34.
Hillebrand, Arne Thomas. "Funktionelle Analyse kleiner, nichtkodierender RNAs in den Organellen von Plasmodium falciparum und Charakterisierung neuer RNA-Bindeproteine in Apicomplexa." Doctoral thesis, Humboldt-Universität zu Berlin, 2019. http://dx.doi.org/10.18452/20433.
Malaria is caused by a single-celled parasite of the genus Plasmodium. Especially in Sub-Saharan Africa, this disease is a huge challenge for the health system. The cells of the parasites contain two organelles of endosymbiotic origin, the apicoplast and the mitochondrion. Both organelles still contain a reduced genome. For the expression of these genomes, the organelles depend on a large set of nuclear-encoded proteins. The mitochondrial genome has a unique structure. At only 6 kb, it is one of the smallest genomes discovered to date, and it contains only three protein-coding genes along with 34 small ribosomal genes. The regulation of expression, the processing of the polycistronic primary transcript and the regulation of RNA metabolism in the mitochondria of Plasmodium remain largely unknown. Through high-throughput sequencing of cellular RNA, we discovered a population of small RNAs originating in the mitochondria of P. falciparum. Similar RNA accumulations can be detected in the organelles of higher plants and are caused by helical-hairpin repeat proteins such as PPR (pentatricopeptide repeat) proteins. To search for plant-like RNA-binding proteins similar to PPR proteins, we scanned the nuclear genome of P. falciparum for helical-hairpin repeat proteins. We found a novel protein family with repetitive helical elements 37 amino acids long, which we termed heptatricopeptide repeat (HPR) proteins. In the rodent malaria parasite P. berghei, mitochondrial localization was verified for 7 HPR proteins. In knockout studies, we also showed that almost all HPR proteins are essential for the blood stages of P. berghei. In RNA-binding assays, one recombinant HPR protein showed unspecific interaction with mitochondrial transcripts but not with DNA. By broadening the search, we discovered that HPR proteins are found in multiple eukaryotic taxa.
Köhler, Karen [Verfasser], and Roland Karl [Akademischer Betreuer] Hartmann. "Mechanistic and structural analyses of different non-coding RNAs in bacteria / Karen Köhler. Betreuer: Roland Karl Hartmann." Marburg : Philipps-Universität Marburg, 2014. http://d-nb.info/1059855682/34.
Findeiß, Sven. "Expanding the repertoire of bacterial (non-)coding RNAs." Doctoral thesis, Universitätsbibliothek Leipzig, 2011. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-67816.
Cavé-Radet, Armand. "Évolution de la tolérance aux Hydrocarbures Aromatiques Polycycliques (HAPs) chez les spartines polyploïdes : analyses physiologiques et régulations transcriptomiques par les micro-ARNs." Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1B064/document.
We explored the mechanisms involved in tolerance to organic xenobiotics belonging to the PAHs (phenanthrene) in the context of allopolyploid speciation (hybrid genome duplication). We developed a comparative approach using a recent allopolyploidization model that includes the hexaploid parental species S. alterniflora and S. maritima and the allopolyploid S. anglica, which resulted from genome doubling of the F1 hybrid S. x townsendii. An integrative approach based on physiological and molecular analyses highlights that hybridization and genome doubling enhance tolerance to xenobiotics in Spartina. The paternal parent S. maritima exhibits higher sensitivity than the maternal parent S. alterniflora. Various transcriptomic analyses were performed to identify stress-responsive transcripts de novo and to annotate small RNAs (miRNAs, their target genes, and siRNAs) involved in the regulation of gene expression and transposable elements. Differential expression analyses in response to stress allowed us to develop a putative miRNA regulatory network (miRNA/target genes) in response to PAHs, functionally validated in Arabidopsis as a heterologous system. An exploratory profiling of the Spartina rhizosphere microbiome exposed to phenanthrene was also performed to characterize environmental degradation abilities, with a view to optimizing green remediation strategies.
Li, Siwei. "High Throughput Automated Comparative Analysis of RNAs Using Isotope Labeling and LC-MS/MS." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1384427990.
Bajak, Edyta Zofia. "Genotoxic stress: novel biomarkers and detection methods : uncovering RNAs role in epigenetics of carcinogenesis /." Stockholm, 2005. http://diss.kib.ki.se/2005/91-7140-415-5/.
Hillebrand, Arne Thomas [Verfasser], Christian [Gutachter] Schmitz-Linneweber, Kai [Gutachter] Matuschewski, and Volker [Gutachter] Knoop. "Funktionelle Analyse kleiner, nichtkodierender RNAs in den Organellen von Plasmodium falciparum und Charakterisierung neuer RNA-Bindeproteine in Apicomplexa / Arne Thomas Hillebrand ; Gutachter: Christian Schmitz-Linneweber, Kai Matuschewski, Volker Knoop." Berlin : Humboldt-Universität zu Berlin, 2019. http://d-nb.info/1193989310/34.
Tufail, Muhammad Aammar. "Use of plant growth promoting endophytic bacteria to alleviate the effects of individual and combined abiotic stresses on plants as an innovative approach to discover new delivery strategies for bacterial bio-stimulants." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/305571.
Geißen, René [Verfasser], Rolf [Akademischer Betreuer] Wagner, and Michael [Akademischer Betreuer] Feldbrügge. "Analyse der physiologischen Rolle der 6S RNA aus Escherichia coli und Vergleich der molekularen Mechanismen zwischen 6S RNAs aus E. coli und Cyanobakterien / René Geißen. Gutachter: Michael Feldbrügge. Betreuer: Rolf Wagner." Düsseldorf : Universitäts- und Landesbibliothek der Heinrich-Heine-Universität Düsseldorf, 2011. http://d-nb.info/101543472X/34.
Aghamirzaie, Delasa. "Isoform-Specific Expression During Embryo Development in Arabidopsis and Soybean." Diss., Virginia Tech, 2016. http://hdl.handle.net/10919/73054.
Ph. D.
Soumya, R. Deo. "Characterization of the terminal region RNAs of the West Nile virus genome and their interaction with the small isoform of 2' 5'-oligoadenylate synthetases (OAS)." Plos.org, 2014. http://hdl.handle.net/1993/30733.
October 2015
Van, Den Elzen Antonia. "Etude du complexe Dom34-Hbs1 ressemblant aux facteurs de terminaison : analyse fonctionnelle de ses rôles dans le contrôle qualité des ARN et dans la stimulation de la traduction par dissociation des ribosomes inactifs." Thesis, Strasbourg, 2013. http://www.theses.fr/2013STRAJ110/document.
Protein production is a cyclic process that consists of four stages: initiation, elongation, termination and recycling. During recycling, the subunits of terminated ribosomes are dissociated to make them available for new rounds of translation. If ribosomes stall during translation, they cannot terminate properly and canonical recycling cannot occur. Cells have mechanisms to rescue these stalled ribosomes. A complex formed by the factors Dom34 and Hbs1 induces their dissociation. This complex functions in RNA quality control, targeting RNAs that cause ribosomal stalling. In this thesis, the importance of several functional sites of the Dom34-Hbs1 complex for the degradation of these RNAs is investigated. Details of, and the relationship between, the RNA quality control pathways in which the complex functions are further investigated. Finally, a new role of this complex is described: dissociating inactive ribosomes and thereby making their subunits available to re-enter the translation cycle.
Moraes, Rodrigo de. "Uma investigação empírica e comparativa da aplicação de RNAs ao problema de mineração de opiniões e análise de sentimentos." Universidade do Vale do Rio dos Sinos, 2013. http://www.repositorio.jesuita.org.br/handle/UNISINOS/3411.
The area of Opinion Mining and Sentiment Analysis emerged from the need for automated processing of textual information about opinions posted on the web. Its main motivation is the constant growth in the volume of such information, driven by the technologies brought by Web 2.0, which makes it impractical to monitor and analyse these opinions, although they are useful both to users intending to purchase new products and to companies seeking to identify market demand. Currently, most studies in Opinion Mining and Sentiment Analysis that use data mining focus on developing techniques for better knowledge representation and rely on commonly applied classification techniques, without exploring other classifiers that perform well on other problems. This work therefore presents an empirical and comparative investigation of the application of the classical Artificial Neural Network (ANN) model, the multilayer perceptron, to the Opinion Mining and Sentiment Analysis problem. To this end, opinion datasets are defined and textual knowledge representation techniques are applied to them so that all classifiers receive the same unigram-based representation of the texts. From this representation, Support Vector Machine (SVM), Naïve Bayes (NB) and ANN classifiers are applied in three dataset contexts: (i) balanced datasets, (ii) datasets with different levels of imbalance and (iii) datasets to which random undersampling is applied to handle the imbalance. Investigating the unbalanced context and those derived from it is relevant because opinion datasets available on the web usually contain more positive than negative opinions. The classifiers are evaluated with metrics for both classification performance and run time. The results in the balanced context indicate that ANNs significantly outperform the other classifiers and, although they have a high computational cost for training, they provide classification times significantly lower than those of the classifier whose classification results were closest to theirs. In the unbalanced context, ANNs are sensitive to increasing noise in the data representation and to increasing imbalance, and the NB classifier stands out in these experiments. With undersampling, ANNs are equivalent to the other classifiers and achieve competitive results. However, they may not be the most appropriate classifier for this context when training and classification times, and their small advantage in classification accuracy, are considered.
Ferrarini, Margherita. "From exome to whole genome sequencing: mining for inconsistencies and functional elements in coding and non-coding regions." Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3424933.
Over the last twenty years, technological advances in DNA sequencing have led to an enormous increase in the amount of sequencing data accessible to researchers and geneticists. This growth has been accompanied by the development of the tools needed for data analysis; among these, the human reference genome is without doubt an indispensable resource. It is known that the reference genome does not always represent the true consensus sequence of the human population, because rare alleles and sequencing errors have been included in it. In addition, genomic duplications are often poorly assembled and can consequently appear in the reference genome as collapsed, thereby generating false variants. This thesis describes an in-depth search for inconsistencies between the human reference genome (GRCh37 and GRCh38) and some of the most popular human genetics resources, such as the 1000 Genomes Project, in order to uncover minor alleles and genetic inconsistencies. To identify genomic duplications not reported in the genome, an extensive search for unbalanced heterozygosity was then carried out. This analysis showed that inconsistencies and errors are far more frequent than expected. Indeed, minor alleles with a frequency below 10% were found on average every ~7,000 bases, and they include many rare variants never reported in databases. The systematic screen for unbalanced heterozygosity also showed that ~86,000 variants may derive from genomic duplications not reported in the reference sequence, and that some of them involve important genes such as MAP2K3 and KCNJ12. The results described in this work can contribute to the definition of a highly accurate human genome reference sequence. These same results may also be useful to human geneticists when filtering and selecting variants potentially associated with disease.
Advances in DNA sequencing have also led to the ever-increasing use of whole-genome sequencing approaches, in both research and clinical diagnosis, revealing that the majority of disease-associated SNPs are located in non-coding regions of the human genome. However, the functional interpretation of non-coding variants remains problematic. Part of my work addressed this aspect as well, with the aim of developing a method for prioritizing non-coding variants. This method, described in the last chapter of the thesis, is based on a comparative genomics approach to identify functional domains in orthologous genes of primate organisms. The first steps of this approach proved to work very well for identifying orthologous genes, but further work is needed to optimize the multiple sequence alignment process and the identification of conserved domains.
Laugier, Laurie. "Identification de marqueurs de susceptibilité dans les formes chroniques de la maladie de Chagas." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0226.
Chagas disease is a parasitic disease caused by the protozoan Trypanosoma cruzi and transmitted by hematophagous insects. The disease comprises an acute phase and a chronic phase. Among infected individuals, 30% develop the chronic form and suffer from cardiac, digestive (esophagus, colon) or cardiodigestive damage. Our study focused on patients with dilated chagasic cardiomyopathy (CCC). Our goal is to identify susceptibility genes that may be involved in the development of chronic forms. Our study revealed differences in the expression of certain genes between the CCC group and controls. We were also interested in epigenetic processes that can regulate gene expression. Combining DNA methylation data with the transcriptome allowed us to identify genes showing variation in both expression and methylation. For some of these genes, we demonstrated that methylation is responsible for the observed variation in expression. We also studied a long non-coding RNA called MIAT and demonstrated that it is overexpressed in CCC compared with controls and in a murine model infected with T. cruzi. Furthermore, combining micro-RNA expression analysis with transcriptome analysis allowed us to identify several micro-RNAs whose functions are essential in regulating gene expression. Finally, a proteomic study demonstrated increased protein production for certain genes, correlated with the observed increase in their expression levels.
Guidi, Mònica. "Micro RNA-Mediated regulation of the full-length and truncated isoforms of human neurotrophic tyrosine kinase receptor type 3 (NTRK 3)." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7114.
Neurotrophins and their receptors constitute a family of factors crucial for the development of the nervous system. Neurotrophin-3 binds preferentially to its high-affinity receptor
NTRK3, which exists in two major isoforms in humans, the full-length kinase-active
form (150 kDa) and a truncated non-catalytic form (50 kDa). The two
variants show different 3'UTR regions, indicating that they might be differentially
regulated at the post-transcriptional level. In this work we explore how
microRNAs take part in the regulation of full-length and truncated NTRK3,
demonstrating that the two isoforms are targeted by different sets of microRNAs.
We analyze the physiological consequences of the overexpression of some of the
regulating microRNAs in human neuroblastoma cells. Finally, we provide
preliminary evidence for a possible involvement of miR-124 - a microRNA with no
putative target site in either NTRK3 isoform - in the control of the alternative
splicing of NTRK3 through the downregulation of the splicing repressor PTBP1.
Neurotrophins and their receptors constitute a family of factors crucial for
the development of the nervous system. Neurotrophin 3 exerts its function
mainly through high-affinity binding to the NTRK3 receptor, of which two main
isoforms are known: a long 150 kDa isoform with tyrosine kinase activity and a
truncated 50 kDa isoform lacking that activity. These two isoforms do not share
the same 3'UTR region, which suggests that they are regulated differently at
the post-transcriptional level. The present work explores how microRNAs take
part in the regulation of NTRK3 and demonstrates that the two isoforms are
regulated by different miRNAs. The physiological consequences of overexpressing
these microRNAs were analyzed in neuroblastoma cells. Finally, the possible
involvement of the microRNA miR-124 in the control of the alternative splicing
of NTRK3, through the regulation of the splicing repressor PTBP1, was studied.
Dadkhahi, Sara [Verfasser]. "Expression von extrazellulärer RNase und Analyse ihrer Aktivität in Endothelzellen / vorgelegt von Sara Dadkhahi." 2008. http://d-nb.info/990946339/34.
Fu, Yu. "Computational analyses of small silencing RNAs." Thesis, 2018. https://hdl.handle.net/2144/33235.
Βρυζάκη, Ελευθερία. "Η ριβονουκλεάση Ρ (RNase P) των ανθρώπινων λεμφοκυττάρων." Thesis, 2008. http://nemertes.lis.upatras.gr/jspui/handle/10889/1596.
Ribonuclease P (RNase P) is the enzyme responsible for the 5΄ maturation of precursor tRNA molecules; it participates in tRNA biogenesis and therefore in protein synthesis. It catalyses the endonucleolytic cleavage of a phosphodiester bond in the presence of Mg2+, producing molecules that bear 3΄ hydroxyl and 5΄ phosphate ends. Most forms of RNase P are ribonucleoproteins consisting of an essential RNA and protein subunits. The RNA component of bacterial RNase P was one of the first catalytic RNAs identified. So far, RNase P and the ribosome are the only ribozymes known to be conserved in all kingdoms of life (bacteria, archaea and eucarya). In view of the vital importance of lymphocytes for an effective immune system, we proceeded to isolate RNase P from human peripheral lymphocytes. The enzyme was purified by cation-exchange phosphocellulose chromatography, and the optimal conditions were determined. We investigated the effect of the synthetic retinoids (cis-retinoic acid and acitretin), neomycin B and chloroquine diphosphate on RNase P activity. Cis-retinoic acid, acitretin and neomycin B exerted a dose-dependent inhibitory effect on the activity of RNase P from human lymphocytes, while the activity was not affected by chloroquine diphosphate. A detailed kinetic analysis showed that the inhibition caused by acitretin was competitive, whereas that caused by neomycin B was noncompetitive. For the tRNA maturation reaction, the Km of the RNase P activity isolated from lymphocytes was estimated at 245 nM and the Vmax at 0.42 pmol/min.
Finally, the isolation of RNase P from human peripheral lymphocytes will enable the study of the possible involvement of this ribozyme in the pathogenetic mechanisms of diverse autoimmune, inflammatory and neoplastic cutaneous disorders and may facilitate the further development of RNase P-based technology for gene therapy of infectious and neoplastic dermatoses.
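To make the kinetic terms in this abstract concrete, the following Python sketch evaluates the textbook Michaelis-Menten rate law with the reported Km (245 nM) and Vmax (0.42 pmol/min), alongside the standard competitive and pure noncompetitive forms. It is only an illustration: the abstract reports no inhibition constants, so the inhibitor concentration and `ki` below are hypothetical, and the function name `rate` is an assumption.

```python
KM_NM = 245.0          # Km reported in the abstract, in nM
VMAX_PMOL_MIN = 0.42   # Vmax reported in the abstract, in pmol/min


def rate(substrate_nm, inhibitor=0.0, ki=1.0, mode="none"):
    """Initial rate (pmol/min) under textbook Michaelis-Menten kinetics.

    `inhibitor` and `ki` must share one concentration unit; both are
    hypothetical because the abstract gives no inhibition constants.
    """
    factor = 1.0 + inhibitor / ki
    if mode == "none":
        return VMAX_PMOL_MIN * substrate_nm / (KM_NM + substrate_nm)
    if mode == "competitive":
        # Competitive inhibition raises the apparent Km; Vmax is unchanged.
        return VMAX_PMOL_MIN * substrate_nm / (KM_NM * factor + substrate_nm)
    if mode == "noncompetitive":
        # Pure noncompetitive inhibition lowers the apparent Vmax; Km is unchanged.
        return VMAX_PMOL_MIN * substrate_nm / ((KM_NM + substrate_nm) * factor)
    raise ValueError(f"unknown mode: {mode}")


# Rates at a substrate concentration equal to Km, with inhibitor = Ki (hypothetical).
v_free = rate(245.0)
v_comp = rate(245.0, inhibitor=1.0, ki=1.0, mode="competitive")
v_noncomp = rate(245.0, inhibitor=1.0, ki=1.0, mode="noncompetitive")
```

At a substrate concentration equal to Km the uninhibited rate is half of Vmax. A competitive inhibitor can be overcome by adding more substrate, whereas a noncompetitive one cannot, which is the practical difference between the two inhibition types named in the abstract.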
Gößringer, Markus [Verfasser]. "In-vivo-Analysen zur Funktion bakterieller RNase-P-Proteine in Bacillus subtilis / vorgelegt von Markus Gößringer." 2004. http://d-nb.info/973038152/34.
Dietrich, Sascha. "Analyse und Charakterisierung regulatorischer Vorgänge in Bacillus licheniformis." Doctoral thesis, 2015. http://hdl.handle.net/11858/00-1735-0000-0022-5FD3-B.
Woodhams, Michael D., Peter F. Stadler, David Penny, and Lesley J. Collins. "RNase MRP and the RNA processing cascade in the eukaryotic ancestor." 2007. https://ul.qucosa.de/id/qucosa%3A31877.
Ferreira, Joana Catarina da Rocha. "Genomic and transcriptomic analyses in cancers related with viral infection." Master's thesis, 2016. http://hdl.handle.net/1822/47439.
Over the past 30 years, accumulated evidence has supported viral infection as a factor responsible for 15-20% of human malignancies worldwide (W. S. Liang et al. 2014; McLaughlin-Drubin and Munger 2008). Studies on oncogenic viruses have demonstrated their role in cellular malfunction along the carcinogenic process and have shown that their association with cancer ranges from 15% to 100% (McLaughlin-Drubin and Munger 2008), depending on the type of tumour. With the large amount of genomic and metagenomic information available from public international consortia, such as the TCGA database, it is now possible to infer viral infections indirectly from human-centred omics studies, because a portion of the reads will align to viruses and bacteria. Taking as our starting point the research by Tang et al. 2013, we focused on cervical (CESC), hepatocellular (LIHC) and head and neck squamous cell (HNSC) carcinomas, which are known to show a high proportion of viral-positive cases (Tang et al. 2013). We downloaded RNAseq data from 309, 424 and 566 samples, respectively, and ran the unmapped reads against a reference database of viruses (downloaded from NCBI) using the tools Batch, SAMTOOLS, Bowtie and PRINTSEQ. Each virus was quantified in parts per million reads (ppm), and only viruses with ppm above 10 were considered to be positively infecting the sample. We confirmed that around 94% of CESC samples were infected, mostly by HPV (human papillomavirus) and specifically by the HPV16 strain. Nearly 32% of LIHC samples were infected by HBV (hepatitis B virus). Almost 17% of HNSC samples were infected, and HPV16 was the most commonly detected virus. Differential enrichment of metabolic pathways between infected and non-infected groups, for each cancer type, was evaluated with GSEA.
Signs of enrichment for infection- and immune-related pathways were evident in the CESC infected group, while in the LIHC and HNSC infected groups the enrichment was mostly related to DNA replication and repair. This seems to indicate that infection is especially active in CESC, contradicting previous claims that tumorigenesis in the cervix was not directly linked with infection. In all three cancer types, the viruses integrate their genome into the host genome, affecting DNA replication, maintenance and repair. In our investigation of the integration of the HPV16 genome in one HNSC tumor sample, we confirmed integration in the human RAD51B gene, which codes for a protein involved in DNA repair by homologous recombination. We thus confirmed that HPV16 can act as both an indirect and a direct carcinogen. The infection, most probably through the integration of the viral genome into the host genome, increased the number of somatic mutations in the infected group in LIHC, but not in HNSC, where tobacco consumption is also an important carcinogen. The low number of non-infected samples in CESC did not allow a reliable evaluation of changes in the number of somatic mutations. Even so, in both the LIHC and HNSC infected groups, some somatic mutations occurred in immune-related pathways, showing that they can contribute to rendering these individuals susceptible to infection. Also, when checking the expression of HPV16 genes in five samples each from CESC and HNSC, we confirmed that the E6 and E7 genes are among the most highly expressed in many samples, while E2 is not expressed. E6 and E7 have been reported to be preferentially integrated into the host genome, while E2, which controls their expression, is not integrated or is disrupted. It is believed that the overexpression of E6 and E7 initiates carcinogenesis.
The viral infection rates inferred here from mining the omics databases are very similar to those obtained by standard methods (Tang et al. 2013), showing that public international consortia can indirectly provide interesting insights into the involvement of viral infection in tumorigenesis. The high number of samples per tumor, the wide geographic origin of the samples, and the high-throughput characterisation on different omics platforms allow multilayer comparisons and evaluations on a scale not affordable before.
Over the last 30 years, evidence has accumulated supporting viral infection as a factor responsible for 15-20% of malignant tumours in humans worldwide (W. S. Liang et al. 2014; McLaughlin-Drubin and Munger 2008). Studies on oncogenic viruses have demonstrated their importance in cellular malfunction throughout the carcinogenic process and have shown that their association with cancer varies between 15% and 100% (McLaughlin-Drubin and Munger 2008), depending on the type of tumour. With the large amount of genomic and metagenomic information accessible through public international consortia, such as the TCGA database, it is now possible to infer viral infections indirectly from human-centred genomic studies, since part of the reads will align with viruses and bacteria. Taking as a starting point the research by Tang et al. 2013, we concentrated on cervical (CESC), hepatocellular (LIHC) and head and neck (HNSC) cancers, which are known to present a high proportion of viral-positive cases (Tang et al. 2013). We downloaded RNA-Seq data from 309, 424 and 566 samples, respectively, and compared the unmapped reads against a reference viral database (taken from the NCBI database) using the tools Batch, SAMTOOLS, bowtie and PRINTSEQ. Each virus was quantified in parts per million (ppm), and only viruses with ppm above 10 were considered to be positively infecting a sample. We confirmed that about 94% of the CESC samples were infected, mainly by HPV (human papillomavirus) and specifically by the HPV16 strain. Almost 32% of the LIHC samples were infected by HBV (hepatitis B virus). Around 17% of the HNSC samples were infected, and HPV16 was the most common virus. The differential enrichment of metabolic pathways between infected and non-infected groups, for each type of cancer, was evaluated with GSEA.
Signs of enrichment for infection and for immune-system-related pathways were evident in the CESC infected group, whereas in the LIHC and HNSC infected groups the enrichment was mainly related to DNA replication and repair. This seems to indicate that infection is especially active in CESC, contradicting earlier claims that tumorigenesis in the cervix was not directly linked to infection. In the three types of cancer, the viruses integrated their genomes into the host genome, affecting DNA replication, maintenance and repair. In our study of the integration of the HPV16 genome in one HNSC tumor sample, viral integration was confirmed in the human RAD51B gene, which codes for a protein involved in DNA repair by homologous recombination. We were thus able to confirm that HPV16 can act as both a direct and an indirect carcinogen. Probably through the integration of the viral genome into the host genome, the infection increased the number of somatic mutations in the group of infected samples in LIHC, but not in HNSC, where tobacco consumption is also an important carcinogen. The small number of non-infected samples in CESC did not allow a reliable comparison of the number of somatic mutations between infected and non-infected groups. Even so, in the LIHC and HNSC infected groups, some somatic mutations occurred in pathways related to the immune system, showing that they can contribute to making these individuals susceptible to infection. In addition, when checking the expression of HPV16 genes in five samples each from CESC and HNSC, we confirmed that the E6 and E7 genes are among the most highly expressed in many of the samples, whereas E2 is not expressed. The E6 and E7 genes are known to be preferentially integrated into the host genome, unlike the E2 gene, which controls their expression and is either not integrated or is fragmented.
It is believed that the overexpression of E6 and E7 is what initiates carcinogenesis. The viral infection rates inferred in this work by mining omics databases are very similar to those obtained by traditional methods (Tang et al. 2013), showing that the information available from public international consortia can indirectly shed light on the involvement of viral infection in tumorigenesis. The high number of samples per tumor, the wide geographic origin of the samples, and the high-throughput characterisation on different omics platforms allow multiple comparisons and evaluations on a scale not accessible before.
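The ppm-based calling rule described in this abstract (a virus counts as present when it exceeds 10 parts per million reads) can be sketched in Python as follows. This is a hedged illustration, not the thesis pipeline: the abstract does not say which read total serves as the denominator, so using the total number of reads in the sample is an assumption, and the function names and read counts are hypothetical.

```python
def viral_ppm(viral_reads, total_reads):
    """Reads assigned to one virus per million reads in the sample.

    Using the sample's total read count as the denominator is an assumption;
    the abstract only states that quantification was in ppm.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return viral_reads * 1_000_000 / total_reads


def is_infected(viral_reads, total_reads, threshold_ppm=10.0):
    """Apply the abstract's rule: positive only when ppm is above the threshold."""
    return viral_ppm(viral_reads, total_reads) > threshold_ppm


# Hypothetical counts: 1,500 reads aligned to a virus in a 50-million-read sample.
ppm = viral_ppm(1_500, 50_000_000)
positive = is_infected(1_500, 50_000_000)
borderline = is_infected(500, 50_000_000)  # exactly 10 ppm, so not above the cutoff
```

Normalising by library size keeps the cutoff comparable across samples sequenced to different depths, which matters when hundreds of samples per tumour type are screened with a single threshold.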
YiLi-Chang and 張怡莉. "Analyses of Replication Signals of Satellite RNAs Associated with Cucumber Mosaic Virus." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/74960729152939618250.
國立中興大學 (National Chung Hsing University), 生物科技學研究所 (Graduate Institute of Biotechnology), academic year 92
Cucumber mosaic virus (CMV) is the type species of the genus Cucumovirus. Some strains of CMV harbor satellite RNAs (satRNAs) that depend on CMV for their replication, encapsidation, and transmission. The specific aim of this study is to investigate the 5’- and 3’-terminal signals on satRNAs required for high-efficiency replication and to distinguish the effects of nucleotide sequence from those of structure. The 5’- and/or 3’-terminal 30 nucleotides were replaced with random sequences using synthetic primers in polymerase chain reactions (PCR). Transcripts of the mutants were used to inoculate plants together with the helper virus, CMV-NT9. The viability of the mutants was assessed by double-stranded RNA (dsRNA) analysis. The nucleotide sequences of the mutant satRNA progeny were determined by RT-PCR followed by DNA sequencing. To differentiate inoculated transcripts from wild-type satRNA contamination, a selection marker was introduced into the satRNAs by mutating thymine 164 to adenine, creating an NdeI restriction site. Because the 5’- and/or 3’-terminal sequences were randomized, poly(A) polymerase was used to add a poly(A) tail to the 3’ ends of the dsRNAs, and oligo(dT) primers were used to amplify the progeny. In addition, the multimeric replication forms of satRNAs were used as templates for RT-PCR with an inverse internal primer pair to amplify the terminal sequences. One viable mutant was recovered from the progeny of 324 inoculation tests. Nucleotide sequencing revealed that the mutant contains an additional T in the 5’-terminal T-stretch region and a deletion of a C in the 3’-terminal triple region. Sequence analysis of the multimeric forms revealed that some nucleotides at the junctions between monomers were deleted and that the deletion sites contain similar sequences. The multimers may form through homologous template switching.
Sequence analyses and structural predictions revealed evident differences between viable and non-viable satRNAs. Together, the results indicated that the current wild-type terminal sequences of satRNAs are the most competitive patterns under our experimental conditions. Both structure and sequence may play important roles in the replication of satRNAs.
horng-woei, Yang, and 楊鴻偉. "Analyse of Phloem Sap Transcription Profile Revealed Two Floral Activators, FVE and AGL24 are Phloem-mobile RNAs." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/17931981243904097436.
國防醫學院 (National Defense Medical Center), 生命科學研究所 (Graduate Institute of Life Sciences), academic year 97
Plants use the vascular system to relay environmental stimuli for fine-tuning their developmental programs. Recent evidence shows that the FLOWERING LOCUS T (FT) protein is the long-sought florigen that integrates the photoperiod variation perceived in the leaves. However, evidence also suggests that other, yet-to-be-identified systemic regulators participate in floral induction. To this end, we investigated phloem exudates from excised broccoli (Brassica oleracea) inflorescences. Microarray and RT-PCR analyses revealed that RNAs of at least two floral regulators, FVE and AGAMOUS-LIKE 24 (AGL24), are present in the phloem sap. Enzymatic analysis demonstrated that the phloem-sap RNAs contain a 5' cap and a polyadenylation tail, which suggests that phloem sap contains typical mRNAs. Arabidopsis grafting experiments were used to test whether these RNAs move long distances along the phloem translocation stream. Consistent with previous reports, Arabidopsis transformants expressing FVE and AGL24 displayed an early-flowering phenotype. When wild-type scions were grafted onto P35S-FVE or P35S-AGL24 transformant stocks, the transgenic FVE and AGL24 RNAs were detected in the wild-type scions. Thus, both FVE and AGL24 RNAs can move long distances across the graft union. Our data support the notion that multiple systemic floral regulators may participate in floral regulation.
Chen, Ho-Ming, and 陳荷明. "Exploring the Arabidopsis Non-coding RNAs: Inference from the Bioinformatic Analyses of Small RNA Sequencing Data." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/64874065425101072828.
國立中興大學 (National Chung Hsing University), 生物科技學研究所 (Graduate Institute of Biotechnology), academic year 97
Non-coding RNAs (ncRNAs) play vital roles in translation, splicing, RNA processing, RNA modification and regulation of gene expression. Progress in ncRNA discovery has evolved along with the finding of new classes of ncRNAs and the invention of revolutionary sequencing platforms. High-throughput sequencing technologies greatly facilitate the study of small regulatory RNAs, which are 20 to 30 nt in length. High-throughput sequencing data of 18-26 nt small RNA fragments are a mixture of small regulatory RNAs and degradation products of coding RNAs or ncRNAs. The proper choice of computational approaches for analyzing small RNA sequencing data is crucial for separating small RNAs derived from distinct origins, for discovering new ncRNAs and for revealing the knowledge embedded in these ncRNAs. To date, the development of computational approaches has mostly focused on the discovery of microRNAs (miRNAs). Computational approaches that use small RNA sequencing data to study other ncRNAs are much needed. This dissertation presents the development of novel bioinformatics approaches for analyzing small RNA sequencing data and shows that these analyses have improved the understanding of Arabidopsis ncRNAs. In the first part, using abundant small RNA sequencing data from the public domain, a new bioinformatics approach was developed for finding trans-acting small interfering RNAs (ta-siRNAs), a new class of small regulatory RNAs. Unlike that of other siRNAs, the biogenesis of ta-siRNAs depends on cleavage directed by miRNAs. Moreover, most ta-siRNAs are clustered in 21-nt increments relative to the cleavage site. Based on this characteristic, this study developed the first computational algorithm that successfully recovered both known and novel Arabidopsis loci producing ta-siRNAs from complex small RNA sequencing data.
A group of newly identified ta-siRNAs was produced by cleavage directed by a ta-siRNA rather than by miRNAs, the trigger reported previously. The results indicate the existence of a small RNA regulatory cascade initiated by miRNA-directed cleavage and followed by the consecutive production of ta-siRNAs. The second part focuses on the use of small RNA sequencing data in the annotation of small nucleolar RNAs (snoRNAs). Small RNAs derived from snoRNAs are often considered degradation products and were filtered out without further analysis in previous studies. However, the analysis of Arabidopsis small RNA sequencing data revealed an enrichment of small RNAs at the termini of snoRNAs. Using this feature, this study developed a new method that was able to re-annotate known snoRNAs lacking well-defined termini and to discover novel snoRNA species. The discovery of new snoRNAs also supported the existence of additional RNA modification sites on Arabidopsis ribosomal RNAs and spliceosomal small nuclear RNAs. This research demonstrates that, by combining pre-existing biological knowledge with appropriate mining approaches, small RNA sequencing data represent a rich resource for the study of small regulatory RNAs as well as other ncRNAs.
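The 21-nt phasing property that the abstract uses to find ta-siRNA loci can be illustrated with a deliberately simplified Python sketch. It is not the algorithm developed in the dissertation, whose scoring is not described in the abstract. It assumes that reads are represented by their 5' start coordinates on the same strand as the cleavage site, and the function name `phased_fraction` and the coordinates are hypothetical.

```python
def phased_fraction(start_positions, cleavage_site, phase_length=21):
    """Fraction of small RNA start positions lying in register with the cleavage site.

    A read is "in phase" when its distance from the cleavage site is an exact
    multiple of `phase_length` (21 nt in the abstract).
    """
    if not start_positions:
        return 0.0
    in_phase = sum(
        1 for pos in start_positions
        if (pos - cleavage_site) % phase_length == 0
    )
    return in_phase / len(start_positions)


# Hypothetical 5' start coordinates; position 150 is the only out-of-phase read.
starts = [100, 121, 142, 163, 150, 184]
fraction = phased_fraction(starts, cleavage_site=100)
```

A locus whose reads fall mostly in register with the cleavage site is a candidate for phased production. A real analysis would also need to weigh read abundance and the chance of phasing arising at random, which this sketch leaves out.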
Zhan, Shuhua. "Genome-wide expression analysis and regulation of microRNAs and cis natural antisense transcripts in Arabidopsis thaliana." 2012. http://hdl.handle.net/10214/3274.
PhD thesis
NSERC