Dissertations / Theses: 'Sequence motif'

1

Leung, Chi-ming. "Motif discovery for DNA sequences." Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B3859755X.

2

Leung, Chi-ming, and 梁志銘. "Motif discovery for DNA sequences." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B3859755X.

3

Liu, Agatha H. "Motif-based mining of protein sequences /." Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/6894.

4

Dinh, Hieu Trung. "Algorithms for DNA Sequence Assembly and Motif Search." University of Connecticut, 2013.

5

Siu, Man-hung. "Finding motif pairs from protein interaction networks." Click to view the E-thesis via HKUTO, 2008. http://sunzi.lib.hku.hk/hkuto/record/B40987760.

6

Siu, Man-hung, and 蕭文鴻. "Finding motif pairs from protein interaction networks." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B40987760.

7

Al-Ouran, Rami. "Motif Selection: Identification of Gene Regulatory Elements using Sequence Coverage-Based Models and Evolutionary Algorithms." Ohio University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1449003717.

8

Lin, Jasper Chua. "Application of the Trp-cage motif to polypeptide folding questions /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/8684.

9

Chen, Bernard. "Discovery and Extraction of Protein Sequence Motif Information that Transcends Protein Family Boundaries." Digital Archive @ GSU, 2009. http://digitalarchive.gsu.edu/cs_diss/42.

Abstract:

Protein sequence motifs are gathering more and more attention in the field of sequence analysis. These recurring patterns have the potential to determine the conformation, function and activities of proteins. In our work, we obtained protein sequence motifs that are universally conserved across protein family boundaries. Therefore, unlike most popular motif discovery algorithms, our input dataset is extremely large. As a result, an efficient technique is essential. We use two granular computing models, Fuzzy Improved K-means (FIK) and Fuzzy Greedy K-means (FGK), to efficiently generate protein motif information. After that, we develop an efficient Super Granular SVM Feature Elimination model to further extract the motif information. During the motif search process, fixing the window size in advance may reduce the computational complexity and increase efficiency. However, due to the fixed size, our model may deliver a number of similar motifs that are simply shifted by some residues or include mismatches. We develop a new strategy named Positional Association Super-Rule to confront the problem of motifs generated from a fixed window size. It combines super-rule analysis with a novel Positional Association Rule algorithm. We use the super-rule concept to construct a Super-Rule-Tree (SRT) by a modified HHK clustering, which requires no parameter setup to identify the similarities and dissimilarities between the motifs. The positional association rule is created and applied to search for similar motifs that are shifted by some residues. By analyzing the motifs generated by our approaches, we find that these motifs are significant not only in terms of sequence, but also in secondary structure similarity and biochemical properties.
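
The fixed-window issue mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's Positional Association Super-Rule method; the function names and the identity score are assumptions made for illustration. It cuts a sequence into fixed-length windows and then checks whether two equal-length motifs are the same pattern shifted by a few residues.

```python
def sliding_windows(sequence, window_size):
    """Yield (start, segment) for every fixed-length window in the sequence."""
    for start in range(len(sequence) - window_size + 1):
        yield start, sequence[start:start + window_size]


def best_shift_overlap(motif_a, motif_b, max_shift):
    """Return (shift, identity) for the offset at which two motifs agree best.

    Position i of motif_a is compared with position i + shift of motif_b;
    identity is the fraction of matching residues among overlapping positions.
    This scoring is an illustrative assumption, not the thesis's rule.
    """
    best = (0, 0.0)
    for shift in range(-max_shift, max_shift + 1):
        pairs = [
            (motif_a[i], motif_b[i + shift])
            for i in range(len(motif_a))
            if 0 <= i + shift < len(motif_b)
        ]
        if not pairs:
            continue
        identity = sum(a == b for a, b in pairs) / len(pairs)
        if identity > best[1]:
            best = (shift, identity)
    return best
```

With a window size of 6, the windows "KVLAAG" and "VLAAGI" from neighbouring positions are reported as one pattern offset by a single residue, which is the kind of redundancy a fixed window tends to produce.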

10

Pei, Shermin. "Identification of functional RNA structures in sequence data." Thesis, Boston College, 2016. http://hdl.handle.net/2345/bc-ir:107275.

Abstract:

Thesis advisor: Michelle M. Meyer
Thesis advisor: Peter Clote
Structured RNAs have many biological functions ranging from catalysis of chemical reactions to gene regulation. Many of these homologous structured RNAs display most of their conservation at the secondary or tertiary structure level. As a result, strategies for natural structured RNA discovery rely heavily on identification of sequences sharing a common stable secondary structure. However, correctly identifying the functional elements of the structure continues to be challenging. In addition to studying natural RNAs, we improve our ability to distinguish functional elements by studying sequences derived from in vitro selection experiments to select structured RNAs that bind specific proteins. In this thesis, we seek to improve methods for distinguishing functional RNA structures from arbitrarily predicted structures in sequencing data. To do so, we developed novel algorithms that prioritize the structural properties of the RNA that are under selection. In order to identify natural structured ncRNAs, we bring concepts from evolutionary biology to bear on the de novo RNA discovery process. Since there is selective pressure to maintain the structure, we apply molecular evolution concepts such as neutrality to identify functional RNA structures. We hypothesize that alignments corresponding to structured RNAs should consist of neutral sequences. During the course of this work, we developed a novel measure of neutrality, the structure ensemble neutrality (SEN), which calculates neutrality by averaging the magnitude of structure retained over all single point mutations to a given sequence. In order to analyze in vitro selection data for RNA-protein binding motifs, we developed a novel framework that identifies enriched substructures in the sequence pool. Our method accounts for both sequence and structure components by abstracting the overall secondary structure into smaller substructures composed of a single base-pair stack. 
Unlike many current tools, our algorithm is designed to deal with the large data sets coming from high-throughput sequencing. In conclusion, our algorithms have similar performance to existing programs. However, unlike previous methods, our algorithms are designed to leverage the evolutionary selective pressures in order to emphasize functional structure conservation.
Thesis (PhD) — Boston College, 2016
Submitted to: Boston College. Graduate School of Arts and Sciences
Discipline: Biology
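
The idea of averaging retained structure over all single point mutations, as described in this abstract, can be sketched in Python as follows. This is a simplified single-structure variant and not the thesis's SEN measure, which works on the structure ensemble. The `fold` argument is any structure predictor that returns a set of base pairs; `toy_fold` is a hypothetical stand-in that only pairs mirrored positions, included so the example runs without external software.

```python
# Watson-Crick and GU wobble pairs accepted by the toy predictor.
COMPLEMENT = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def single_point_mutants(sequence, alphabet="ACGU"):
    """Yield every sequence that differs from the input at exactly one position."""
    for pos, original in enumerate(sequence):
        for base in alphabet:
            if base != original:
                yield sequence[:pos] + base + sequence[pos + 1:]


def mutational_neutrality(sequence, fold):
    """Average fraction of wild-type base pairs retained across all mutants."""
    reference = fold(sequence)
    if not reference:
        return 0.0
    retained = [
        len(reference & fold(mutant)) / len(reference)
        for mutant in single_point_mutants(sequence)
    ]
    return sum(retained) / len(retained)


def toy_fold(sequence):
    """Hypothetical predictor: stack pairs inward from both ends of the sequence,
    stopping at the first non-complementary pair or when fewer than three
    unpaired bases would remain in the loop."""
    pairs = set()
    i, j = 0, len(sequence) - 1
    while j - i > 3 and (sequence[i], sequence[j]) in COMPLEMENT:
        pairs.add((i, j))
        i += 1
        j -= 1
    return pairs
```

In practice `toy_fold` would be replaced by a call to a real secondary-structure predictor; the neutrality function itself does not depend on how the base pairs are obtained.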

11

Mak, Chi-ho. "Characterization of a recombination signal sequence and kappa-B motif DNA binding protein, KRc /." The Ohio State University, 1997. http://rave.ohiolink.edu/etdc/view?acc_num=osu148794334152931.

12

Naik, Ashwini. "Mining Gene Regulatory Motifs Using the Concept of Sequence Coverage." Ohio University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1408699463.

13

Langer, Björn. "Phenotype-related regulatory element and transcription factor identification via phylogeny-aware discriminative sequence motif scoring." Doctoral thesis, Center for Systems Biology Dresden, 2017. https://tud.qucosa.de/id/qucosa%3A31172.

Abstract:

Understanding the connection between an organism’s genotype and its phenotype is a key question in evolutionary biology and genetics. It has been shown that many changes of morphological or other complex phenotypic traits result from changes in the expression pattern of key developmental genes rather than from changes in the genes themselves. Such altered gene expression often arises from changes in the gene regulatory regions. That usually means the loss of important transcription factor (TF) binding sites within these regulatory regions, because the interaction between TFs and specific sites on the DNA is a key element of gene regulation. An established approach for the genome-wide mapping of genomic regions to phenotypes is the Forward Genomics framework. This approach compares the genomic sequences of species with and without the phenotype of interest based upon two ideas. First, the initial loss of a phenotype relaxes selection on all phenotypically related genomic regions and, second, this can happen independently in multiple species. Of interest are regions that diverged specifically in phenotype-loss species. Although this principle is general, the current implementation is only well-suited for the identification of phenotype-related gene-coding regions and has limited applicability to regulatory regions. The reason is its reliance on sequence conservation as a divergence measure, which does not accurately measure functional divergence of regulatory elements. In this thesis, I developed REforge, a novel implementation of the Forward Genomics principle that takes functional information about regulatory elements into account in the form of known phenotype-related TFs. The consideration of the flexible organization of TF binding sites within a regulatory region, both in terms of strength and order, allows the abstraction from the region’s sequence level to its functional level.
Thus, functional divergence of regulatory regions is directly compared to phenotypical divergence, which tremendously improves performance compared to Forward Genomics, as I demonstrated on synthetic and real data. Additionally, I developed TFforge, which follows the same approach but aims at identifying the TFs relevant for the given phenotype. Given a multi-species alignment with a phenotype annotation and a set of regulatory regions, TFforge systematically searches for TFs whose changes in binding affinity between species fit the phenotype signature. The reported output is a ranking of the TFs according to their level of correspondence. I prove the concept of this approach on both biological data and artificially generated regions. TFforge can be used as a standalone analysis tool and also to generate the input set of TFs for a subsequent REforge analysis. I demonstrate that REforge in combination with TFforge is able to substantially outperform standard Forward Genomics, i.e. even without foreknowledge of relevant TFs. Overall, the methods introduced in this thesis are examples of the power of computational tools in comparative genomics to catalyze biological insights. I not only describe the methods in detail but also conducted a real data analysis as validation. REforge and TFforge are applicable to a wide range of phenotypes, each on its own for associating TFs and regulatory regions with a phenotype. Moreover, their combination in particular constitutes a valuable tool set for evo-devo studies with respect to gene regulatory network analyses.

14

Choi, Hyunjin. "An Interdisciplinary Approach: Computational Sequence Motif Search and Prediction of Protein Function with Experimental Validation." Diss., Virginia Tech, 2013. http://hdl.handle.net/10919/51762.

Abstract:

Pathogens colonize their hosts by releasing molecules that can enter host cells. A biotrophic oomycete plant pathogen, Phytophthora sojae harbors a superfamily of effector genes whose protein products enter the cells of the host, soybean. Many of the effectors contain an RXLR-dEER motif in their N-terminus. More than 400 members belonging to this family have been previously identified using a Hidden Markov Model. Amino acids flanking the RXLR motif have been utilized to identify effector proteins from the P. sojae secretome, despite the high level of sequence divergence among the members of this protein family. I present here machine learning methods to identify protein candidates that belong to a particular class, such as the effector superfamily. Converting the flanking amino acid sequences of RXLR motifs (or other candidate motifs) into numeric values that reflect their physical properties enabled the protein sequences to be analyzed through these methods. The methods evaluated include Support Vector Machines and a related spherical classification method that I have developed. I also approached the effector prediction problem by building functional linkage networks and have produced lists of predicted P. sojae effector proteins. I tested the best candidate through gene gun bombardment assays using the beta-glucuronidase reporter system, which revealed that there is a high likelihood that the candidate can enter the soybean cells.
Ph. D.
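
The step of converting the residues flanking an RXLR motif into numeric values can be sketched in Python as follows. The abstract does not say which physical properties were used; the Kyte-Doolittle hydropathy scale, the flank length, and the function name are assumptions chosen for illustration.

```python
import re

# Kyte-Doolittle hydropathy index per amino acid (an assumed choice of property).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_rxlr_flanks(protein, flank=5):
    """Return (motif_start, feature_vector) for each RXLR match in a protein.

    The vector holds the hydropathy of `flank` residues on each side of the
    motif; positions beyond the sequence ends are padded with 0.0.
    """
    features = []
    # The lookahead lets overlapping matches be found as well.
    for match in re.finditer(r"(?=R.LR)", protein):
        start = match.start()
        end = start + 4
        left = protein[max(0, start - flank):start].rjust(flank, "-")
        right = protein[end:end + flank].ljust(flank, "-")
        vector = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in left + right]
        features.append((start, vector))
    return features
```

Vectors of this form have a fixed length of 2 * flank, so they can be passed directly to a classifier such as a Support Vector Machine.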

15

McMillen, Lyle. "Isolation and Characterisation of the 5'-Nucleotidase from Escherichia coli." Griffith University. School of Biomolecular and Biomedical Science, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20030226.153545.

Abstract:

Escherichia coli 5'-nucleotidase is a periplasmically localised enzyme capable of hydrolysing a broad range of substrates, including all 5'-ribo- and 5'-deoxyribonucleotides, uridine diphosphate sugars, and a number of synthetic substrates such as bis(p-nitrophenyl) phosphate. The enzyme has been shown to contain at least one zinc ion following purification, and to have two metal binding sites in the catalytic cleft. 5'-Nucleotidase activity is significantly stimulated by the addition of particular divalent metal ions, most notably cobalt, which results in a 30- to 50-fold increase in activity. Significant sequence homology between the E. coli 5'-nucleotidase and members of the Ser/Thr protein phosphatase family in the catalytic site has led to 5'-nucleotidase being included in this protein family. This thesis describes the development of a rapid purification methodology for milligram quantities of 5'-nucleotidase, and the investigation of a number of physical and biochemical properties of the enzyme with the aim of comparing these properties to those of certain catalytic site mutants. The molecular weight of the mature protein was estimated as 58219 daltons, with a specific activity for 5'-AMP, in the presence of 4 mM Co2+ and 13 mM Ca2+ at pH 6.0, of 730 mmol/min/mg. The presence of up to two zinc ions associated with the purified enzyme was observed using ICP-ES analysis, suggesting both metal ion binding sites are occupied by zinc in vivo, and some degree of displacement of zinc by cobalt could be observed. Mass spectrometry data, gathered at 60 and 70 mS orifice potential, suggested the presence of a small proportion of material with a mass 118 to 130 daltons greater than the main 5'-nucleotidase mass estimation. This study suggests that this mass difference, only evident at the lower orifice potential, is due to the presence of two zinc ions closely associated with 5'-nucleotidase.
To account for the observed high level of activation of 5'-nucleotidase activity by particular divalent metal ions, this thesis describes a proposed model in which these divalent ions may displace the zinc ion at one of the metal ion binding sites. This displacement only occurs at one of the two metal ion binding sites, with the other metal binding site retaining the zinc ion already present. Studies with purified mutant enzymes, each with a single amino acid substitution, lend support to this hypothesis and suggest the identity of the metal ion binding site at which displacement occurs. Seven key catalytic site residues (Asp-41, His-43, Asp-84, His-117, Glu-118, His-217 and His-252) were selected on the basis of sequence conservation within the Ser/Thr protein phosphatases and 5'-nucleotidases. X-ray crystallographic data published by others during this study implicated five of the selected residues (Asp-41, His-43, Asp-84, His-217 and His-252) directly in metal ion binding, including two residues from each metal ion binding site and one directly involved in both sites (Asp-84). The remaining two residues (His-117 and Glu-118) are highly conserved but were not thought to play direct roles in metal ion binding. The seven selected residues were modified by site-directed mutagenesis, and the effect of the amino acid substitutions upon the kinetic properties of 5'-nucleotidase activity was determined. Residues hypothesised to be involved in metal ion displacement, and subsequent activation of 5'-nucleotidase activity, were identified by reductions in metal ion affinity and increased levels of activation by cobalt compared to the wild type 5'-nucleotidase. This study suggests that the metal binding site, M2, that includes residues Asp-84, His-217 and His-252, is involved in metal ion displacement, while the other metal binding site, M1, is not. This, in turn, suggests the metal binding sites are functionally non-equivalent and kinetically distinct.
No residues were identified in this study as playing significant roles in substrate binding, as no significant reduction in affinity for 5'-AMP was observed in any of the catalytic site mutants.

16

McMillen, Lyle. "Isolation and Characterisation of the 5'-Nucleotidase from Escherichia coli." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366487.

Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Biomolecular and Biomedical Sciences
Science, Environment, Engineering and Technology

17

Tang, Thomas Cheuk Kai. "Discovering Protein Sequence-Structure Motifs and Two Applications to Structural Prediction." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1188.

Abstract:

This thesis investigates the correlations between short protein peptide sequences and local tertiary structures. In particular, it introduces a novel algorithm for partitioning short protein segments into clusters of local sequence-structure motifs, and demonstrates that these motif clusters contain useful structural information via two applications to structural prediction. The first application utilizes motif clusters to predict local protein tertiary structures. A novel dynamic programming algorithm that performs comparably with some of the best existing algorithms is described. The second application exploits the capability of motif clusters in recognizing regular secondary structures to improve the performance of secondary structure prediction based on Support Vector Machines. Empirical results show significant improvement in overall prediction accuracy with no performance degradation in any specific aspect being measured. The encouraging results obtained illustrate the great potential of using local sequence-structure motifs to tackle protein structure predictions and possibly other important problems in computational biology.

18

Lombe, Chipampe Patricia. "Analysis, expression profiling and characterization of hsa-miR-5698 target genes as putative dynamic network biomarkers for prostate cancer: a combined in silico and molecular approach." University of the Western Cape, 2019. http://hdl.handle.net/11394/7026.

Abstract:

Philosophiae Doctor - PhD
In 2018, the International Agency for Research on Cancer (IARC) estimated that prostate cancer (PCa) was the second leading cause of death in males worldwide. The number of deaths is expected to rise by 50% in the next decade. This rise is attributed to the shortcomings of the current diagnostic, prognostic, and therapeutic biomarkers used in the management of the disease. Therefore, research into more sensitive, specific and effective biomarkers is required. The use of biomarkers in PCa diagnosis and management takes advantage of the genetic alterations and abnormalities that characterise the disease. In this regard, a microRNA, hsa-miR-5698, was identified in a previous study as a differentiating biomarker between prostate adenocarcinoma and bone metastasis. Six putative translational targets (CDKN1A, CTNND1, FOXC1, LRP8, ELK1 and BIRC2) of this microRNA were discovered using in silico approaches. The aim of this study was to analyse, via expression profiling and characterization, the target genes of hsa-miR-5698 in order to determine their ability to act as putative dynamic network biomarkers for PCa. The study was conducted using a combined in silico and molecular approach. The in silico part of the study investigated the putative transcriptional effects of hsa-miR-5698 on the promoters of its translational targets, the correlation between hsa-miR-5698 and mRNA expression profiles, as well as the co-expression analysis, pathway analysis and prognostic ability of the target genes. Several computational tools were employed for these purposes, including R Studio, the Trident algorithm, STRING, KEGG, MEME Suite, SurvExpress and ProGgene. The molecular part of the study involved expression profiling of the genes in two PCa cell lines, LNCaP and PC3, via qPCR.

19

Pfeiffer, Philip Edward. "A System for Determining the Statistical Significance of the Frequency of Short DNA Motif Matches in a Genome - An Analytical Approach." University of Dayton / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1304599225.

20

Langer, Björn [Verfasser], Michael [Akademischer Betreuer] Hiller, Ivo [Gutachter] Sbalzarini, and Peter [Gutachter] Stadler. "Phenotype-related regulatory element and transcription factor identification via phylogeny-aware discriminative sequence motif scoring / Björn Langer ; Gutachter: Ivo Sbalzarini, Peter Stadler ; Betreuer: Michael Hiller." Dresden : Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://d-nb.info/1226813224/34.

21

Langer, Björn [Verfasser], Michael [Akademischer Betreuer] Hiller, Ivo Fabian [Gutachter] Sbalzarini, and Peter [Gutachter] Stadler. "Phenotype-related regulatory element and transcription factor identification via phylogeny-aware discriminative sequence motif scoring / Björn Langer ; Gutachter: Ivo Sbalzarini, Peter Stadler ; Betreuer: Michael Hiller." Dresden : Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2018. http://d-nb.info/1226813224/34.

22

Bellora, Pereyra Nicolás. "In silico analysis of regulatory motifs in gene promoters." Doctoral thesis, Universitat Pompeu Fabra, 2010. http://hdl.handle.net/10803/7202.

Abstract:

Regulation of gene transcription is a complex process involving many different proteins, some of which bind in a sequence-specific manner to DNA motifs in the gene promoter. The need to maintain specific interactions between transcription factors and proteins involved in the RNA polymerase II complex is expected to impose constraints on the relative position and spacing of the interacting DNA motifs. The present work includes the development of a novel approach to identify motifs that show a preferential location in DNA sequences and the implementation of a public web application called PEAKS. We investigated whether the arrangement and nature of the most common motifs depended on the breadth of expression of the gene. We found differences that serve to illustrate that many key specific regulatory signals may be present in the proximal promoter region in mammalian genes. We also apply other methods for the identification of specific transcription factors (TFs) involved in the co-regulation of a set of genes. Data from experimentally verified transcription factor binding sites (TFBSs) support the biological relevance of our findings.
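
The notion of a motif with a preferential location can be illustrated with a short Python sketch. It is not the PEAKS method; it only counts where an exact motif string starts in a set of promoter sequences that are assumed to be aligned on a common reference point, then bins the positions. Function names and the binning scheme are assumptions.

```python
from collections import Counter


def motif_positions(sequences, motif):
    """Collect the start index of every (possibly overlapping) exact match."""
    positions = []
    for seq in sequences:
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
    return positions


def position_histogram(positions, bin_size):
    """Count positions per bin; each key is the first index covered by its bin."""
    return Counter(pos // bin_size * bin_size for pos in positions)
```

A motif whose occurrences pile up in one bin across many promoters is a candidate for positional preference; a real analysis would also need a background model to judge whether the peak is statistically significant.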

23

Zhong, Wei. "Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction." Digital Archive @ GSU, 2006. http://digitalarchive.gsu.edu/cs_diss/7.

Abstract:

Protein tertiary structure plays a very important role in determining a protein's possible functional sites and chemical interactions with other related proteins. Experimental methods to determine protein structure are time consuming and expensive. As a result, the gap between the number of known protein sequences and known structures has widened substantially due to high-throughput sequencing techniques. The limitations of experimental methods motivate us to develop computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are explored with an improved K-means clustering algorithm. Carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within sequence clusters may influence their structural similarity. Analysis of the relationship between sequence variation and structural similarity shows that sequence clusters with tight sequence variation have high structural similarity and sequence clusters with wide sequence variation have poor structural similarity. Based on this knowledge, the established clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable prediction results and high-quality clusters give reliable prediction results. In order to improve the performance of the clustering system for local protein structure prediction, a novel computational model called Clustering Support Vector Machines (CSVMs) is proposed. In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm. The K-means clustering algorithm may not capture the nonlinear sequence-to-structure relationship effectively.
As a result, we consider using Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVM is not favorable for huge datasets including millions of samples. Therefore, we propose a novel computational model called CSVMs. Taking advantage of both the theory of granular computing and advanced statistical learning methodology, CSVMs are built specifically for each information granule partitioned intelligently by the clustering algorithm. Compared with the clustering system introduced previously, our experimental results show that accuracy for local structure prediction has been improved noticeably when CSVMs are applied.

24

Sandve, Geir Kjetil. "Motif discovery in biological sequences." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9270.

Full text

Abstract:

This master thesis is a Ph.D. research plan for motif discovery in biological sequences, and consists of three main parts. Chapter 2 is a survey of methods for motif discovery in DNA regulatory regions, with a special emphasis on computational models. The survey presents an integrated model of the problem that allows systematic and coherent treatment of the surveyed methods. Chapter 3 presents a new algorithm for composite motif discovery in biological sequences. This algorithm has been used successfully for motif discovery in protein sequences, and will be extended in future work to explore properties of the DNA regulatory mechanism. Finally, chapter 4 describes several current research projects, as well as some more general future directions of research. The research focuses on the development of new algorithms for the discovery of composite motifs in DNA. These algorithms will partly be used for systematic exploration of the DNA regulatory mechanism. An increased understanding of this mechanism may lead to more accurate computational models, and hence more sensitive motif discovery methods.

25

Těthal, Jiří. "Fuzzy klasifikace DNA sekvencí." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220008.

Full text

Abstract:

This work deals with the fuzzy classification of DNA sequences. The first, theoretical part summarizes fuzzy logic and the ways it is used in classifying biological sequence data. The second, practical part deals with a classification algorithm for assessing sequence similarity, specifically the separation of coding and non-coding parts of a sequence and the use of fuzzy classification in DNA barcoding.

26

Morozov, Vyacheslav. "Computational Methods for Inferring Transcription Factor Binding Sites." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23382.

Full text

Abstract:

Position weight matrices (PWMs) have become a tool of choice for the identification of transcription factor binding sites in DNA sequences. PWMs are compiled from experimentally verified and aligned binding sequences, and are then used to computationally discover novel putative binding sites for a given protein. DNA-binding proteins often show degeneracy in their binding requirements, and the overall binding specificity of many proteins is unknown and remains an active area of research. Although PWMs are more reliable predictors than consensus string matching, they generally result in a high number of false positive hits. A previous study introduced a novel method of PWM training that uses the known motifs to sample additional putative binding sites from a proximal promoter area. The core idea was further developed, implemented and tested in this thesis in a large-scale application. Improved mono- and dinucleotide PWMs were computed for Drosophila melanogaster. The Matthews correlation coefficient was used as the optimization criterion in the PWM refinement algorithm. The new PWMs take account of non-uniform background nucleotide distributions on the promoters and consider a larger number of new binding sites during the refinement steps. The optimization included the PWM motif length, the position on the promoter, the threshold value and the binding site location. The predictions obtained were compared for mono- and dinucleotide PWM versions with the initial matrices and with conventional tools. The optimized PWMs predicted new binding sites with better accuracy than conventional PWMs.
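The abstract above relies on three ingredients: a PWM built from aligned sites, a background nucleotide distribution, and the Matthews correlation coefficient as a quality measure. The following is a minimal mononucleotide sketch of those ingredients only, not the thesis's refinement algorithm. The function names, the pseudocount, and the example sites are assumptions made for illustration.

```python
import math

def build_pwm(sites, background, pseudocount=1.0):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    `background` maps each nucleotide to its background probability.
    The pseudocount is an illustrative choice, not taken from the thesis.
    """
    alphabet = sorted(background)
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # log2 of (observed frequency / background frequency) per base
        pwm.append({base: math.log2((counts[base] / total) / background[base])
                    for base in alphabet})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds of one window of the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return start positions whose window scores at or above the threshold."""
    width = len(pwm)
    return [i for i in range(len(sequence) - width + 1)
            if score(pwm, sequence[i:i + width]) >= threshold]

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical aligned sites and a uniform background, for illustration only.
example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
example_pwm = build_pwm(example_sites, uniform)
hits = scan(example_pwm, "GGGTATAATGGG", threshold=5.0)
```

A non-uniform background only changes the `background` dictionary; the threshold passed to `scan` is the kind of parameter that an MCC-driven refinement would tune against known positive and negative sites.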

27

Roll, James Elwood. "Inferring RNA 3D Motifs from Sequence." Bowling Green State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1557482505513958.

Full text

28

Soni, Neha. "Sequence motifs predictive of tissue-specific skipping." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/35608.

Full text

Abstract:

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
Includes bibliographical references (p. 53-55).
Alternative splicing plays a major role in protein diversity and regulating gene expression. Motifs that regulate tissue-specific alternative splicing have been identified by groups studying small sets of genes. We introduce a tissue-specific skipping score for skipped exons using exon-exon junction microarray data. We compare these exons with known literature-verified EST skipped exons and exons predicted to be skipped in both human and mouse. After deriving tissue-specific skipped exon sets for brain, heart, muscle and testis, we find sequence features in the exon and flanking introns that distinguish these tissue-specific skipped exons from constitutive exons. Lastly, we use sequence-based scoring based on these features to predict tissue-specific skipped exons and compare these with EST data to demonstrate the tissue-specificity of the motifs.
by Neha Soni.
S.M.

29

Lancaster, Owen. "Sequence and structural templates for protein motifs." Thesis, University of Manchester, 2006. http://www.manchester.ac.uk/escholar/uk-ac-man-scw:157940.

Full text

Abstract:

Current methodologies for recognizing similar protein motifs are predominantly based upon establishing a homology relationship between the sequences. These methods are widely exploited to annotate new genomes and assign putative functions to new genes. However, they are usually based on sequence data alone. More recent approaches have incorporated structural data to improve predictions over purely sequence-based methods. So far these approaches have not been widely exploited in bioinformatics for identifying common, small motifs. A test system was examined containing such a degenerate but short, repeating motif, the tetratricopeptide repeat (TPR). Sequence analysis was done to assess the effectiveness of common search tools for finding TPR motifs. These methods included BLAST, PSI-BLAST and Hidden Markov Models, and the latter were found to be easily the most effective search strategy. Further sequence analysis of the TPR motif was carried out to demonstrate the extent to which TPRs with similar sequences are related in functional terms. In addition, a full structural analysis was performed. The results of the sequence and structural analysis of the TPR allowed structural information to be obtained and revealed structurally conserved features in TPRs, comprising conserved positions of interacting residue pairs. Comparative models were built and evaluated for all annotated TPR sequences with unknown structures to assess their compatibility with the TPR motif structure. From these and other models the interaction energy of structurally adjacent residue pairs has been calculated. These models were generated by mutating residues in key conserved positions to all possible amino acid combinations. The energy is then evaluated for each of these 20x20 pair combinations. This energy is then integrated into sequence-based methods such as Hidden Markov Models with the aim of improving TPR prediction. An improvement in search sensitivity and specificity is demonstrated, which should allow improved identification and annotation of this motif in sequence databases.

30

Buchholz, Frank, Anja Nitzsche, Maciej Paszkowski-Rogacz, Filomena Matarese, Eva M. Janssen-Megens, Nina C. Hubner, Herbert Schulz, et al. "RAD21 Cooperates with Pluripotency Transcription Factors in the Maintenance of Embryonic Stem Cell Identity." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-191596.

Full text

Abstract:

For self-renewal, embryonic stem cells (ESCs) require the expression of specific transcription factors accompanied by a particular chromosome organization to maintain a balance between pluripotency and the capacity for rapid differentiation. However, how transcriptional regulation is linked to chromosome organization in ESCs is not well understood. Here we show that the cohesin component RAD21 exhibits a functional role in maintaining ESC identity through association with the pluripotency transcriptional network. ChIP-seq analyses of RAD21 reveal an ESC specific cohesin binding pattern that is characterized by CTCF independent co-localization of cohesin with pluripotency related transcription factors Oct4, Nanog, Sox2, Esrrb and Klf4. Upon ESC differentiation, most of these binding sites disappear and instead new CTCF independent RAD21 binding sites emerge, which are enriched for binding sites of transcription factors implicated in early differentiation. Furthermore, knock-down of RAD21 causes expression changes that are similar to expression changes after Nanog depletion, demonstrating the functional relevance of the RAD21 - pluripotency transcriptional network association. Finally, we show that Nanog physically interacts with the cohesin or cohesin interacting proteins STAG1 and WAPL further substantiating this association. Based on these findings we propose that a dynamic placement of cohesin by pluripotency transcription factors contributes to a chromosome organization supporting the ESC expression program.

31

Kopp, Wolfgang [Verfasser]. "Statistical methods for motif hit enrichment in DNA sequences / Wolfgang Kopp." Berlin : Freie Universität Berlin, 2017. http://d-nb.info/1135184852/34.

Full text

32

Valebjørg, Vetle Søraas. "Discovery of approximate composite motifs in biological sequences." Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2006. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-10130.

Full text

Abstract:

Mapping the regulatory system in living organisms is a great challenge, and many methods have been created during the last 15 years to solve this problem. The biological processes are, however, more flexible and complex than first thought, and many of the methods lack the ability to model this exactly. The new method devised here is not a complete solution to this situation, but it offers an innovative way of finding approximate composite patterns in a set of sequences. Motifs are read from any third-party tool, represented as {A,C,G,T} strings, IUPAC strings or PWMs, and weighted with significance and support as an estimate of how important the patterns are. Finding combinations with both high significance and high support can reveal important properties preserved in the sequences. Based on this, the algorithm uses a branch-and-bound approach to traverse every combination while preserving the best solutions of this multi-objective optimization problem in a Pareto front. The best patterns found are investigated further by applying different statistical and experimental methods to better support their significance. The three most important tests done on the TransCompel dataset were (i) to measure the predicted patterns against known sites based on nucleotide correlation; (ii) to find the frequency of motifs participating in the combinations, so that the best could be studied manually; and (iii) to compare different tests when the significance was based on real background sequences instead of the uniform distribution. Some of the results were low, but still similar to the accuracy provided by other known methods that have been tested in the same way. The test results can be biased by the parameters used, by a too simple and restrictive test set, or by faulty predictions made on the dataset tested. More testing and tuning of parameters might result in better predictions. However, the different tests still showed this method to be a valuable tool in composite motif discovery.
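The abstract above keeps the best motif combinations in a Pareto front over two objectives, significance and support. As a rough illustration of that idea alone, the sketch below maintains such a front incrementally. It does not reproduce the thesis's branch-and-bound traversal, and the tuple layout, function names, and the assumption that both objectives are maximised are illustrative choices.

```python
def dominates(a, b):
    """True if point a is at least as good as b on both objectives and better on one.

    Points are (significance, support) pairs; both are assumed to be maximised.
    """
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def pareto_front(candidates):
    """Return the non-dominated candidates, in order of insertion.

    `candidates` is an iterable of (name, significance, support) tuples.
    """
    front = []
    for name, sig, sup in candidates:
        point = (sig, sup)
        # Skip a candidate that an existing front member already dominates.
        if any(dominates((s, p), point) for _, s, p in front):
            continue
        # Drop front members that the new candidate dominates, then add it.
        front = [(n, s, p) for n, s, p in front if not dominates(point, (s, p))]
        front.append((name, sig, sup))
    return front

# Hypothetical motif combinations with made-up scores, for illustration only.
combos = [("A", 5, 1), ("B", 3, 3), ("C", 2, 2), ("D", 1, 5)]
best = pareto_front(combos)
```

In a branch-and-bound search, a front like this can also serve as the bound: a partial combination whose best achievable scores are already dominated need not be expanded further.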

33

Holm, Lotta. "The MHC-glycopeptide-T cell interaction in collagen induced arthritis : a study using glycopeptides, isosteres and statistical molecular design in a mouse model for rheumatoid arthritis." Doctoral thesis, Umeå : Department of Chemistry, Umeå University, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-899.

Full text

34

Buchholz, Frank, Anja Nitzsche, Maciej Paszkowski-Rogacz, Filomena Matarese, Eva M. Janssen-Megens, Nina C. Hubner, Herbert Schulz, et al. "RAD21 Cooperates with Pluripotency Transcription Factors in the Maintenance of Embryonic Stem Cell Identity." Public Library of Science, 2011. https://tud.qucosa.de/id/qucosa%3A29134.

Full text

Abstract:

For self-renewal, embryonic stem cells (ESCs) require the expression of specific transcription factors accompanied by a particular chromosome organization to maintain a balance between pluripotency and the capacity for rapid differentiation. However, how transcriptional regulation is linked to chromosome organization in ESCs is not well understood. Here we show that the cohesin component RAD21 exhibits a functional role in maintaining ESC identity through association with the pluripotency transcriptional network. ChIP-seq analyses of RAD21 reveal an ESC specific cohesin binding pattern that is characterized by CTCF independent co-localization of cohesin with pluripotency related transcription factors Oct4, Nanog, Sox2, Esrrb and Klf4. Upon ESC differentiation, most of these binding sites disappear and instead new CTCF independent RAD21 binding sites emerge, which are enriched for binding sites of transcription factors implicated in early differentiation. Furthermore, knock-down of RAD21 causes expression changes that are similar to expression changes after Nanog depletion, demonstrating the functional relevance of the RAD21 - pluripotency transcriptional network association. Finally, we show that Nanog physically interacts with the cohesin or cohesin interacting proteins STAG1 and WAPL further substantiating this association. Based on these findings we propose that a dynamic placement of cohesin by pluripotency transcription factors contributes to a chromosome organization supporting the ESC expression program.

35

Sarver, Michael. "STRUCTURE-BASED MULTIPLE RNA SEQUENCE ALIGNMENT AND FINDING RNA MOTIFS." Bowling Green State University / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1151076710.

Full text

36

El, Soufi Karim. "Study of circular code motifs in nucleic acid sequences." Thesis, Strasbourg, 2017. http://www.theses.fr/2017STRAD004/document.

Full text

Abstract:

The work in this thesis presents a new approach to the theory of the circular code in genes, which was initiated in 1996. The approach consists of analysing the motifs constructed from this circular code; these particular motifs are called circular code motifs. We developed search algorithms to locate circular code motifs in nucleic acid sequences in order to find their biological significance. The circular code X identified in genes is a set of trinucleotides that has the property of retrieving, synchronizing and maintaining the reading frame. We started our analysis with the decoding centre of the ribosome (rRNA), a major region in the process of translating genes into proteins. We then extended the results obtained with the ribosome to transfer RNAs (tRNA) in order to study rRNA-tRNA interactions. Finally, we generalized the search for X circular code motifs in DNA to the complete chromosomes of eukaryotic genomes. This study introduced new properties to the circular code theory.

37

Crowley, Louis J. "Structure-function studies of conserved sequence motifs of cytochrome b5 reductase." [Tampa, Fla] : University of South Florida, 2007. http://purl.fcla.edu/usf/dc/et/SFE0001913.

Full text

38

Hicks, Matthew Raymond. "Coiled-coil assembly by proteins and peptides with unusual sequence motifs." Thesis, University of Sussex, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.311349.

Full text

39

EL, MABROUK NADIA. "Recherche approchee de motifs - application a des sequences biologiques structurees." Paris 7, 1996. http://www.theses.fr/1996PA077199.

Full text

Abstract:

This work presents several methods for the approximate search, allowing for errors, of a pattern p in a text t. The errors considered are character substitutions, and a pattern is either a word or a class of words over an alphabet. Two of the most efficient approaches to approximate search, in theory and in practice, are the numerical approaches, which rely on the manipulation of machine words, and the Boyer-Moore-type approaches. We designed a new search method that benefits both from the efficiency of the Boyer-Moore approach and from the ease of programming of the numerical algorithms, in particular the shift-add algorithm of Baeza-Yates and Gonnet. When the word is not too long, the resulting algorithm is linear in the worst case. If, in addition, the alphabet is sufficiently large, the algorithm is sublinear on average. A natural way to approach approximate search is to build a deterministic finite automaton recognizing the language L_{p,k} made up of all words within distance k of p. The problem with this automaton approach is that both the construction time and the size of such an automaton are large. We show that even for the minimal automaton recognizing L_{p,k}, the memory used remains large. We are particularly interested in applying approximate search techniques to the prediction of structured biological motifs, and more specifically of tRNA genes. Our new algorithm fastrna improves on the tRNAscan algorithm of Fichant and Burks in terms of both biological results and execution speed. The recognition rate obtained for various biological sequences is close to 99%.
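The shift-add algorithm of Baeza-Yates and Gonnet named in the abstract above counts substitutions with one small counter per pattern position, all packed into a single integer. The sketch below illustrates that basic technique for search with at most k mismatches. It is not the thesis's combined Boyer-Moore method, and it uses Python's arbitrary-precision integers instead of fixed machine words, so the overflow handling of the original algorithm is not needed here. The function name and example strings are assumptions.

```python
def shift_add_search(pattern, text, k):
    """Return start positions where pattern occurs in text with at most k substitutions.

    State layout: field i (b bits wide) holds the number of mismatches between
    pattern[0..i] and the i+1 text characters ending at the current position.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    b = m.bit_length()              # enough bits to hold a count of up to m
    mask = (1 << (b * m)) - 1       # keeps exactly m fields
    # For each character, a word with a 1 in field i wherever pattern[i] differs.
    table = {}
    for c in set(pattern) | set(text):
        word = 0
        for i, pc in enumerate(pattern):
            if pc != c:
                word |= 1 << (b * i)
        table[c] = word
    state = 0
    hits = []
    for j, c in enumerate(text):
        # Shift every counter one field up, then add this character's mismatches.
        state = ((state << b) + table[c]) & mask
        # The top field covers the whole pattern once m characters have been read.
        if j >= m - 1 and (state >> (b * (m - 1))) <= k:
            hits.append(j - m + 1)
    return hits

# Hypothetical example: exact hit at 0, one-substitution hit at 4.
positions = shift_add_search("ACGT", "ACGTACCTAAAA", k=1)
```

Each text character costs one shift, one addition, and one comparison, which is what makes this family of algorithms easy to program. The thesis's contribution, per the abstract, is to combine that simplicity with Boyer-Moore-style skipping.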

40

Liu, Kai. "Detecting stochastic motifs in network and sequence data for human behavior analysis." HKBU Institutional Repository, 2014. https://repository.hkbu.edu.hk/etd_oa/60.

Full text

Abstract:

With the recent advent of Web 2.0, mobile computing, and pervasive sensing technologies, human activities can readily be logged, leaving digital traces of different forms. For instance, human communication activities recorded in online social networks allow user interactions to be represented as “network” data. Also, human daily activities can be tracked in a smart house, where the log of sensor triggering events can be represented as “sequence” data. This thesis research aims to develop computational data mining algorithms using the generative modeling approach to extract salient patterns (motifs) embedded in such network and sequence data, and to apply them to human behavior analysis. Motifs are defined as the recurrent over-represented patterns embedded in the data, and are known to be effective for characterizing complex networks. Many motif extraction methods found in the literature assume that a motif is either present or absent. In practice, such salient patterns can appear partially due to their stochastic nature and/or the presence of noise. Thus, a probabilistic approach is adopted in this thesis to model motifs. For network data, we use a probability matrix to represent a network motif and propose a mixture model to extract network motifs. A component-wise EM algorithm is adopted in which the optimal number of stochastic motifs is determined automatically with the help of a minimum message length criterion. To also take the ordering of edge occurrences within a motif into account, we model a motif as a mixture of first-order Markov chains. Using a probabilistic approach similar to the one for network motifs, an optimal set of stochastic temporal network motifs is extracted. We carried out rigorous experiments to evaluate the performance of the proposed motif extraction algorithms using synthetic data sets, real-world social network data sets and mobile phone usage data sets, and obtained promising results. We also found that some of the results can be interpreted using the social balance and social status theories, which are well known in social network analysis. To evaluate the effectiveness of stochastic temporal network motifs beyond characterizing human behaviors, we incorporate them as local structural features into a factor graph model for followee recommendation (essentially a link prediction problem) in online social networks. The proposed motif-based factor graph model is found to significantly outperform existing state-of-the-art methods on the prediction task. The probabilistic framework proposed for stochastic temporal network motif extraction is also applicable to extracting motifs from sequence data. One possible way is to use the edit distance within the probabilistic framework so that subsequences with minor ordering variations are first grouped to form the initial set of motif candidates. A mixture model can then be used to determine the optimal set of temporal motifs. We applied this approach to extract sequence motifs from a smart home data set that contains sensor triggering events corresponding to activities performed by residents of the smart home. The unique behavior extracted for each resident based on the detected motifs is also discussed. Keywords: Stochastic network motifs, finite mixture models, expectation maximization algorithms, social networks, stochastic temporal network motifs, mixture of Markov chains, human behavior analysis, followee recommendation, signed social networks, activity of daily living, smart environments

41

John, Rosalind. "Identification of potential gene coding sequences within large cloned DNA arrays : analysis of zinc finger motif." Thesis, Imperial College London, 1991. http://hdl.handle.net/10044/1/46844.

Full text

42

Sinoquet, Christine. "Grammaires a transformations morphiques recherche de motif - exacte ou approchee - adaptee aux sequences genetiques : le systeme gtm." Rennes 1, 1998. http://www.theses.fr/1998REN10048.

Full text

Abstract:

We are interested in searching for composite motifs in a sequence. These motifs are described using grammars with string variables and morphic transformations. The concept of morphic transformation captures both the context-free (inversion) and the context-sensitive (repetition) aspects of a given sequence language. It is suited to describing links between regions of a sequence, and in particular to modeling the intramolecular dependencies that establish the secondary structure of genetic sequences. We define a class of discontinuous grammars (with implicit gaps) with morphic transformations. The fundamental property of morphic transformations is exploited during a preprocessing of the sequence to be analyzed. The preprocessing yields an auxiliary representation of the sequence in which the inter-region links are compiled as references to a common consensus model. This procedure achieves the two stated objectives: equally efficient handling of direct and inverse morphic transformations, and approximate recognition with substitution, insertion and deletion errors. The second point is required for a structure validation tool intended for the study of biological sequences to be relevant. The preprocessing algorithm relies on a parallel extension of the consensus models and of their approximate occurrences modulo morphic transformation. The GTM formalism unifies two notions, variable instantiation and approximate occurrence of a consensus model, through a first-instantiation mechanism. The space and time complexity of the preprocessing and approximate derivation-instantiation phases is reduced by taking into account absolute and relative constraints expressed in the specification. In particular, we propose an efficient algorithm for filtering by relative constraints that over-approximates the solution set. The automatic generator of GTM parsers was validated through various approaches: a test protocol on artificial data; the search for transfer RNAs and pseudoknots, among others, in the Escherichia coli genome; and a contribution to a phylogenetic study of the human genome. For this type of non-deterministic analysis, the size of the solution space is reduced through the (grammatical) specification of constraints. We also consider analyses in which the constraints are inferred from a training corpus: we lay the theoretical foundations of a non-deterministic syntactic analysis guided by statistical knowledge. Taking a local analysis context into account (an n-gram model) guides the search for the best solutions. An application dedicated to a translation problem (reverse translation of protein sequences) targeted the Escherichia coli genome.

43

Roth, Christian [Verfasser]. "Statistical methods for biological sequence analysis for DNA binding motifs and protein contacts / Christian Roth." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2021. http://nbn-resolving.de/urn:nbn:de:gbv:7-21.11130/00-1735-0000-0008-5912-0-2.

Full text

44

Prytuliak, Roman [Verfasser], and Christian [Akademischer Betreuer] Leibold. "Recognition of short functional motifs in protein sequences / Roman Prytuliak ; Betreuer: Christian Leibold." München : Universitätsbibliothek der Ludwig-Maximilians-Universität, 2018. http://d-nb.info/1166559513/34.

Full text

45

Crowley, Louis J. "Structure-Function Studies of Conserved Sequence Motifs of Cytochrome b₅ Reductase:." Scholar Commons, 2007. https://scholarcommons.usf.edu/etd/682.

Full text

Abstract:

NADH:Cytochrome b5 Reductase (cb5r) catalyzes the two electron reduction of the iron center of the heme cofactor found within cytochrome b5 (cb5) utilizing reducing equivalents of the nicotinamide adenine dinucleotide (NADH) coenzyme. Cb5r is characterized by two domains necessary for proper enzyme function: a flavin-binding domain and a pyridine nucleotide-binding domain. Within these domains are highly conserved "motifs" necessary for the proper binding and orientation of both the NADH coenzyme and the FAD cofactor. To address the importance of these conserved motifs site-directed mutagenesis was utilized to generate a series of variants upon residues found within the motifs to allow for the full characterizations. Second, naturally occurring recessive congenital methemoglobinemia (RCM) mutants that are found within or in close proximity to these highly conserved motifs were analyzed utilizing site-directed mutagenesis. The flavin-binding motif "91RxYSTxxSN97" was characterized by the generation of variants T94H, T94G, T94P, P95I, V96S, and S97N. In addition to this, the naturally occurring double mutant P92H/E255- was fully characterized to establish a role of the P92 residue giving rise to RCM. The role of the "124GRxxST127" was determined by the introduction of a positive charge, charge reversal, and conserved amino acid mutations through site-directed mutagenesis of the G124, K125, and M126 residues. Based on the data presented here, each of the residues of the GRxxST motif are directly involved in maintaining the proper binding and orientation of the cb5r flavin prosthetic group. Analysis of the NADH-binding motif "273CGxxx-M278" was accomplished through the characterization of the type II RCM variant M272- and the type I RCM variant P275L. This demonstrates that the deletion of the M272 residue causes a frame shift leading to the inability of the NADH substrate to bind. 
The introduction of the P275L variant showed that substrate affinity was diminished, yet turnover was comparable to wild-type cytochrome b5 reductase, indicating that although P275 is required for proper substrate binding it is not essential for overall catalytic function. Finally, analysis of the naturally occurring double mutant G75S/V252M provided the first insight into a methemoglobinemia variant that possessed mutations in both the FAD-binding and NADH-binding domains.

46

Jorda, Julien. "Analyse systématique des motifs répétés en tandem dans les séquences protéiques." Thesis, Montpellier 2, 2010. http://www.theses.fr/2010MON20090/document.

Full text

Abstract:

Over the last decades, technical advances in molecular biology such as genome sequencing projects have led to a huge increase in the volume of data held in biological databanks. Among these data, some sequences contain motifs that are similar to one another and repeated adjacently, called tandem repeats. The purpose of this thesis is to understand the existence of these repeats in protein sequences through a large-scale analysis.

47

Sanghvi, Jubin Dinakarpandian Deendayal. "IFREE - an Indexed Forest of Representer Expressions Extractor for position frequency matrices to rapidly detect sequence motifs." Diss., UMK access, 2006.

Abstract:

Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2006.
"A thesis in computer science." Typescript. Advisor: Deendayal Dinakarpandian. Vita. Title from "catalog record" of the print edition. Description based on contents viewed Jan. 29, 2007. Includes bibliographical references (leaves 60-62). Online version of the print edition.

48

Pham, Quang-Khai. "Time Sequence Summarization: Theory and Applications." Phd thesis, Université de Nantes, 2010. http://tel.archives-ouvertes.fr/tel-00538512.

Abstract:

The fields of medicine, the web, commerce, and finance generate and store large volumes of information in the form of event sequences. These archives are very rich sources of information for analysts eager to discover nuggets of knowledge in them. For example, biologists seek to discover the risk factors of a disease by analyzing patient histories, web content producers and marketing departments examine customers' consumption habits, and stock market traders follow market developments in order to better anticipate them. However, these applications require exploring very large event sequences (finance, for example, generates millions of events every day), in which events may be described by terms extracted from rich textual content. The variability of the descriptors can therefore be very high. As a result, discovering non-trivial knowledge in these verbose information sources with classical data mining approaches is a difficult problem. A recent study shows that classical data mining approaches can benefit from condensed forms of these data, such as aggregation results or summaries. Knowledge extracted in this way is termed higher-order knowledge. Building on this observation, we present in this work the concept of event sequence summarization, whose purpose is to help time-dependent applications scale to large volumes of data. A summary is obtained by transforming an event sequence in which the events are ordered chronologically. Each event is precisely described by a finite set of symbolic descriptors. The resulting summary is then an event sequence that is more concise than the initial sequence and can be substituted for it in applications.
We propose a first, user-guided construction method called TSaR. It is a three-phase process: i) generalization, ii) grouping, and iii) concept formation. TSaR uses domain knowledge expressed as taxonomies to generalize event descriptors. A temporal window is given to control the grouping process according to the temporal proximity of events. Second, to make the summarization process autonomous, that is, parameter-free, we propose a redefinition of the summarization problem as a new clustering problem. The originality of this clustering problem lies in the fact that the objective function to optimize depends simultaneously on the content of the events and on their proximity in time. We propose two greedy algorithms, called G-BUSS and GRASS, to address this problem. Finally, we explore and analyze the ability of event sequence summaries to contribute to the extraction of higher-order sequential patterns. We analyze the characteristics of the frequent patterns extracted from summaries and propose a methodology that relies on these patterns to discover others at a finer granularity. We evaluate and validate our summarization approaches and our methodology through a set of experiments on a real-world dataset extracted from the financial news archives produced by Reuters.
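
The three phases named in the abstract (generalization, grouping, concept formation) can be pictured with a small Python sketch. This is a loose illustration under stated assumptions, not the TSaR algorithm itself: the abstract does not specify how the window or the concept formation step are defined, so the one-level `taxonomy` dictionary, the time-based `window`, and the function name `summarize` are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Event = Tuple[int, FrozenSet[str]]  # (timestamp, symbolic descriptors)


def summarize(
    events: Iterable[Event], taxonomy: Dict[str, str], window: int
) -> List[Tuple[int, FrozenSet[str], int]]:
    """Toy three-phase summary: returns (first_time, descriptors, count)."""
    # Phase i (generalization): map each descriptor to its parent concept;
    # descriptors missing from the taxonomy are kept unchanged.
    generalized = [
        (t, frozenset(taxonomy.get(d, d) for d in descs)) for t, descs in events
    ]

    # Phase ii (grouping): an event joins an existing group when its
    # generalized descriptors are identical and it occurs at most `window`
    # time units after the last event of that group.
    groups: List[dict] = []
    for t, descs in sorted(generalized, key=lambda e: e[0]):
        match = next(
            (g for g in groups if g["descs"] == descs and t - g["last"] <= window),
            None,
        )
        if match is not None:
            match["last"] = t
            match["count"] += 1
        else:
            groups.append({"first": t, "last": t, "descs": descs, "count": 1})

    # Phase iii (concept formation): each group is represented by a single
    # summary event carrying the shared descriptors and the group size.
    return [(g["first"], g["descs"], g["count"]) for g in groups]


events = [
    (1, frozenset({"AAPL"})),
    (2, frozenset({"MSFT"})),
    (10, frozenset({"GOOG"})),
]
taxonomy = {"AAPL": "tech", "MSFT": "tech", "GOOG": "tech"}
print(summarize(events, taxonomy, window=3))
```

With these inputs the first two events collapse into one "tech" summary event, while the third is too far away in time and forms its own group, so three events are summarized as two.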

49

Grunert, Steffen. "Strukturelles und funktionelles Verständnis von Membranproteinen im Kontext sequenzmotivbasierter Methoden." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-229383.

Abstract:

The present work was written as part of a cooperative doctorate between TU Dresden and the University of Applied Sciences Mittweida. It presents novel, computer-oriented approaches for the analysis of membrane proteins. Membrane proteins are essential for many biological processes within an organism and are important targets for a wide range of pharmaceuticals. Their sequences provide valuable and partly not yet decoded information about their three-dimensional structure and functional characteristics. Within proteomics and genomics, the analysis of membrane protein structures is an important part of understanding complex biological processes. Research on membrane proteins has revealed a large number of short recurring patterns, so-called motifs, in their sequences. These motifs support the understanding of how membrane proteins fold in the cell membrane. Such sequence motifs are the focus of this work. In three projects, exclusively sequence motif-based approaches form the basis for closer investigation of membrane protein structures. Ultimately, the methods put forward in this work provide valuable insights into the structural and functional role of sequence motifs, contributing to a better understanding of the complex architecture of membrane proteins. In general, the integration of proteomic and mutagenic information is intensified. Last but not least, the results of this work are made available to the scientific community for the planning of in vitro experiments and for further work in the field of membrane protein analysis.

50

Liang, Chengzhi. "COPIA: A New Software for Finding Consensus Patterns in Unaligned Protein Sequences." Thesis, University of Waterloo, 2001. http://hdl.handle.net/10012/1050.

Abstract:

The consensus pattern problem (CPP) aims at finding conserved regions, or motifs, in unaligned sequences. This problem is NP-hard under various scoring schemes. To solve this problem for protein sequences more efficiently, a new scoring scheme and a randomized algorithm based on a substitution matrix are proposed here. Any practical solution to a bioinformatics problem must observe two principles: (1) the problem that it solves accurately describes the real problem; in CPP, this requires that the scoring scheme be able to distinguish a real motif from background; (2) it provides an efficient algorithm to solve the mathematical problem. A key question in protein motif-finding is how to determine the motif length. One problem in EM algorithms for solving CPP is how to find good starting points to reach the global optimum. These two questions were both well addressed under this scoring scheme, which made the randomized algorithm both fast and accurate in practice. A software package, COPIA (COnsensus Pattern Identification and Analysis), has been developed implementing this algorithm. Experiments using sequences from the von Willebrand factor (vWF) family showed that it worked well at finding multiple motifs and repeats. COPIA's ability to find repeats also makes it useful in illustrating the internal structures of multidomain proteins. Comparative studies using several groups of protein sequences demonstrated that COPIA performed better than the commonly used motif-finding programs.
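
As a rough illustration of what scoring a consensus pattern against unaligned sequences involves, the Python sketch below scores a candidate pattern by summing, over all sequences, its best-matching window. It makes several assumptions that are not from the thesis: the match/mismatch function `sub_score` is a toy stand-in for a real substitution matrix, the motif length is given rather than determined, and candidates are simply the substrings of the input (an exhaustive search, not COPIA's randomized algorithm or scoring scheme).

```python
from typing import List, Tuple


def sub_score(a: str, b: str) -> int:
    """Toy substitution score (hypothetical): +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


def best_window(pattern: str, seq: str) -> Tuple[int, int]:
    """Return (score, start) of the window of seq that best matches pattern."""
    length = len(pattern)
    return max(
        (sum(sub_score(p, c) for p, c in zip(pattern, seq[i:i + length])), i)
        for i in range(len(seq) - length + 1)
    )


def consensus_score(pattern: str, seqs: List[str]) -> int:
    """Sum of best-window scores of pattern over all sequences."""
    return sum(best_window(pattern, s)[0] for s in seqs)


def best_pattern_from_substrings(seqs: List[str], length: int) -> str:
    """Try every substring of the given length as a candidate pattern."""
    candidates = {s[i:i + length] for s in seqs for i in range(len(s) - length + 1)}
    return max(sorted(candidates), key=lambda p: consensus_score(p, seqs))


seqs = ["AAWCGHKK", "WCGHLLLL", "PPPWCGHP"]
best = best_pattern_from_substrings(seqs, 4)
print(best, consensus_score(best, seqs))  # WCGH 24
```

Even this toy version shows why the scoring scheme matters: with a scheme that cannot separate a true motif from background, the highest-scoring pattern carries no biological meaning, however efficiently it is found.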
