Theses on the topic "Bioinformatics"

Follow this link to see other types of publications on the topic: Bioinformatics.

Cite a source in APA, MLA, Chicago, Harvard, and many other styles

Choose the source type:

See the top 50 dissertations (degree and doctoral theses) for research on the topic "Bioinformatics".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf and read the abstract of the work online, if it is available in the metadata.

Browse theses from many scientific fields and compile a correct bibliography.

1

Hvidsten, Torgeir R. "Predicting Function of Genes and Proteins from Sequence, Structure and Expression Data". Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis : Univ.-bibl. [distributör], 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-4490.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Hooper, Sean. "Dynamics of Microbial Genome Evolution". Doctoral thesis, Uppsala University, Molecular Evolution, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3283.

Full text
Abstract:

The success of microbial life on Earth can be attributed not only to environmental factors, but also to the surprising hardiness, adaptability and flexibility of the microbes themselves. They are able to quickly adapt to new niches or circumstances through gene evolution and also by sheer strength of numbers, where statistics favor otherwise rare events.

An integral part of adaptation is the plasticity of the genome; losing and acquiring genes depending on whether they are needed or not. Genomes can also be the birthplace of new gene functions, by duplicating and modifying existing genes. Genes can also be acquired from outside, transcending species boundaries. In this work, the focus is set primarily on duplication, deletion and import (lateral transfer) of genes – three factors contributing to the versatility and success of microbial life throughout the biosphere.

We have developed a compositional method for identifying genes that have been imported into a genome, and the rate of import/deletion turnover has been estimated in a number of organisms. Furthermore, we propose a model of genome evolution by duplication, where, through the principle of gene amplification, novel gene functions are discovered within genes with weak or secondary protein functions. Subsequently, the novel function is maintained by selection and eventually optimized. Finally, we discuss a possible synergistic link between lateral transfer and duplicative processes in gene innovation.

APA, Harvard, Vancouver, ISO, and other styles
3

Snøve, Ola, Jr. "Hardware-accelerated analysis of non-protein-coding RNAs". Doctoral thesis, Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-713.

Full text
Abstract:

A tremendous amount of genomic sequence data of relatively high quality has become publicly available due to the human genome sequencing projects that were completed a few years ago. Despite considerable efforts, we do not yet know everything there is to know about the various parts of the genome: what all the regions code for, and how their gene products contribute to the myriad of biological processes that are performed within the cells. New high-performance methods are needed to extract knowledge from this vast amount of information.

Furthermore, the traditional view that DNA codes for RNA that codes for protein, which is known as the central dogma of molecular biology, seems to be only part of the story. The discovery of many non-protein-coding gene families with housekeeping and regulatory functions brings an entirely new perspective to molecular biology. Sequence analysis of the new gene families also requires new methods, as there are significant differences between protein-coding and non-protein-coding genes.

This work describes a new search processor that can search for complex patterns in sequence data for which no efficient lookup-index is known. When several chips are mounted on search cards that are fitted into PCs in a small cluster configuration, the system’s performance is orders of magnitude higher than that of comparable solutions for selected applications. The applications treated in this work fall into two main categories, namely pattern screening and data mining, and both take advantage of the search capacity of the cluster to achieve adequate performance. Specifically, the thesis describes an interactive system for exploration of all types of genomic sequence data. Moreover, a genetic programming-based data mining system finds classifiers that consist of potentially complex patterns that are characteristic for groups of sequences. The screening and mining capacity has been used to develop an algorithm for identification of new non-protein-coding genes in bacteria; a system for rational design of effective and specific short interfering RNA for sequence-specific silencing of protein-coding genes; and an improved algorithmic step for identification of new regulatory targets for the microRNA family of non-protein-coding genes.


Papers V, VI, and VII are reprinted with kind permission of Elsevier, sciencedirect.com
APA, Harvard, Vancouver, ISO, and other styles
4

Björkholm, Patrik. "Method for recognizing local descriptors of protein structures using Hidden Markov Models". Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-11408.

Full text
Abstract:

Being able to predict the sequence-structure relationship in proteins will extend the scope of many bioinformatics tools relying on structure information. Here we use Hidden Markov models (HMMs) to recognize and pinpoint the location in target sequences of local structural motifs (local descriptors of protein structure, LDPS). These substructures are composed of three or more segments of amino acid backbone structure that are in proximity with each other in space but not necessarily along the amino acid sequence. We were able to align descriptors to their proper locations in 41.1% of the cases when using models built solely from amino acid information. Using models that also incorporated secondary structure information, we were able to assign 57.8% of the local descriptors to their proper location. Further performance gains were achieved by threading a profile through the Hidden Markov models together with the secondary structure; with this material we were able to assign 58.5% of the descriptors to their proper locations. Hidden Markov models were thus shown to be able to locate LDPS in target sequences, and the accuracy increases when secondary structure and the profile of the target sequence are used in the models.
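As a toy illustration of the decoding step that such models perform, the most likely hidden-state path for an observed sequence can be found with the Viterbi algorithm. This sketch is not the thesis's actual descriptor model: the two states ('M' for a descriptor segment, 'B' for background), the two-letter alphabet ('h' hydrophobic, 'p' polar), and all probabilities are invented for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (probability of the best path ending in state s at time t,
    #            predecessor state on that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return path[::-1]

# Hypothetical 2-state model over a reduced amino acid alphabet.
states = ("M", "B")
start_p = {"M": 0.5, "B": 0.5}
trans_p = {"M": {"M": 0.8, "B": 0.2}, "B": {"M": 0.2, "B": 0.8}}
emit_p = {"M": {"h": 0.9, "p": 0.1}, "B": {"h": 0.2, "p": 0.8}}
print(viterbi("hhp", states, start_p, trans_p, emit_p))  # ['M', 'M', 'B']
```

Real profile HMMs add match/insert/delete states per alignment column and work in log space to avoid underflow, but the decoding principle is the same.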

APA, Harvard, Vancouver, ISO, and other styles
5

Keller, Jens. "Clustering biological data using a hybrid approach : Composition of clusterings from different features". Thesis, University of Skövde, School of Humanities and Informatics, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-1078.

Full text
Abstract:

Clustering of data is a well-researched topic in computer science, and many approaches have been designed for different tasks. In biology many of these approaches are hierarchical, and the result is usually represented in dendrograms, e.g. phylogenetic trees. However, many non-hierarchical clustering algorithms are also well established in biology, and the approach in this thesis is based on such common algorithms. The algorithm implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, with a previously computed cluster as input set. The innovation is that it focuses on a different feature of the data in each step and clusters the data according to that feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences; the clustering then reflects a partitioning of the genes according to their sequence similarity. The approach introduced in this thesis instead uses many features of the same objects. These features can be various; in biology, for instance, similarities of the sequences, of gene expression, or of motif occurrences in the promoter region. As part of this thesis not only the algorithm itself was implemented and evaluated, but also a complete software package providing a graphical user interface. The software was implemented as a framework providing the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future, integrating a set of algorithms and analysis tools related to the process of clustering and analysing data, not necessarily related to biology.

The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter gives an introduction to the task and the biological background, together with an overview of common clustering approaches and the differences between them. Chapter two presents the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the aspects concerning the software, including the algorithm: it illustrates the architecture and analyses the clustering algorithm. After the implementation the software was evaluated, which is described in the fourth chapter, pointing out observations made while using the new algorithm; this chapter also discusses differences and similarities to related clustering algorithms and software. The thesis ends with conclusions and suggestions for future work in the last two chapters. Readers who are interested in repeating the experiments made as part of this thesis can contact the author by e-mail to obtain the relevant evaluation data, scripts or source code.
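The top-down idea described above — re-cluster each previously computed cluster while focusing on a different feature at each level — can be sketched in miniature. This is not the thesis's graph clustering algorithm: actual clustering is replaced by simple grouping on a per-feature label, and the genes and feature labels are invented.

```python
# Two hypothetical feature assignments: level 1 groups by sequence-similarity
# class, level 2 by expression class.
features = [
    {"g1": "seqA", "g2": "seqA", "g3": "seqB", "g4": "seqB"},
    {"g1": "exprX", "g2": "exprY", "g3": "exprX", "g4": "exprX"},
]

def cluster_by(genes, feature):
    # Stand-in for one graph-clustering step: group genes sharing a label.
    groups = {}
    for g in genes:
        groups.setdefault(feature[g], []).append(g)
    return list(groups.values())

def hierarchical(genes, levels):
    # Recursively split each cluster using the next feature in the list.
    if not levels:
        return sorted(genes)
    return [hierarchical(sub, levels[1:]) for sub in cluster_by(genes, levels[0])]

tree = hierarchical(["g1", "g2", "g3", "g4"], features)
print(tree)  # [[['g1'], ['g2']], [['g3', 'g4']]]
```

The nesting of the result mirrors the top-down hierarchy: the first level reflects sequence similarity, the second expression.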

APA, Harvard, Vancouver, ISO, and other styles
6

Chawade, Aakash. "Inferring Gene Regulatory Networks in Cold-Acclimated Plants by Combinatorial Analysis of mRNA Expression Levels and Promoter Regions". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-20.

Full text
Abstract:

Understanding the cold acclimation process in plants may help us develop genetically engineered plants that are resistant to cold. The key to understanding this process is to study the genes, and thus the gene regulatory network, involved in cold acclimation. Most existing approaches to deriving regulatory networks rely only on gene expression data. Since expression data is usually noisy and sparse, the networks generated by these approaches are usually incoherent and incomplete. Hence, a new approach is proposed here that analyzes the promoter regions along with the expression data when inferring regulatory networks. In this approach, genes are grouped into sets if they contain similar over-represented motifs or motif pairs in their promoter regions and if their expression pattern follows that of the regulating gene. The network thus derived is evaluated using known literature evidence, functional annotations and statistical tests.
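The grouping criterion described — shared promoter motif plus an expression pattern that follows the regulator — can be sketched with a plain Pearson correlation. The motifs, genes, expression profiles and threshold below are all invented for illustration; they are not data or parameters from the thesis.

```python
import math

def pearson(x, y):
    # Pearson correlation between two expression profiles.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical promoter motifs per gene and expression time courses.
motifs = {"g1": {"CRT"}, "g2": {"CRT"}, "g3": {"ABRE"}}
expr = {"reg": [1.0, 2.0, 3.0, 4.0],
        "g1":  [1.1, 2.2, 2.9, 4.3],   # follows the regulator
        "g2":  [0.9, 2.1, 3.2, 3.8],   # follows the regulator
        "g3":  [4.0, 3.0, 2.0, 1.0]}   # anti-correlated

def candidate_targets(regulator, motif, threshold=0.9):
    # Genes carrying the motif whose expression tracks the regulator.
    return sorted(g for g, ms in motifs.items()
                  if motif in ms and pearson(expr[regulator], expr[g]) >= threshold)

print(candidate_targets("reg", "CRT"))  # ['g1', 'g2']
```

A real analysis would first detect over-represented motifs statistically and allow for time-lagged or inverted regulation; the sketch only shows the combination of the two filters.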

APA, Harvard, Vancouver, ISO, and other styles
7

Muhammad, Ashfaq. "Design and Development of a Database for the Classification of Corynebacterium glutamicum Genes, Proteins, Mutants and Experimental Protocols". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-23.

Full text
Abstract:

Coryneform bacteria are widely distributed in nature; they are rod-shaped, aerobic soil bacteria capable of growing on a variety of sugars and organic acids. Corynebacterium glutamicum is a nonpathogenic species of Coryneform bacteria used for the industrial production of amino acids. There are three main publicly available genome annotations for C. glutamicum: Cg, Cgl and NCgl. These three annotations have different numbers of protein-coding genes and varying numbers of overlaps of similar genes, and the original data is only available in text files. In this format it was not easy to search and compare the data among the different annotations, and it was impossible to run an extensive, multidimensional, customized formal search against different protein parameters. With the data inconsistent and redundant across the various publicly available biological database servers, it was not possible to compare all genome annotations for constructing deletion and over-expression mutants, to obtain a graphical representation of genome information such as gene locations, neighboring genes, orientation (direct or complementary strand), overlapping genes and gene lengths, or to obtain graphical output for structure-function relations by comparison of predicted transmembrane domains (TMD), functional protein domains and protein motifs. There was therefore a need for a system for managing the data for mutants and experimental setups. In spite of the fact that the genome sequence is known, until now no databank providing such a complete set of information has been available. We solved these problems by developing a standalone relational database application covering data processing, protein and DNA sequence extraction, and management of lab data. The result of the study is CORYNEBASE, a software application that meets our aims and objectives.
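A relational layout for cross-referencing genes across several annotations might look like the following sketch. The table and column names, coordinates and the overlap query are invented for illustration; they are not CORYNEBASE's actual schema.

```python
import sqlite3

# In-memory sketch: one table of annotations, one of genes per annotation.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE annotation (id TEXT PRIMARY KEY, label TEXT);
CREATE TABLE gene (
    gene_id    TEXT,
    annotation TEXT REFERENCES annotation(id),
    start_pos  INTEGER,
    end_pos    INTEGER,
    strand     TEXT CHECK (strand IN ('+', '-'))
);
""")
con.executemany("INSERT INTO annotation VALUES (?, ?)",
                [("Cg", "original"), ("Cgl", "alternative"), ("NCgl", "NCBI")])
con.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?)",
                [("cg0001", "Cg", 100, 1500, "+"),
                 ("Cgl0001", "Cgl", 120, 1500, "+")])

# Find genes from different annotations that overlap on the same strand.
rows = con.execute("""
    SELECT a.gene_id, b.gene_id FROM gene a JOIN gene b
    ON a.annotation < b.annotation AND a.strand = b.strand
    AND a.start_pos <= b.end_pos AND b.start_pos <= a.end_pos
""").fetchall()
print(rows)  # [('cg0001', 'Cgl0001')]
```

Once the text-file annotations are loaded into such tables, the multidimensional searches the abstract mentions become ordinary SQL queries.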

APA, Harvard, Vancouver, ISO, and other styles
8

Chen, Lei. "Construction of Evolutionary Tree Models for Oncogenesis of Endometrial Adenocarcinoma". Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-25.

Full text
Abstract:

Endometrial adenocarcinoma (EAC) is the fourth leading cause of carcinoma in women worldwide, but not much is known about the genetic factors involved in this complex disease. During the EAC process, it is well known that losses and gains of chromosomal regions do not occur completely at random, but partly through some flow of causality. In this work, we used three different algorithms based on the frequency of genomic alterations to construct 27 tree models of oncogenesis. So far, no study applying pathway models to microsatellite marker data has been reported. Data from genome-wide scans with microsatellite markers were classified into 9 data sets, according to two biological approaches (solid tumor cell and corresponding tissue culture) and three different genetic backgrounds provided by intercrossing the susceptible rat BDII strain and two normal rat strains. In agreement with a previous study, the tree models indicated that three main important regions (I, II and III) and two subordinate regions (IV and V) are likely to be involved in EAC development; the tree models also yielded further information about these regions, such as their likely order and relationships. A high consistency across the tree models, and the relationship among the p19, Tp53 and Tp53-inducible protein genes, provided supporting evidence for the reliability of the results.

APA, Harvard, Vancouver, ISO, and other styles
9

Dodda, Srinivasa Rao. "Improvements and extensions of a web-tool for finding candidate genes associated with rheumatoid arthritis". Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-26.

Full text
Abstract:

Quantitative Trait Locus (QTL) analysis is a statistical method used to restrict genomic regions contributing to specific phenotypes. To further localize genes in such regions, a web tool called "Candidate Gene Capture" (CGC) was developed by Andersson et al. (2005). The CGC tool was based on the textual description of genes defined in the human phenotype database OMIM. Even though the CGC tool works well, it was limited by a number of inconsistencies in the underlying database structure, static web pages, and some gene descriptions without a properly defined function in the OMIM database. Hence, in this work the CGC tool was improved by redesigning its database structure, adding dynamic web pages, and improving the prediction of unknown gene function by using exon analysis. The changes in database structure reduced the number of tables considerably, eliminated redundancies and made data retrieval more efficient. A new method for predicting gene function was proposed, based on the assumption that similarity between exon sequences is associated with biochemical function. Using Blast with 20,380 exon protein sequences and a threshold E-value of 0.01, 639 exon groups were obtained with an average of 11 exons per group. When estimating the functional similarity, it was found that on average 72% of the exons in a group had at least one Gene Ontology (GO) term in common.
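The final evaluation step — the fraction of exons in a group that share at least one GO term with another group member — can be sketched as follows. The exon identifiers, GO terms and group are invented; this is not the thesis's data or pipeline, only the counting idea.

```python
from itertools import combinations

# Hypothetical exon group from a Blast clustering (E-value <= 0.01),
# each exon annotated with Gene Ontology (GO) terms.
groups = {
    "grp1": {"e1": {"GO:0003824", "GO:0008152"},
             "e2": {"GO:0003824"},
             "e3": {"GO:0005515"}},
}

def shared_go_fraction(group):
    # Fraction of exons sharing at least one GO term with another exon
    # in the same group.
    shared = set()
    for a, b in combinations(group, 2):
        if group[a] & group[b]:          # non-empty GO-term intersection
            shared |= {a, b}
    return len(shared) / len(group)

print(shared_go_fraction(groups["grp1"]))  # 2 of 3 exons share GO:0003824
```

Averaging this fraction over all 639 groups would give the kind of 72% summary figure the abstract reports.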

APA, Harvard, Vancouver, ISO, and other styles
10

Huque, Enamul. "Shape Analysis and Measurement for the HeLa cell classification of cultured cells in high throughput screening". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-27.

Full text
Abstract:

Feature extraction by digital image analysis and cell classification is an important task for cell culture automation. In High Throughput Screening (HTS), where thousands of data points are generated and processed at once, features are extracted and cells are classified to decide whether the cell culture is proceeding smoothly or not; the culture is restarted if a problem is detected. In this thesis project HeLa cells, which are human epithelial cancer cells, were selected for the experiment. The purpose is to classify two types of HeLa cells in culture: cells in cleavage, which are round, floating cells (stressed or dead cells are also round and floating), and normally growing cells, which are attached to the substrate. As the number of cells in cleavage will always be smaller than the number of cells growing normally and attached to the substrate, the count of attached cells should be higher than that of round cells. Five different HeLa cell images were used. For each image, every single cell is obtained by image segmentation and isolation, and different mathematical features are computed for each cell. The feature set for this experiment was chosen so that the features are robust, discriminative and have good generalisation quality for classification. Almost all the features presented in this thesis are rotation-, translation- and scale-invariant, so they are expected to perform well in discriminating objects or cells under any classification algorithm. Some new features were added, which are believed to improve the classification result. The feature set is considerably broader than the restricted sets that have been used in previous work. The features are implemented against a common interface, so that the library can be extended and integrated into other applications. They are fed into a machine learning algorithm called Linear Discriminant Analysis (LDA) for classification. Cells are then classified as 'cells attached to the substrate' (Cell Class A) or 'cells in cleavage' (Cell Class B). LDA selects among the features by removing and adding shape features for increased performance. On average, higher than ninety-five percent accuracy is obtained in the classification result, which is validated by visual classification.
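One classic rotation-, translation- and scale-invariant shape feature of the kind described is circularity, 4·pi·area / perimeter², which is 1.0 for a perfect circle and smaller for elongated shapes. The threshold and the measurements below are invented for illustration; this is a single-feature stand-in, not the thesis's trained LDA classifier over its full feature set.

```python
import math

def circularity(area, perimeter):
    # 1.0 for a circle; approaches 0 for very elongated shapes.
    return 4 * math.pi * area / perimeter ** 2

def classify(area, perimeter, threshold=0.9):
    # Round, detached cells score near 1; spread, attached cells score lower.
    if circularity(area, perimeter) >= threshold:
        return "B (round / in cleavage)"
    return "A (attached)"

# A circle of radius 10 vs. an elongated 40 x 5 rectangle.
print(classify(math.pi * 100, 2 * math.pi * 10))  # B (round / in cleavage)
print(classify(40 * 5, 2 * (40 + 5)))             # A (attached)
```

In practice many such invariant features would be computed per segmented cell and the decision boundary learned by LDA rather than fixed by hand.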

APA, Harvard, Vancouver, ISO, and other styles
11

Naswa, Sudhir. "Representation of Biochemical Pathway Models : Issues relating conversion of model representation from SBML to a commercial tool". Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-28.

Full text
Abstract:

Background: Computational simulation of complex biological networks lies at the heart of systems biology, since it can confirm the conclusions drawn by experimental studies of biological networks and guide researchers to produce fresh hypotheses for further experimental validation. Since this iterative process helps in the development of more realistic system models, a variety of computational tools have been developed. In the absence of a common format for representing models, these tools were developed using different formats; as a result, they were unable to exchange models among themselves, which led to the development of SBML, a standard exchange format for computational models of biochemical networks. Here the formats of SBML and of one of the commercial tools of systems biology are compared, to study the issues that may arise during conversion between their respective formats. A tool, StoP, has been developed to convert the SBML format to the format of the selected tool.

Results: The basic SBML representation, in the form of listings of the various elements of a biochemical reaction system, differs from the representation of the selected tool, which is location-oriented. In spite of this difference, the various components of biochemical pathways, including multiple compartments, global parameters, reactants, products, modifiers, reactions, kinetic formulas and reaction parameters, could be converted from the SBML representation to the representation of the selected tool. The MathML representation of the kinetic formula in an SBML model can be converted to the string format of the selected tool. Some features of SBML are not present in the selected tool; similarly, the ability of the selected tool to declare parameters for locations, which are global to those locations and their children, is not present in SBML.
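The MathML-to-string conversion the results mention can be sketched for a tiny subset of content MathML. This is an illustrative toy, not the StoP converter: it handles only `<apply>`, `<ci>` and four operators, and the Michaelis-Menten rate law below is a generic example, not a formula from the thesis.

```python
import xml.etree.ElementTree as ET

# Map content-MathML operator elements to infix symbols.
OPS = {"times": "*", "divide": "/", "plus": "+", "minus": "-"}
NS = "{http://www.w3.org/1998/Math/MathML}"

def to_infix(node):
    tag = node.tag.replace(NS, "")
    if tag == "math":
        return to_infix(node[0])
    if tag == "ci":                      # a variable or parameter name
        return node.text.strip()
    if tag == "apply":                   # first child is the operator
        op = OPS[node[0].tag.replace(NS, "")]
        return "(" + op.join(to_infix(c) for c in node[1:]) + ")"
    raise ValueError(f"unsupported element: {tag}")

mathml = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><divide/>
    <apply><times/><ci> Vmax </ci><ci> S </ci></apply>
    <apply><plus/><ci> Km </ci><ci> S </ci></apply>
  </apply>
</math>"""
print(to_infix(ET.fromstring(mathml)))  # ((Vmax*S)/(Km+S))
```

A full converter would also cover `<cn>` constants, function definitions and the rest of SBML's MathML subset.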

Conclusions: Differences in representations of pathway models may include differences in terminology, basic architecture and software capabilities, and the adoption of different standards for similar things. But the overall similarity of the domain of pathway models enables us to interconvert these representations. The selected tool should develop support for unit definitions, events and rules. Development of a facility for parameter declaration at the compartment level in SBML, and of a facility for function declaration in the selected tool, is recommended.

APA, Harvard, Vancouver, ISO, and other styles
12

Poudel, Sagar. "GPCR-Directed Libraries for High Throughput Screening". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-29.

Full text
Abstract:

Guanine nucleotide binding protein (G-protein) coupled receptors (GPCRs), the largest receptor family, are enormously important for the pharmaceutical industry, as they are the target of 50-60% of all existing medicines. The discovery of many new GPCRs by the human genome project opens up new opportunities for developing novel therapeutics. High throughput screening (HTS) of chemical libraries is a well-established method for finding new lead compounds in drug discovery. Despite some success, this approach has suffered from the near absence of more focused and specifically targeted libraries. To improve the hit rates and to fully exploit the potential of current corporate screening collections, in this thesis work the critical drug-binding positions within the GPCRs were identified and analyzed, based on their overall sequence, their transmembrane regions and their drug-binding fingerprints. A classification based on drug-binding fingerprints was made the basis for pharmacophore modelling and virtual screening, which facilitates the development of more specific and focused targeted libraries for HTS.

APA, Harvard, Vancouver, ISO, and other styles
13

Anders, Patrizia. "A bioinformaticians view on the evolution of smell perception". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-30.

Full text
Abstract:

Background:

The origin of vertebrate sensory systems still contains many mysteries and thus challenges for bioinformatics. The evolution of the sense of smell in particular poses important puzzles, namely the question of whether or not the vomeronasal system is older than the main olfactory system. Here I compare receptor sequences of the two distinct systems in a phylogenetic study, to determine their relationships among several different vertebrate species.

Results:

Receptors of the two olfactory systems share little sequence similarity and prove to be a challenge in multiple sequence alignment. However, recent dramatic improvements in alignment tools allow for better results and higher confidence. Different strategies and tools were employed and compared to derive a high-quality alignment that holds information about the evolutionary relationships between the different receptor types. The resulting maximum-likelihood tree supports the theory that the vomeronasal system is an ancestor of the main olfactory system rather than an evolutionary novelty of the tetrapods.

Conclusions:

The connections between the two systems of smell perception might be much more fundamental than the common architecture of their receptors. A better understanding of these parallels is desirable, not only with respect to our view of evolution, but also in the context of the further exploration of the functionality and complexity of odor perception. Along the way, this work offers a practical protocol through the jungle of programs concerned with sequence data and phylogenetic reconstruction.

APA, Harvard, Vancouver, ISO, and other styles
14

Pohl, Matin. "Using an ontology to enhance metabolic or signaling pathway comparisions by biological and chemical knowledge". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-32.

Full text
Abstract:

Motivation:

As genome-scale efforts to investigate the metabolic networks of miscellaneous organisms are ongoing, the amount of pathway data is growing. Simultaneously, an increasing amount of gene expression data from microarrays becomes available for reverse engineering, delivering, for example, hypothetical regulatory pathway data. To avoid being overwhelmed by data and to keep track of genuinely new information, analysis tools are needed. One vital task is the comparison of pathways to detect similar functionalities and overlaps or, in the case of reverse engineering, to detect known data corroborating a hypothetical pathway. A comparison method using ontological knowledge about molecules and reactions offers a more biological point of view, which graph-theoretical approaches have so far missed. Such a comparison approach based on an ontology is described in this report.

Results:

An algorithm is introduced that compares pathways component by component. The method was applied to two selected databases, and the results showed that it is not satisfactory as a stand-alone method. Further development possibilities are suggested, and steps toward an integrated method using several approaches are recommended.

Availability:

The source code, the database snapshots used, and pictures can be requested from the author.

APA, Harvard, Vancouver, ISO, and other styles
15

Sentausa, Erwin. "Time course simulation replicability of SBML-supporting biochemical network simulation tools". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-33.

Full text
Abstract:

Background: Modelling and simulation are important tools for understanding biological systems. Numerous modelling and simulation software tools have been developed for integrating knowledge regarding the behaviour of a dynamic biological system described in mathematical form. The Systems Biology Markup Language (SBML) was created as a standard format for exchanging biochemical network models among tools. However, it is not yet certain whether models can actually be used and exchanged among tools with different purposes and interfaces. In particular, it is not clear whether dynamic simulations of SBML models using different modelling and simulation packages are replicable.

Results: Time series simulations of published biological models in SBML format were performed using four SBML-supporting modelling and simulation tools, to evaluate whether the tools correctly replicate the simulation results. Some of the tools did not successfully integrate some models, and in the time series output of the successful simulations there are differences between the tools.

Conclusions: Although SBML is widely supported among biochemical modelling and simulation tools, not all simulators can replicate time-course simulations of SBML models exactly. This inability to replicate simulation results may harm the peer-review process of biological modelling and simulation activities, and should be addressed accordingly, for example by specifying in the SBML model the exact algorithm or simulator used to produce the simulation result.
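A replicability check of the kind performed here amounts to comparing two tools' time courses point by point within a tolerance. The trajectories and tolerance below are invented stand-ins for two simulators' output of the same SBML model; they are not data from the thesis.

```python
def replicates(series_a, series_b, rel_tol=1e-3):
    # Each series: list of (time, value) pairs; require matching time grids.
    if [t for t, _ in series_a] != [t for t, _ in series_b]:
        return False
    return all(abs(va - vb) <= rel_tol * max(abs(va), abs(vb), 1.0)
               for (_, va), (_, vb) in zip(series_a, series_b))

tool1 = [(0.0, 1.0), (1.0, 0.368), (2.0, 0.135)]
tool2 = [(0.0, 1.0), (1.0, 0.3681), (2.0, 0.1352)]  # tiny solver differences
tool3 = [(0.0, 1.0), (1.0, 0.5), (2.0, 0.25)]       # a diverging integration
print(replicates(tool1, tool2))  # True
print(replicates(tool1, tool3))  # False
```

Choosing the tolerance is itself a judgment call: adaptive-step solvers legitimately differ in the low digits, so "replication" has to mean agreement within solver accuracy, not bitwise equality.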

APA, Harvard, Vancouver, ISO, and other styles
16

Simu, Tiberiu. "A method for extracting pathways from Scansite-predicted protein-protein interactions". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-34.

Full text
Abstract:

Protein interaction is an important mechanism for cellular functionality. Computational methods for predicting protein interactions are available in many publicly accessible resources (for example, Scansite), and these predictions can be combined with other information sources to generate hypothetical pathways. However, when computational methods are used for building pathways, the process may become time-consuming, as it requires multiple iterations and the consolidation of data from different sources. We have tested whether it is possible to generate graphs of protein-protein interactions using only domain-motif interaction data, and the degree to which this process can be automated, by developing a program that is able to aggregate, under user guidance, query results from different information sources. The data sources used are Scansite and SwissProt. Visualisation of the graphs is done with Osprey, an external program freely available for academic purposes. The graphs obtained by running the software show that although it is possible to combine publicly available data with theoretical protein-protein interaction predictions from Scansite, further efforts are needed to increase the biological plausibility of these collections of data. It is possible, however, to reduce the dimensionality of the obtained graphs by focusing the searches on a particular tissue of interest.
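Aggregating predicted pairs into a graph and restricting it to one tissue — the dimensionality reduction the abstract mentions — can be sketched as follows. The protein names, pairs and tissue annotations are invented; in the thesis they would come from Scansite predictions and SwissProt annotations.

```python
from collections import defaultdict

# Hypothetical predicted interactions and tissue annotations.
predicted = [("EGFR", "GRB2"), ("GRB2", "SOS1"), ("EGFR", "SHC1")]
tissue = {"EGFR": {"epithelium"}, "GRB2": {"epithelium", "liver"},
          "SOS1": {"liver"}, "SHC1": {"epithelium"}}

def build_graph(pairs, keep_tissue=None):
    # Undirected adjacency sets; optionally drop edges whose endpoints
    # are not both annotated with the tissue of interest.
    graph = defaultdict(set)
    for a, b in pairs:
        if keep_tissue and not (keep_tissue in tissue[a]
                                and keep_tissue in tissue[b]):
            continue
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

g = build_graph(predicted, keep_tissue="epithelium")
print(sorted(g))  # ['EGFR', 'GRB2', 'SHC1'] -- SOS1 (liver-only) dropped
```

The resulting adjacency structure is what would then be handed to a viewer such as Osprey.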

APA, Harvard, Vancouver, ISO, and other styles
17

Mathew, Sumi. "A method to identify the non-coding RNA gene for U1 RNA in species in which it has not yet been found". Thesis, University of Skövde, School of Humanities and Informatics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-37.

Full text
Abstract:

Background

Non-coding RNAs are RNA molecules that do not code for proteins but play structural, catalytic or regulatory roles in the organisms in which they are found. These RNAs generally conserve their secondary structure more than their primary sequence. It is possible to look for protein-coding genes using sequence signals such as promoters, terminators, and start and stop codons. This is not the case with non-coding RNAs, since such signals are only weakly conserved in them, which makes the situation more challenging. Therefore, a protocol was devised to identify U1 RNA in species in which it has not previously been found.

Results

Covariance models are sufficient to identify non-coding RNAs, but they are very slow, so a filtering step is needed to reduce the search space before they are applied. The protocol for identifying U1 RNA genes uses the pattern matcher RNABOB as the filter, since it can conduct secondary-structure pattern searches. The descriptor for RNABOB is generated automatically, in such a way that it can also represent bulges and interior loops in RNA helices. The protocol was compared with the Rfam and Weinberg & Ruzzo approaches and identified new U1 RNA homologues in the Apicomplexan group, where the gene had not previously been found.

Conclusions

The method has been used to identify the gene for U1 RNA in certain species in which it had not been detected previously. The identified genes may be further analyzed with wet-laboratory techniques to confirm their existence.
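The two-stage search described above — a cheap pattern scan that prunes the genome before the slow covariance-model step — can be sketched roughly as follows. The pattern, window sizes and the `cm_score` placeholder are illustrative stand-ins, not the actual RNABOB descriptor or a real covariance model (a real pipeline would call a tool such as Infernal for the second stage):

```python
import re

# Toy stand-in for an RNABOB-style descriptor: a stem, a variable loop,
# and the reverse-complement stem. Purely illustrative.
TOY_PATTERN = re.compile(r"ATTTGG[ACGT]{10,40}CCAAAT")

def prefilter(genome: str, window: int = 200, step: int = 100):
    """Fast first pass: yield only windows matching the sequence pattern."""
    for start in range(0, max(1, len(genome) - window + 1), step):
        chunk = genome[start:start + window]
        if TOY_PATTERN.search(chunk):
            yield start, chunk

def cm_score(chunk: str) -> float:
    """Placeholder for an expensive covariance-model score."""
    return chunk.count("TTTGG") / max(1, len(chunk))

def two_stage_search(genome: str, threshold: float = 0.0):
    """Run the slow scorer only on windows that survive the filter."""
    hits = []
    for start, chunk in prefilter(genome):
        score = cm_score(chunk)  # expensive step, run on few windows only
        if score > threshold:
            hits.append((start, score))
    return hits
```

The point of the design is that the filter touches every window but is cheap, while the expensive scorer sees only the small surviving fraction.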


Gli stili APA, Harvard, Vancouver, ISO e altri
18

Rao, Aditya. "TargetPf: A Plasmodium falciparum protein localization predictor". Thesis, University of Skövde, School of Humanities and Informatics, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-24.

Testo completo
Gli stili APA, Harvard, Vancouver, ISO e altri
19

Wakadkar, Sachin. "Analysis of transmembrane and globular protein depending on their solvent energy". Thesis, University of Skövde, School of Life Sciences, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-2971.

Testo completo
Abstract (sommario):

The number of experimentally determined protein structures in the Protein Data Bank (PDB) is continuously increasing. Common features such as cellular location, function, topology, primary structure, secondary structure, tertiary structure, domains or fold are used to classify them, and various classification methods exist accordingly. In this work we investigate an additional basis for classification: solvent energy. Solvation is one of the most important properties by which macromolecules and biological membranes remain stabilized in different environments, and the energy required for solvation can be measured in terms of solvent energy. Proteins from similar environments are expected to have similar solvent energies; that is, solvent energy can be used as a measure to analyze and classify proteins. In this project the solvent energy of proteins in the PDB was calculated using Jones' algorithm, and the proteins were classified into two classes: transmembrane and globular. Statistical analysis showed that the solvent energy values obtained for the two main classes (globular and transmembrane) come from different populations. Thus, a classification based on solvent energy can help predict cellular placement.
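As an illustration of the kind of statistical comparison described — not Jones' algorithm itself, and with made-up energy values — a two-sample statistic such as Welch's t can be used to ask whether the solvent energies of the two classes could come from the same population:

```python
from statistics import mean, variance
from math import sqrt

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic: compares the means of two groups
    without assuming equal variances. A large magnitude suggests the
    groups come from different populations."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    return (mean(sample_a) - mean(sample_b)) / sqrt(va / na + vb / nb)

# Hypothetical solvent-energy values (arbitrary units), for illustration only.
globular = [-120.0, -115.0, -130.0, -125.0, -118.0]
transmembrane = [-60.0, -55.0, -70.0, -65.0, -58.0]

t = welch_t(globular, transmembrane)
```

In practice the t value would be converted to a p-value against the appropriate degrees of freedom; the sketch only shows the separation of the two samples.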

 

Gli stili APA, Harvard, Vancouver, ISO e altri
20

Birkmeier, Bettina. "Integrating Prior Knowledge into the Fitness Function of an Evolutionary Algorithm for Deriving Gene Regulatory Networks". Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-31.

Testo completo
Abstract (sommario):

The topic of gene regulation is a major research area in the bioinformatics community. In this thesis prior knowledge from Gene Ontology in the form of templates is integrated into the fitness function of an evolutionary algorithm to predict gene regulatory networks. The resulting multi-objective fitness functions are then tested with MAPK network data taken from KEGG to evaluate their respective performances. The results are presented and analyzed. However, a clear tendency cannot be observed. The results are nevertheless promising and can provide motivation for further research in that direction. Therefore different ideas and approaches are suggested for future work.

Gli stili APA, Harvard, Vancouver, ISO e altri
21

Rao, Aditya. "TargetPf: A Plasmodium falciparum protein localization predictor". Thesis, University of Skövde, School of Humanities and Informatics, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-914.

Testo completo
Abstract (sommario):

Background: In P. falciparum, a similarity between the transit peptides of apicoplast and mitochondrial proteins with respect to net positive charge had previously been observed in a few proteins. Existing P. falciparum protein localization prediction tools were leveraged in this study to examine this similarity in larger sets of these proteins.

Results: The online public-domain malarial repository PlasmoDB was used as the source of apicoplast and mitochondrial protein sequences for the similarity study of the two types of transit peptides. It was found that many of the 551 apicoplast-targeted proteins (NEAT proteins) of PlasmoDB may have been wrongly annotated as localizing to the apicoplast, since some of these proteins lacked annotations for signal peptides, while others also had annotations for localization to the mitochondrion (NEMT proteins). In addition, around 50 NEAT proteins could contain signal anchors instead of signal peptides in their N-termini, which could have an impact on the current theory explaining localization to the apicoplast [1].

The P. falciparum localization prediction tools were then used to study the similarity in net positive charge between the transit peptides of NEAT and NEMT proteins. It was found that NEAT protein prediction tools such as PlasmoAP and PATS could be made to recognize NEMT proteins as NEAT proteins, while the NEMT-predicting tool PlasMit could be made to recognize a significant number of NEAT proteins as NEMT. Based on these results, it was conjectured that a single technique may be sufficient to predict both apicoplast and mitochondrial transit peptides. A PERL implementation called TargetPf was written to test this conjecture (using PlasmoAP rules); it reported a total of 408 NEAT proteins and 1504 NEMT proteins. This number of predicted NEMT proteins (1504) is significantly higher than the 258 NEMT proteins annotated in PlasmoDB, but more in line with the roughly 1200 predictions of the tool PlasMit.

Conclusions: Some possible ambiguities in the PlasmoDB annotations related to NEAT protein localization were identified in this study. It was also found that existing P. falciparum localization prediction tools can be made to detect transit peptides for which they have not been trained or built for.
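The net-positive-charge property that the study exploits can be approximated by a crude counting heuristic. The residue weights, window length and threshold below are illustrative assumptions, not the published PlasmoAP, PATS or PlasMit rules:

```python
BASIC = set("KRH")   # lysine, arginine, histidine
ACIDIC = set("DE")   # aspartate, glutamate

def net_charge(peptide: str) -> int:
    """Crude net charge: +1 per basic residue, -1 per acidic residue."""
    return sum((aa in BASIC) - (aa in ACIDIC) for aa in peptide.upper())

def looks_like_transit_peptide(sequence: str, window: int = 30,
                               min_charge: int = 3) -> bool:
    """Flag sequences whose N-terminal window carries a net positive charge.
    Window size and threshold are illustrative, not the published rules."""
    return net_charge(sequence[:window]) >= min_charge
```

A heuristic of this shape is deliberately agnostic about the target organelle, which is the observation behind the conjecture that one technique could cover both apicoplast and mitochondrial transit peptides.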

Gli stili APA, Harvard, Vancouver, ISO e altri
22

Truvé, Katarina. "Using combined methods to reveal the dynamic organization of protein networks". Thesis, University of Skövde, School of Humanities and Informatics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-962.

Testo completo
Abstract (sommario):

Proteins combine in various ways to execute different essential functions. Cellular processes are enormously complex, and it is a great challenge to explain their underlying organization. Various methods have been applied in an attempt to reveal the organization of the cell. Gene expression analysis uses mRNA levels to predict which proteins are present in the cell simultaneously. This method is useful but is also known to sometimes fail: proteins that are known to be functionally related do not always show a significant correlation in gene expression. This may be explained by the dynamic organization of the proteome. Proteins can have diverse functions and may interact with some partners only at a few time points, which would probably not produce a significant correlation in gene expression. In this work we addressed this problem by combining gene expression data with data on physical interactions between proteins. We used a method for modular decomposition introduced by Gagneur et al. (2004) that aims to reveal the logical organization of protein-protein networks, and we extended the interpretation of the modular decomposition to localize the dynamics in the protein organization. We found evidence that protein interactions supported by gene expression data are very likely to be related in function and can thus be used to predict the function of unknown proteins. We also identified negative correlation in gene expression as an overlooked area. Several hypotheses were generated using a combination of these methods; some could be verified in the literature, and others might shed light on new pathways after additional experimental testing.
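The combination described above — physical interaction edges scored by expression correlation, with negative correlation kept as its own class — might be sketched as follows. The Pearson measure, threshold and labels are illustrative choices, not necessarily those used in the thesis:

```python
from statistics import mean
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def classify_edges(interactions, expression, threshold=0.7):
    """Label each physical interaction by the co-expression of its endpoints:
    'supported' (strong positive), 'negative' (strong negative),
    or 'dynamic' (no strong correlation, e.g. transient interactions)."""
    labels = {}
    for a, b in interactions:
        r = pearson(expression[a], expression[b])
        if r >= threshold:
            labels[(a, b)] = "supported"
        elif r <= -threshold:
            labels[(a, b)] = "negative"
        else:
            labels[(a, b)] = "dynamic"
    return labels
```

The 'dynamic' class is the interesting remainder: physically interacting pairs whose expression is uncorrelated, consistent with interactions that occur only at a few time points.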

Gli stili APA, Harvard, Vancouver, ISO e altri
23

Orzechowski, Westholm Jakub. "Genome-wide Studies of Transcriptional Regulation in Yeast". Doctoral thesis, Uppsala universitet, Institutionen för cell- och molekylärbiologi, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-99205.

Testo completo
Abstract (sommario):
In this thesis, nutrient signalling in yeast is used as a model to study several features of gene regulation, such as combinatorial gene regulation, the role of motif context and chromatin modifications. The nutrient signalling system in yeast consists of several pathways that transmit signals about the availability of key nutrients and regulate the transcription of a large part of the genome. Some of the signalling pathways are also conserved in other eukaryotic species, where they are implicated in processes such as aging and in human disease.

Combinatorial gene regulation is examined in papers I and II. In paper I, the role of Mig1, Mig2 and Mig3 is studied. To elucidate how the three proteins contribute to the control of gene expression, we used microarrays to study the expression of all yeast genes in the wild type and in all seven possible combinations of mig1, mig2 and mig3 deletions. In paper II, a similar strategy is used to investigate Gis1 and Rph1, two related transcription factors. Our results reveal that Rph1 is involved in nutrient signalling together with Gis1, and we find that both the activities and the target specificities of Gis1 and Rph1 depend on the growth phase.

Paper III describes ContextFinder, a program for identifying constraints on sequence motif locations and orientations. ContextFinder was used to analyse over 300 cases of motifs that are enriched in experimentally selected groups of yeast promoters. Our results suggest that motif context is frequently important for stable DNA binding and/or regulatory activity of transcription factors.

In paper IV, we investigated how gene expression changes resulting from nitrogen starvation are accompanied by chromatin modifications. Activation of gene expression is concentrated to specific genomic regions. It is associated with nucleosome depletion (in both promoters and coding regions) and increased levels of H3K9ac (but not H4K5ac).
Gli stili APA, Harvard, Vancouver, ISO e altri
24

Besnier, Francois. "Development of Variance Component Methods for Genetic Dissection of Complex Traits". Doctoral thesis, Uppsala universitet, Centrum för bioinformatik, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-101399.

Testo completo
Abstract (sommario):
This thesis presents several developments of the variance component (VC) approach to quantitative trait locus (QTL) mapping. The first part consists of methodological improvements: a new, fast and efficient method for estimating IBD matrices has been developed. The new method makes better use of computer resources in terms of computational power and storage memory, facilitating further improvements by resolving methodological bottlenecks in algorithms that scan multiple QTL. A new VC model has also been developed to consider and evaluate the correlation of allelic effects within parental line origins in experimental outbred crosses. The method was tested on simulated and experimental data and showed a power to detect QTL that was higher than or similar to that of linear regression based QTL mapping. The second part focuses on the analysis of multi-generational pedigrees with the VC approach. The IBD estimation algorithm was extended to include haplotype information in addition to genotype and pedigree information, improving the accuracy of the IBD estimates, and a new haplotyping algorithm was developed to limit the risk of haplotyping errors in multigenerational pedigrees. These newly developed methods were subsequently applied to the analysis of a nine-generation AIL pedigree obtained by crossing two chicken lines divergently selected for body weight. Nine QTL described in an F2 population were replicated in the AIL pedigree, and our strategy of using both genotype and phenotype information from all individuals in the entire pedigree clearly made efficient use of the genotype information available in the AIL.
Gli stili APA, Harvard, Vancouver, ISO e altri
25

Hennerdal, Aron, e Arne Elofsson. "Rapid membrane protein topology prediction". Stockholms universitet, Institutionen för biokemi och biofysik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-61921.

Testo completo
Abstract (sommario):
State-of-the-art methods for topology prediction of α-helical membrane proteins are based on time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine whether a consensus of topology prediction methods based on single sequences can reach an accuracy similar to that of the more accurate multiple sequence-based methods. We show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ~6% worse than the best method utilizing multiple sequence alignments. Availability and implementation: TOPCONS-single is available as a web server from http://single.topcons.net/ and is also available for local installation from the web site. In addition, consensus-based topology predictions for the entire International Protein Index (IPI) are available from the web server and will be updated at regular intervals.
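The core consensus idea — combining several single-sequence predictions into one topology — can be illustrated by a per-residue majority vote. The real TOPCONS-single pipeline is more elaborate than this sketch; the topology strings and labels below are only illustrative:

```python
from collections import Counter

def consensus_topology(predictions):
    """Per-residue majority vote over equal-length topology strings,
    where 'M' = membrane helix, 'i' = inside, 'o' = outside.
    (The actual TOPCONS method is more sophisticated; this is the core idea.)"""
    assert len({len(p) for p in predictions}) == 1, "predictions must align"
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*predictions)
    )

# Three hypothetical single-sequence predictions for the same protein.
preds = ["iiMMMooo",
         "iiMMMooo",
         "iiiMMMoo"]
```

A vote like this is cheap because no multiple sequence alignment is built, which is exactly the trade-off the abstract quantifies (~6% accuracy for a large saving in time).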
Gli stili APA, Harvard, Vancouver, ISO e altri
26

Csaba, Gergely. "Context based bioinformatics". Diss., Ludwig-Maximilians-Universität München, 2013. http://nbn-resolving.de/urn:nbn:de:bvb:19-157252.

Testo completo
Abstract (sommario):
The goal of bioinformatics is to develop innovative and practical methods and algorithms for biological questions. In many cases, these questions are driven by new biotechnological techniques, especially by genome and cell wide high throughput experiment studies. In principle there are two approaches: 1. Reduction and abstraction of the question to a clearly defined optimization problem, which can be solved with appropriate and efficient algorithms. 2. Development of context based methods, incorporating as much contextual knowledge as possible in the algorithms, and derivation of practical solutions for relevant biological questions on the high-throughput data. These methods can often be supported by appropriate software tools and visualizations, allowing for interactive evaluation of the results by experts. Context based methods are often much more complex and require more involved algorithmic techniques to obtain practically relevant and efficient solutions for real world problems, as in many cases already the simplified abstraction of a problem results in NP-hard problem instances. In many cases, to solve these complex problems, one needs to employ efficient data structures and heuristic search methods that solve clearly defined sub-problems using efficient (polynomial) optimization (such as dynamic programming, greedy, path- or tree-algorithms). In this thesis, we present new methods and analyses addressing open questions of bioinformatics from different contexts by incorporating the corresponding contextual knowledge. The two main contexts in this thesis are the protein structure similarity context (Part I) and network based interpretation of high-throughput data (Part II). For the protein structure similarity context (Part I) we analyze the consistency of gold standard structure classification systems and derive a consistent benchmark set usable for different applications.

We introduce two methods (Vorolign, PPM) for the protein structure similarity recognition problem, based on different features of the structures. Derived from the idea and results of Vorolign, we introduce the concept of a contact neighborhood potential, aiming to improve the results of protein fold recognition and threading. For the re-scoring problem of predicted structure models we introduce the method Vorescore, clearly improving the fold-recognition performance and enabling the evaluation of the contact neighborhood potential for structure prediction methods in general. We introduce a contact consistent Vorolign variant, ccVorolign, further improving the structure based fold recognition performance and enabling direct optimization of the neighborhood potential in the future. Due to the enforcement of contact consistency, the ccVorolign method has much higher computational complexity than the polynomial Vorolign method - the cost of computing interpretable and consistent alignments. Finally, we introduce a novel structural alignment method (PPM) enabling the explicit modeling and handling of phenotypic plasticity in protein structures. We employ PPM for the analysis of the effects of alternative splicing on protein structures. With the help of PPM we test the hypothesis of whether splice isoforms of the same protein can lead to protein structures with different folds (fold transitions).

In Part II of the thesis we present methods for generating and using context information for the interpretation of high-throughput experiments. For the generation of context information on molecular regulations we introduce novel textmining approaches that extract relations automatically from scientific publications.

In addition to the fast NER (named entity recognition) method (syngrep) we also present a novel, fully ontology-based context-sensitive method (SynTree) allowing for the context-specific disambiguation of ambiguous synonyms and resulting in much better identification performance. This context information is important for the interpretation of high-throughput data, but is often missing in current databases. Despite all improvements, the results of automated text-mining methods are error prone. The RelAnn application presented in this thesis helps to curate the automatically extracted regulations, enabling manual and ontology based curation and annotation. For the usage of high-throughput data one needs additional methods for data processing, for example methods to map the hundreds of millions of short DNA/RNA fragments (so-called reads) onto a reference genome or transcriptome. Such data (RNA-seq reads) are the output of next generation sequencing methods measured by sequencing machines, which are becoming more and more efficient and affordable. Unlike current state-of-the-art methods, our novel read-mapping method ContextMap resolves the occurring ambiguities at the final step of the mapping process, thereby employing the knowledge of the complete set of possible ambiguous mappings. This approach allows for higher precision, even if more nucleotide errors are tolerated in the read mappings in the first step. The consistency between context information of molecular regulations stored in databases or extracted by textmining and the measured data can be used to identify and score consistent regulations (GGEA). This method substantially extends commonly used gene-set based methods such as over-representation analysis (ORA) and gene set enrichment analysis (GSEA). Finally we introduce the novel method RelExplain, which uses the extracted contextual knowledge to generate network-based and testable hypotheses for the interpretation of high-throughput data.
Bioinformatik befasst sich mit der Entwicklung innovativer und praktisch einsetzbarer Verfahren und Algorithmen für biologische Fragestellungen. Oft ergeben sich diese Fragestellungen aus neuen Beobachtungs- und Messverfahren, insbesondere neuen Hochdurchsatzverfahren und genom- und zellweiten Studien. Im Prinzip gibt es zwei Vorgehensweisen: Reduktion und Abstraktion der Fragestellung auf ein klar definiertes Optimierungsproblem, das dann mit geeigneten möglichst effizienten Algorithmen gelöst wird. Die Entwicklung von kontext-basierten Verfahren, die möglichst viel Kontextwissen und möglichst viele Randbedingungen in den Algorithmen nutzen, um praktisch relevante Lösungen für relevante biologische Fragestellungen und Hochdurchsatzdaten zu erhalten. Die Verfahren können oft durch geeignete Softwaretools und Visualisierungen unterstützt werden, um eine interaktive Auswertung der Ergebnisse durch Fachwissenschaftler zu ermöglichen. Kontext-basierte Verfahren sind oft wesentlich aufwändiger und erfordern involviertere algorithmische Techniken, um für reale Probleme, deren simplifizierende Abstraktionen schon NP-hart sind, noch praktisch relevante und effiziente Lösungen zu ermöglichen. Oft werden effiziente Datenstrukturen und heuristische Suchverfahren benötigt, die für klar umrissene Teilprobleme auf effiziente (polynomielle) Optimierungsverfahren (z.B. dynamische Programmierung, Greedy, Wege- und Baumverfahren) zurückgreifen und sie entsprechend für das Gesamtverfahren einsetzen. In dieser Arbeit werden eine Reihe von neuen Methoden und Analysen vorgestellt, um offene Fragen der Bioinformatik aus verschiedenen Kontexten durch Verwendung von entsprechendem Kontext-Wissen zu adressieren. Die zwei Hauptkontexte in dieser Arbeit sind (Teil 1) die Ähnlichkeiten von 3D-Proteinstrukturen und (Teil 2) die netzwerkbasierte Interpretation von Hochdurchsatzdaten.
Im Proteinstrukturkontext (Teil 1) analysieren wir die Konsistenz der heute verfügbaren Goldstandards für Proteinstruktur-Klassifikationen und leiten ein vielseitig einsetzbares konsistentes Benchmark-Set ab. Für eine genauere Bestimmung der Ähnlichkeit von Proteinstrukturen beschreiben wir zwei Methoden (Vorolign, PPM), die unterschiedliche Strukturmerkmale nutzen. Ausgehend von den für Vorolign erzielten Ergebnissen führen wir Kontakt-Umgebungs-Potentiale mit dem Ziel ein, Fold-Erkennung (auf Basis der vorhandenen Strukturen) und Threading (zur Proteinstrukturvorhersage) zu verbessern. Für das Problem des Re-scorings von vorhergesagten Strukturmodellen führen wir das Vorescore-Verfahren ein, mit dem die Fold-Erkennung deutlich verbessert, aber auch die Anwendbarkeit von Potentialen im Allgemeinen getestet werden kann. Zur weiteren Verbesserung führen wir eine Kontakt-konsistente Vorolign-Variante (ccVorolign) ein, die wegen der neuen Konsistenz-Randbedingung erheblich aufwändiger als das polynomielle Vorolign-Verfahren ist, aber eben auch interpretierbare konsistente Alignments liefert. Das neue Strukturalignment-Verfahren (PPM) erlaubt es, phänotypische Plastizität explizit zu modellieren und zu berücksichtigen. PPM wird eingesetzt, um die Effekte von alternativem Splicing auf die Proteinstruktur zu untersuchen, insbesondere die Hypothese, ob Splice-Isoformen unterschiedliche Folds annehmen können (Fold-Transitionen).
Neben schnellen NER (named entity recognition) Verfahren (wie syngrep) wird auch ein vollständig Ontologie-basiertes kontext-sensitives Verfahren (SynTree) eingeführt, das es erlaubt, mehrdeutige Synonyme kontext-spezifisch und damit wesentlich genauer aufzulösen. Diese für die Interpretation von Hochdurchsatzdaten wichtige Kontextinformation fehlt häufig in heutigen Datenbanken. Automatische Verfahren produzieren aber trotz aller Verbesserungen noch viele Fehler. Mithilfe unserer Applikation RelAnn können aus Texten extrahierte regulatorische Beziehungen ontologiebasiert manuell annotiert und kuriert werden. Die Verwendung aktueller Hochdurchsatzdaten benötigt zusätzliche Ansätze für die Datenprozessierung, zum Beispiel für das Mapping von hunderten von Millionen kurzer DNA/RNA-Fragmente (sog. reads) auf Genom oder Transkriptom. Diese Daten (RNA-seq) ergeben sich durch next generation sequencing Methoden, die derzeit mit immer leistungsfähigeren Geräten immer kostengünstiger gemessen werden können. In der ContextMap-Methode werden im Gegensatz zu state-of-the-art Verfahren die auftretenden Mehrdeutigkeiten erst am Ende des Mappingprozesses aufgelöst, wenn die Gesamtheit der Mappinginformationen zur Verfügung steht. Dadurch können mehr Fehler beim Mapping zugelassen und trotzdem höhere Genauigkeit erreicht werden. Die Konsistenz zwischen der Kontextinformation aus Textmining und Datenbanken sowie den gemessenen Daten kann dann für das Auffinden und Bewerten von konsistenten Regulationen (GGEA) genutzt werden. Dieses Verfahren stellt eine wesentliche Erweiterung der häufig verwendeten Mengen-orientierten Verfahren wie overrepresentation analysis (ORA) und gene set enrichment analysis (GSEA) dar. Zuletzt stellen wir die Methode RelExplain vor, die aus dem extrahierten Kontextwissen netzwerk-basierte, testbare Hypothesen für die Erklärung von Hochdurchsatzdaten generiert.
Gli stili APA, Harvard, Vancouver, ISO e altri
27

Cingolani, Pablo. "Bioinformatics for epigenomics". Thesis, McGill University, 2009. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=40820.

Testo completo
Abstract (sommario):
Epigenetics refers to reversible, heritable changes in gene regulation that occur without a change in DNA sequence. These changes are usually due to methylation of cytosine bases in DNA. In this work we review existing methodologies and propose new ones for use in epigenomics. High-throughput methods to estimate methylation levels were developed, as well as methods for making a biological interpretation of the data based on gene set enrichment. High correlation was obtained between our methylation estimates and experimental data from MeDIP experiments. Our proposed methods for gene set enrichment performed better than well-known methods.
L'épigénétique décrit les changements réversibles et héritables de la régulation génique qui arrivent sans changements dans la séquence d'ADN. Ces changements sont habituellement dus à la méthylation de cytosines dans l'ADN. Dans cette thèse, nous récapitulons les méthodes bioinformatiques existantes et nous proposons des nouvelles méthodes pour des problèmes reliés à l'épigénétique. Les méthodes à haut débit pour l'estimation du niveau de méthylation sont développées, de même que des méthodes pour l'interprétation biologique des données en se basant sur l'enrichissement d'ensembles de gènes de la même fonction. De hauts niveaux de corrélation sont obtenus entre nos estimés et les données expérimentales provenant d'expériences de type MeDIP. Les méthodes que nous proposons pour l'analyse d'enrichissement de fonction des gènes performent mieux que les autres méthodes existantes.
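As a rough illustration of the kind of gene set enrichment testing mentioned above (the thesis' own methods are not reproduced here), the standard one-sided hypergeometric over-representation test can be written directly from counting arguments:

```python
from math import comb

def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for a hypergeometric draw: out of N genes total,
    K belong to the gene set, n are differentially methylated,
    and k of those n fall in the set (one-sided ORA test)."""
    total = comb(N, n)
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / total
```

A small p-value means the overlap k between the hit list and the gene set is larger than chance would suggest, which is the usual evidence for enrichment.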
Gli stili APA, Harvard, Vancouver, ISO e altri
28

LEMOS, MELISSA. "WORKFLOW FOR BIOINFORMATICS". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2004. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=5928@1.

Testo completo
Abstract (sommario):
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Os projetos para estudo de genomas partem de uma fase de sequenciamento onde são gerados em laboratório dados brutos, ou seja, sequências de DNA sem significado biológico. As sequências de DNA possuem códigos responsáveis pela produção de proteínas e RNAs, enquanto que as proteínas participam de todos os fenômenos biológicos, como a replicação celular, produção de energia, defesa imunológica, contração muscular, atividade neurológica e reprodução. As sequências de DNA, RNA e proteínas são chamadas nesta tese de biossequências. Porém, o grande desafio destes projetos consiste em analisar essas biossequências, e obter informações biologicamente relevantes. Durante a fase de análise, os pesquisadores usam diversas ferramentas, programas de computador, e um grande volume de informações armazenadas em fontes de dados de Biologia Molecular. O crescente volume e a distribuição das fontes de dados e a implementação de novos processos em Bioinformática facilitaram enormemente a fase de análise, porém criaram uma demanda por ferramentas e sistemas semi-automáticos para lidar com tal volume e complexidade. Neste cenário, esta tese aborda o uso de workflows para compor processos de Bioinformática, facilitando a fase de análise. Inicialmente apresenta uma ontologia modelando processos e dados comumente utilizados em Bioinformática. Esta ontologia foi derivada de um estudo cuidadoso, resumido na tese, das principais tarefas feitas pelos pesquisadores em Bioinformática. Em seguida, a tese propõe um framework para um sistema de gerência de análises em biossequências, composto por dois sub-sistemas. O primeiro é um sistema de gerência de workflows de Bioinformática, que auxilia os pesquisadores na definição, validação, otimização e execução de workflows necessários para se realizar as análises. O segundo é um sistema de gerência de dados em Bioinformática, que trata do armazenamento e da manipulação dos dados envolvidos nestas análises. 
O framework inclui um gerente de ontologias, armazenando ontologias para Bioinformática, nos moldes da apresentada anteriormente. Por fim, a tese descreve instanciações do framework para três tipos de ambiente de trabalho comumente encontrados e sugestivamente chamados de ambiente pessoal, ambiente de laboratório e ambiente de comunidade. Para cada um destes ambientes, a tese discute em detalhe os aspectos particulares da execução e otimização de workflows.
Genome projects usually start with a sequencing phase, in which experimental data, usually DNA sequences, are generated without any biological interpretation. DNA sequences contain the codes responsible for the production of protein and RNA sequences, while protein sequences participate in all biological phenomena, such as cell replication, energy production, immunological defense, muscular contraction, neurological activity and reproduction. DNA, RNA and protein sequences are called biosequences in this thesis. The fundamental challenge researchers face lies precisely in analyzing these sequences to derive biologically relevant information. During the analysis phase, researchers use a variety of analysis programs and access large data sources holding Molecular Biology data. The growing number of Bioinformatics data sources and analysis programs has enormously facilitated the analysis phase; however, it also creates a demand for systems that make such computational resources easier to use. Given this scenario, this thesis addresses the use of workflows to compose Bioinformatics analysis programs that access data sources, thereby facilitating the analysis phase. An ontology modeling the analysis programs and data sources commonly used in Bioinformatics is first described. This ontology is derived from a careful study, also summarized in the thesis, of the computational resources researchers in Bioinformatics presently use. A framework for biosequence analysis management systems is described next. The system is divided into two major components. The first is a Bioinformatics workflow management system that helps researchers define, validate, optimize and run workflows combining Bioinformatics analysis programs. The second is a Bioinformatics data management system that helps researchers manage large volumes of Bioinformatics data. The framework includes an ontology manager that stores Bioinformatics ontologies, such as the one previously described.
Lastly, instantiations of the Bioinformatics workflow management system framework are described. The instantiations cover three types of working environments commonly found in practice, suggestively called the personal environment, the laboratory environment and the community environment. For each instantiation, aspects related to workflow optimization and execution are discussed in detail.
APA, Harvard, Vancouver, ISO and other styles
29

Gnad, Florian. "Bioinformatics of phosphoproteomics". Diss., kostenfrei, 2008. http://edoc.ub.uni-muenchen.de/9303/.

Full text
APA, Harvard, Vancouver, ISO and other styles
30

Jauhiainen, Alexandra. "Evaluation and Development of Methods for Identification of Biochemical Networks". Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2811.

Full text
Abstract (summary):

Systems biology is an area concerned with understanding biology on a systems level, where the structure and dynamics of the system are in focus. Knowledge about the structure and dynamics of biological systems provides fundamental information about cells and the interactions within them, and also plays an increasingly important role in medical applications.

System identification deals with the problem of constructing a model of a system from data, and an extensive theory exists, particularly for the identification of linear systems.

This is a master's thesis in systems biology treating the identification of biochemical systems. Methods based on both local parameter perturbation data and time series data have been tested and evaluated in silico.

The advantage of methods based on local parameter perturbation data proved to be that they demand less complex data, but the drawbacks are the reduced information content of these data and their sensitivity to noise. Methods employing time series data are generally more robust to noise, but the lack of available data limits the use of these methods.

The work has been conducted at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics in Göteborg, and at the division of Computational Biology at the Department of Physics and Measurement Technology, Biology, and Chemistry at Linköping University during the autumn of 2004.

APA, Harvard, Vancouver, ISO and other styles
31

Andrade, Jorge. "Grid and High-Performance Computing for Applied Bioinformatics". Doctoral thesis, Stockholm : Bioteknologi, Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4573.

Full text
APA, Harvard, Vancouver, ISO and other styles
32

Bresell, Anders. "Characterization of protein families, sequence patterns, and functional annotations in large data sets". Doctoral thesis, Linköping : Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10565.

Full text
APA, Harvard, Vancouver, ISO and other styles
33

Gold, David L. "Bayesian learning in bioinformatics". College Station, Tex.: Texas A&M University, 2007. http://hdl.handle.net/1969.1/ETD-TAMU-1624.

Full text
APA, Harvard, Vancouver, ISO and other styles
34

Peng, Zeshan. "Structure comparison in bioinformatics". Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B36271299.

Full text
APA, Harvard, Vancouver, ISO and other styles
35

Baladandayuthapani, Veerabhadran. "Bayesian methods in bioinformatics". Texas A&M University, 2005. http://hdl.handle.net/1969.1/4856.

Full text
Abstract (summary):
This work is directed towards developing flexible Bayesian statistical methods in the semi- and nonparametric regression modeling framework, with special focus on analyzing data from biological and genetic experiments. This dissertation attempts to solve two such problems in this area. In the first part, we study penalized regression splines (P-splines), which are low-order basis splines with a penalty to avoid undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have trouble when functions are varying rapidly. We model the penalty parameter inherent in the P-spline method as a heteroscedastic regression function. We develop a full Bayesian hierarchical structure to do this and use Markov chain Monte Carlo techniques for drawing random samples from the posterior for inference. We show that the approach achieves very competitive performance as compared to other methods. The second part focuses on modeling DNA microarray data. Microarray technology enables us to monitor the expression levels of thousands of genes simultaneously and hence to obtain a better picture of the interactions between the genes. In order to understand the biological structure underlying these gene interactions, we present a hierarchical nonparametric Bayesian model based on Multivariate Adaptive Regression Splines (MARS) to capture the functional relationship between genes and also between genes and disease status. The novelty of the approach lies in the attempt to capture the complex nonlinear dependencies between the genes which could otherwise be missed by linear approaches. The Bayesian model is flexible enough to identify significant genes of interest as well as model the functional relationships between the genes. The effectiveness of the proposed methodology is illustrated on leukemia and breast cancer datasets.
APA, Harvard, Vancouver, ISO and other styles
36

French, Leon Hayes. "Bioinformatics for neuroanatomical connectivity". Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/40369.

Full text
Abstract (summary):
Neuroscience research is increasingly dependent on bringing together large amounts of data collected at the molecular, anatomical, functional and behavioural levels. This data is disseminated in scientific articles and large online databases. I utilized these large resources to study the wiring diagram of the brain or ‘connectome’. The aims of this thesis were to automatically collect large amounts of connectivity knowledge and to characterize relationships between connectivity and gene expression in the rodent brain. To extract the knowledge embedded in the neuroscience literature I created the first corpus of neuroscience abstracts annotated for brain regions and their connections. These connections describe long distance or macroconnectivity between brain regions. The collection of over 1,300 abstracts allowed accurate training of machine learning classifiers that mark brain region mentions (76% recall at 81% precision) and neuroanatomical connections between regions (50% sentence level recall at 70% precision). By automatically extracting connectivity statements from the Journal of Comparative Neurology I generated a literature based connectome of over 28,000 connections. Evaluations revealed that a large number of brain region descriptions are not found in existing lexicons. To address this challenge I developed novel methods that allow mapping of brain region terms to enclosing structures. To further study the connectome I moved from scientific articles to large online databases. By employing resources for gene expression and connectivity I showed that patterns of gene expression correlate with connectivity. First, two spatially anti-correlated patterns of mouse brain gene expression were identified. These signatures are associated with differences in expression of neuronal and oligodendrocyte markers, suggesting they reflect regional differences in cellular populations. 
Expression level of these genes is correlated with connectivity degree, with regions expressing the neuron-enriched pattern having more incoming and outgoing connections with other regions. Finally, relationships between profiles of gene expression and connectivity were tested. Specifically, I showed that brain regions with similar expression profiles tend to have similar connectivity profiles. Further, optimized sets of connectivity linked genes are associated with neuronal development, axon guidance and autistic spectrum disorder. This demonstration of text mining and large scale analysis provides new foundations for neuroinformatics.
APA, Harvard, Vancouver, ISO and other styles
37

Peng, Zeshan, and 彭澤山. "Structure comparison in bioinformatics". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B36271299.

Full text
APA, Harvard, Vancouver, ISO and other styles
38

Gomes, Luciana da Silva Almendra. "Provenance for Bioinformatics Workflows". Pontifícia Universidade Católica do Rio de Janeiro, 2011. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=18566@1.

Full text
Abstract (summary):
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
Muitos experimentos científicos são elaborados como fluxos de tarefas computacionais, que podem ser implementados através do uso de linguagens de programação. Na área de bioinformática é muito comum o uso de scripts ad-hoc para construir fluxos de tarefas. Os Sistemas de Gerência de Workflow Científico (SGWC) surgiram como uma alternativa a estes scripts. Uma das funcionalidades desses sistemas que têm recebido bastante atenção pela comunidade científica é a captura automática de dados de proveniência. Estes permitem averiguar quais foram os recursos e parâmetros utilizados na geração dos resultados, dentre muitas outras informações indispensáveis para a validação e publicação de um experimento. Neste trabalho foram levantados alguns desafios na área de proveniência de dados em SGWCs, como por exemplo (i) a heterogeneidade de formas de representação dos dados nos diferentes sistemas, dificultando a compreensão e a interoperabilidade; (ii) o armazenamento de dados consumidos e produzidos e (iii) a reprodutibilidade de uma execução específica. Estes desafios motivaram a elaboração de um esquema conceitual de proveniência de dados para a representação de workflows. Foi implementada também uma extensão em um SGWC específico (BioSide) para incluir dados de proveniência e armazená-los utilizando o esquema conceitual proposto. Foram priorizados neste trabalho alguns requisitos comumente encontrados em workflows de Bioinformática.
Many scientific experiments are designed as computational workflows, which can be implemented using traditional programming languages. In the Bioinformatics domain, ad-hoc scripts are often used to build workflows. Scientific Workflow Management Systems (SWMS) have emerged as an alternative to those scripts. One particular SWMS feature that has received much attention from the scientific community is the automatic capture of provenance data. These allow users to track which resources and parameters were used to obtain the results, among much other information required to validate and publish an experiment. In the present work we have elicited some data provenance challenges in the SWMS context, such as (i) the heterogeneity of data representation schemes, which hinders understanding and interoperability; (ii) the storage of consumed and produced data; and (iii) the reproducibility of a specific execution. These challenges have motivated the proposal of a conceptual data provenance scheme for workflow representation. We have implemented an extension of a particular SWMS (BioSide) to include provenance data and store them using the proposed conceptual scheme. We have focused on some requirements commonly found in Bioinformatics workflows.
APA, Harvard, Vancouver, ISO and other styles
39

Birney, Ewan. "Sequence alignment in bioinformatics". Thesis, University of Cambridge, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621653.

Full text
APA, Harvard, Vancouver, ISO and other styles
40

Mumtaz, Shahzad. "Visualisation of bioinformatics datasets". Thesis, Aston University, 2015. http://publications.aston.ac.uk/25261/.

Full text
Abstract (summary):
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting the functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection, among other applications. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high dimensional biological datasets by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of the expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants both on synthetic data and on an electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the projection map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way, where appropriate noise models are used for each type of data, in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM).
We also propose to extend GGTM model to estimate feature saliencies while training a visualisation model and this is called GGTM with feature saliency (GGTM-FS). We demonstrate effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as distance distortion measure and rank based measures: trustworthiness, continuity, mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use quality metrics of KL divergence and nearest neighbour classifications error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset.
APA, Harvard, Vancouver, ISO and other styles
41

Petty, Emma Marie. "Shape analysis in bioinformatics". Thesis, University of Leeds, 2009. http://etheses.whiterose.ac.uk/822/.

Full text
Abstract (summary):
In this thesis we explore two main themes, both of which involve proteins. The first area of research focuses on the analysis of proteins displayed as spots on 2-dimensional planes. The second area of research focuses on a specific protein and how interactions with this protein can naturally prevent or, in the presence of a pesticide, cause toxicity. The first area of research builds on previously developed EM methodology to infer the matching and transformation necessary to superimpose two partially labelled point configurations, focusing on the application to 2D protein images. We modify the methodology to account for the possibility of missing and misallocated markers, where markers make up the labelled proteins manually located across images. We provide a way to account for the likelihood of an increased edge variance within protein images. We find that slight marker misallocations do not greatly influence the final output superimposition when considering data simulated to mimic the given dataset. The methodology is also successfully used to automatically locate and remove a grossly misallocated marker within the given dataset before further analysis is carried out. We develop a method to create a union of replicate images, which can then be used alone in further analyses to reduce computational expense. We describe how the data can be modelled to enable inference on the quality of a dataset, a property often overlooked in protein image analysis. To complete this line of research we provide a method to rank points that are likely to be present in one group of images but absent in a second group. The produced score is used to highlight the proteins that are not present in both image sets representing control or diseased tissue, therefore providing biological indicators which are vitally important to improve the accuracy of diagnosis.
In the second area of research, we test the hypothesis that pesticide toxicity is related to the shape similarity between the pesticide molecule itself and the natural ligand of the protein to which a pesticide will bind (and ultimately cause toxicity). A ligand of a protein is simply a small molecule that will bind to that protein. It seems intuitive that the similarities between a naturally formed ligand and a synthetically developed ligand (the pesticide) may be an indicator of how well a pesticide and the protein bind, as well as provide an indicator of pesticide toxicity. A graphical matching algorithm is used to infer the atomic matches across ligands, with Procrustes methodology providing the final superimposition before a measure of shape similarity is defined considering the aligned molecules. We find evidence that the measure of shape similarity does provide a significant indicator of the associated pesticide toxicity, as well as providing a more significant indicator than previously found biological indicators. Previous research has found that the properties of a molecule in its bioactive form are more suitable indicators of an associated activity. Here, these findings dictate that the docked conformation of a pesticide within the protein will provide more accurate indicators of the associated toxicity. Next, we use a docking program to predict the docked conformation of a pesticide. We provide a technique to calculate the similarity between the docks of both the pesticide and the natural ligand. A similar technique is used to provide a measure for the closeness of fit between a pesticide and the protein. Both measures are then considered as independent variables for the prediction of toxicity. In this case the results show potential for the calculated variables to be useful toxicity predictors, though further analysis is necessary to properly explore their significance.
APA, Harvard, Vancouver, ISO and other styles
42

Profiti, Giuseppe <1980&gt. "Graph algorithms for bioinformatics". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amsdottorato.unibo.it/6914/1/profiti_giuseppe_tesi.pdf.

Full text
Abstract (summary):
Biological data are inherently interconnected: protein sequences are connected to their annotations, the annotations are structured into ontologies, and so on. While protein-protein interactions are already represented by graphs, in this work I present how a graph structure can be used to enrich the annotation of protein sequences thanks to algorithms that analyze the graph topology. We also describe a novel solution to restrict the data generation needed for building such a graph, thanks to constraints on the data and dynamic programming. The proposed algorithm ideally improves the generation time by a factor of 5. The graph representation is then exploited to build a comprehensive database, thanks to the rising technology of graph databases. While graph databases are widely used for other kinds of data, from Twitter tweets to recommendation systems, their application to bioinformatics is new. A graph database is proposed, with a structure that can be easily expanded and queried.
APA, Harvard, Vancouver, ISO and other styles
43

Profiti, Giuseppe <1980&gt. "Graph algorithms for bioinformatics". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amsdottorato.unibo.it/6914/.

Full text
Abstract (summary):
Biological data are inherently interconnected: protein sequences are connected to their annotations, the annotations are structured into ontologies, and so on. While protein-protein interactions are already represented by graphs, in this work I present how a graph structure can be used to enrich the annotation of protein sequences thanks to algorithms that analyze the graph topology. We also describe a novel solution to restrict the data generation needed for building such a graph, thanks to constraints on the data and dynamic programming. The proposed algorithm ideally improves the generation time by a factor of 5. The graph representation is then exploited to build a comprehensive database, thanks to the rising technology of graph databases. While graph databases are widely used for other kinds of data, from Twitter tweets to recommendation systems, their application to bioinformatics is new. A graph database is proposed, with a structure that can be easily expanded and queried.
APA, Harvard, Vancouver, ISO and other styles
44

Bertoldi, Loris. "Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data". Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3421950.

Full text
Abstract (summary):
In the last decade, the huge decrease in sequencing costs due to the development of high-throughput technologies has completely changed the way genetic problems are approached. In particular, whole exome and whole genome sequencing are contributing to the extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. Being a relatively new and fast developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. In line with the times, in 2014, the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and by applying them to investigate and possibly solve the case studies included in the project. I first developed an automated pipeline for dealing with Illumina data, able to sequentially perform each step necessary to pass from raw reads to somatic or germline variant detection. The system performance was tested by means of internal controls and by its application to a cohort of patients affected by gastric cancer, obtaining interesting results. Once variants are called, they have to be annotated in order to define their properties, such as their position at the transcript and protein level, their impact on the protein sequence, their pathogenicity and more. As most of the publicly available annotators were affected by systematic errors causing a low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation, which guarantees the best accuracy (>99%) compared to state-of-the-art programs, while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of results, but also a simple filtering strategy.
Furthermore, for a valuable user-driven prioritization of human genetic variations, I developed QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. The prioritization is achieved by a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those not satisfying the applied criteria. QueryOR has been used to analyze the two case studies framed within the BioInfoGen project. In particular, it made it possible to detect causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. On the other hand, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but with no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.
Nell’ultimo decennio, l’enorme diminuzione del costo del sequenziamento dovuto allo sviluppo di tecnologie ad alto rendimento ha completamente rivoluzionato il modo di approcciare i problemi genetici. In particolare, il sequenziamento dell’intero esoma e dell’intero genoma stanno contribuendo ad un progresso straordinario nello studio delle varianti genetiche umane, aprendo nuove prospettive nella medicina personalizzata. Essendo un campo relativamente nuovo e in rapido sviluppo, strumenti appropriati e conoscenze specializzate sono richieste per un’efficiente produzione e analisi dei dati. Per rimanere al passo con i tempi, nel 2014, l’Università degli Studi di Padova ha finanziato il progetto strategico BioInfoGen con l’obiettivo di sviluppare tecnologie e competenze nella bioinformatica e nella biologia molecolare applicate alla genomica personalizzata. Lo scopo del mio dottorato è stato quello di contribuire a questa sfida, implementando una serie di strumenti innovativi, al fine di applicarli per investigare e possibilmente risolvere i casi studio inclusi all’interno del progetto. Inizialmente ho sviluppato una pipeline per analizzare i dati Illumina, capace di eseguire in sequenza tutti i processi necessari per passare dai dati grezzi alla scoperta delle varianti sia germinali che somatiche. Le prestazioni del sistema sono state testate mediante controlli interni e tramite la sua applicazione su un gruppo di pazienti affetti da tumore gastrico, ottenendo risultati interessanti. Dopo essere state chiamate, le varianti devono essere annotate al fine di definire alcune loro proprietà come la posizione a livello del trascritto e della proteina, l’impatto sulla sequenza proteica, la patogenicità, ecc. 
Poiché la maggior parte degli annotatori disponibili presentavano errori sistematici che causavano una bassa coerenza nell’annotazione finale, ho implementato VarPred, un nuovo strumento per l’annotazione delle varianti, che garantisce la migliore accuratezza (>99%) comparato con lo stato dell’arte, mostrando allo stesso tempo buoni tempi di esecuzione. Per facilitare l’utilizzo di VarPred, ho sviluppato un’interfaccia web molto intuitiva, che permette non solo la visualizzazione grafica dei risultati, ma anche una semplice strategia di filtraggio. Inoltre, per un’efficace prioritizzazione mediata dall’utente delle varianti umane, ho sviluppato QueryOR, una piattaforma web adatta alla ricerca all’interno dei geni causativi, ma utile anche per trovare nuove associazioni gene-malattia. QueryOR combina svariate caratteristiche innovative che lo rendono comprensivo, flessibile e facile da usare. La prioritizzazione è raggiunta tramite un processo di selezione positiva che fa emergere le varianti maggiormente significative, piuttosto che filtrare quelle che non soddisfano i criteri imposti. QueryOR è stato usato per analizzare i due casi studio inclusi all’interno del progetto BioInfoGen. In particolare, ha permesso di scoprire le varianti causative dei pazienti affetti da malattie da accumulo lisosomiale, evidenziando inoltre l’efficacia del pannello di sequenziamento sviluppato. Dall’altro lato invece QueryOR ha semplificato l’individuazione del gene LRP2 come possibile candidato per spiegare i soggetti con un fenotipo simile alla malattia di Dent, ma senza alcuna mutazione nei due geni precedentemente descritti come causativi, CLCN5 e OCRL. Come corollario finale, è stata effettuata un’analisi estensiva su varianti esomiche ricorrenti, mostrando come la loro origine possa essere principalmente spiegata da imprecisioni nel genoma di riferimento, tra cui regioni mal assemblate e basi non corrette, piuttosto che da errori piattaforma-specifici.
APA, Harvard, Vancouver, ISO and other styles
45

Malm, Patrik. "Development of a hierarchical k-selecting clustering algorithm – application to allergy". Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10273.

Full text
Abstract (summary):

The objective of this Master's thesis was to develop, implement and evaluate an iterative procedure for hierarchical clustering with good overall performance that also merges features of certain previously described algorithms into a single integrated package. The resulting tool was then applied to an allergen IgE-reactivity data set. The implemented algorithm uses a hierarchical approach which illustrates the emergence of patterns in the data. At each level of the hierarchical tree a partitional clustering method is used to divide the data into k groups, where the number k is decided through the application of cluster validation techniques. The cross-reactivity analysis, by means of the new algorithm, largely arrives at anticipated cluster formations in the allergen data, which strengthens the results obtained through previous studies on the subject. Notably, though, certain unexpected findings presented in the former analysis were aggregated differently, and more in line with phylogenetic and protein family relationships, by the novel clustering package.

APA, Harvard, Vancouver, ISO and other styles
46

Enroth, Stefan. "The Nucleosome as a Signal Carrying Unit : From Experimental Data to Combinatorial Models of Transcriptional Control". Doctoral thesis, Uppsala universitet, Centrum för bioinformatik, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-129181.

Full text
Abstract (summary):
The human genome consists of over 3 billion nucleotides and would be around 2 meters long if uncoiled and laid out. Each human somatic cell contains all of this in its nucleus, which is only around 5 µm across. This extreme compaction is largely achieved by wrapping the DNA around a histone octamer, the nucleosome. Still, the DNA is accessible to the transcriptional machinery, and this regulation is highly dynamic and changes rapidly with, e.g., exposure to drugs. The individual histone proteins can carry specific modifications such as methylations and acetylations. These modifications are a major part of the epigenetic status of the DNA, which contributes significantly to the transcriptional status of a gene: certain modifications repress transcription and others are necessary for transcription to occur. Specific histone methylations and acetylations have also been implicated in more detailed regulation such as inclusion/exclusion of individual exons, i.e. splicing. Thus, the nucleosome is involved in chromatin remodeling and transcriptional regulation, both directly through steric hindrance and as a signaling platform via the epigenetic modifications. In this work, we have developed tools for storage (Paper I) and normalization (Paper II) of next generation sequencing data in general, and analyzed nucleosome locations and histone modifications in particular (Papers I, III and IV). The computational tools developed allowed us, as one of the first groups, to discover well positioned nucleosomes over internal exons in such widespread organisms as worm, mouse and human. We have also provided biological insight into how the epigenetic histone modifications can control exon expression in a combinatorial way. This was achieved by applying a Monte Carlo feature selection system in combination with rule-based modeling of exon expression. The constructed model was validated on data generated in three additional cell types, suggesting a general mechanism.
APA, Harvard, Vancouver, ISO and other styles
47

Lindskog, Mats. "Computational analyses of biological sequences -applications to antibody-based proteomics and gene family characterization". Doctoral thesis, KTH, School of Biotechnology (BIO), 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-527.

Full text
Abstract (summary):

Following the completion of the human genome sequence, post-genomic efforts have shifted the focus towards the analysis of the encoded proteome. Several different systematic proteomics approaches have emerged, for instance, antibody-based proteomics initiatives, where antibodies are used to functionally explore the human proteome. One such effort is HPR (the Swedish Human Proteome Resource), where affinity-purified polyclonal antibodies are generated and subsequently used for protein expression and localization studies in normal and diseased tissues. The antibodies are directed towards protein fragments, PrESTs (Protein Epitope Signature Tags), which are selected based on criteria favourable in subsequent laboratory procedures.

This thesis describes the development of novel software (Bishop) to facilitate the selection of proper protein fragments, as well as to ensure high-throughput processing of selected target proteins. The majority of proteins were successfully processed by this approach; however, the design strategy resulted in a number of fall-outs. These proteins comprised alternative splice variants, as well as proteins exhibiting high sequence similarity to other human proteins. Alternative strategies were developed for processing these proteins. The strategy for handling alternative splice variants included the development of additional software and was validated by comparing the immunohistochemical staining patterns obtained with antibodies generated towards the same target protein. Processing of high-sequence-similarity proteins was enabled by assembling human proteins into clusters according to their pairwise sequence identities. Each cluster was represented by a single PrEST located in the region of highest sequence similarity among all cluster members, thereby representing the entire cluster. This strategy was validated by identification of all proteins within a cluster using Western blot analysis with antibodies directed to such cluster-specific PrESTs. In addition, the PrEST design success rates for more than 4,000 genes were evaluated.
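The clustering of proteins by pairwise sequence identity described above can be sketched as follows. This is an illustrative toy, not the HPR pipeline: the identity metric (matching positions over the shorter sequence), the greedy single-pass grouping and the 0.8 threshold are all simplified assumptions.

```python
# Toy sketch: group proteins whose pairwise sequence identity exceeds a
# threshold, so each cluster can later be represented by a single PrEST.

def identity(a, b):
    """Fraction of matching positions over the shorter sequence (toy metric,
    no alignment)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def cluster(seqs, threshold=0.8):
    """Greedy single-pass grouping: a sequence joins the first cluster in
    which it is >= threshold identical to any member."""
    clusters = []
    for s in seqs:
        for c in clusters:
            if any(identity(s, m) >= threshold for m in c):
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

proteins = ["MKTAYIAK", "MKTAYIAR", "GGGGGGGG"]
print(cluster(proteins))  # [['MKTAYIAK', 'MKTAYIAR'], ['GGGGGGGG']]
```

In practice the identities would come from pairwise alignments (e.g. BLAST), and clusters that become linked would need to be merged, which this single pass omits.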

Several genomes other than the human genome have been completed; currently, more than 300 genomes are fully sequenced. Following the release of the genome of the tree model organism black cottonwood (Populus trichocarpa), a bioinformatic analysis identified previously unknown cellulose synthases (CesAs) and revealed a total of 18 CesA family members. These genes are thought to have arisen from several rounds of genome duplication. This number is significantly higher than in previous studies of other plant genomes, which identified only ten CesA family members. Moreover, identification of corresponding orthologous ESTs belonging to the closely related hybrid aspen (P. tremula x tremuloides) for two pairs of CesAs suggests that they are actively transcribed. This indicates that a number of paralogs have preserved their functionalities following extensive genome duplication events in the tree’s evolutionary history.

APA, Harvard, Vancouver, ISO and other styles
48

Lysholm, Fredrik. "Structural characterization of overrepresented". Thesis, Linköping University, The Department of Physics, Chemistry and Biology, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-12325.

Full text
Abstract (summary):

Background: Over the last decades, vast amounts of sequence information have been produced by various protein sequencing projects, which enables studies of sequential patterns. One of the best-known efforts to chart short peptide sequences is the Prosite pattern data bank. While sequential patterns like those of Prosite have proved very useful for classifying protein families, functions, etc., structural analysis may provide more information and possibly crucial clues linked to protein folding. Today PDB, the main repository for protein structures, contains more than 50,000 entries, which enables structural protein studies.

Result: Strongly folded pentapeptides, defined as pentapeptides that retain a specific conformation in several significantly structurally different proteins, were extracted from PDB and studied. Among these, several groups were found. Possibly the best defined is the “double Cys” pentapeptide group, with two amino acids in between (CXXCX|XCXXC), which was found to form backbone loops where the two cysteine residues formed a possible Cys-Cys bridge. Other structural motifs, such as “ECSAM” and “TIKIW”, were found in helices and sheets, respectively.
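The sequence side of this search, finding pentapeptides that recur in several distinct proteins, can be sketched as below. Deciding whether the recurrences actually share a conformation would additionally require the PDB coordinates, which this toy example omits; the demo sequences are invented (reusing the “ECSAM” motif from the abstract purely as an example).

```python
# Toy sketch: map each pentapeptide to the set of proteins containing it,
# then keep those seen in at least `min_proteins` distinct proteins.
from collections import defaultdict

def recurring_pentapeptides(named_seqs, min_proteins=2):
    """named_seqs: dict of protein name -> amino acid sequence."""
    seen = defaultdict(set)
    for name, seq in named_seqs.items():
        for i in range(len(seq) - 4):          # every 5-residue window
            seen[seq[i:i + 5]].add(name)
    return {p: s for p, s in seen.items() if len(s) >= min_proteins}

demo = {
    "protA": "AACXXCXECSAMKL",
    "protB": "GGECSAMYY",
}
print(recurring_pentapeptides(demo))  # only 'ECSAM' occurs in both
```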

Conclusion: There is much information to be extracted by structural analysis of pentapeptides and other oligopeptides. There is no doubt that some pentapeptides are more likely than others to adopt a specific fold, and that there are many strongly folded pentapeptides. By incorporating such patterns into a protein folding model, such as the hydrophobic-polar (HP) model, improvements in speed and accuracy can be obtained. Comparing structural conformations of important overrepresented pentapeptides can also help identify and refine both structural information data banks such as SCOP and sequential pattern data banks such as Prosite.

APA, Harvard, Vancouver, ISO and other styles
49

Lingemark, Maria. "A Lexicon for Gene Normalization". Thesis, Department of Computer and Information Science, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-20250.

Full text
Abstract (summary):

Researchers tend to use their own or favourite gene names in scientific literature, even though official names exist. Some names may even be used for more than one gene. This leads to problems with ambiguity when automatically mining biological literature. To disambiguate the gene names, gene normalization is used. In this thesis, we examine an existing gene normalization system and develop a new method to find gene candidates for the ambiguous genes. For the new method, a lexicon is created using information about gene names, symbols and synonyms from three different databases. The gene mention found in the scientific literature is used as input for a search in this lexicon, and all genes in the lexicon that match the mention are returned as gene candidates for that mention. These candidates are then used in the system's disambiguation step. Results show that the new method gives a better overall result for the system, with an increase in precision and a small decrease in recall.
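The lexicon lookup described above can be sketched as a dictionary from normalized name strings to sets of gene identifiers. This is a hedged illustration only: the normalization rule (case-folding, stripping separators), the record layout and the example genes are invented for the sketch, not taken from the thesis or its three databases.

```python
# Toy sketch: build a lexicon mapping normalized gene names, symbols and
# synonyms to gene IDs, then return all candidate IDs for a text mention.

def normalize(mention):
    """Case-fold and strip non-alphanumerics so spelling variants collide."""
    return "".join(ch for ch in mention.lower() if ch.isalnum())

def build_lexicon(records):
    """records: iterable of (gene_id, [name, symbol, synonyms...])."""
    lexicon = {}
    for gene_id, names in records:
        for name in names:
            lexicon.setdefault(normalize(name), set()).add(gene_id)
    return lexicon

def candidates(lexicon, mention):
    """All gene IDs whose lexicon entries match the normalized mention."""
    return lexicon.get(normalize(mention), set())

lex = build_lexicon([
    ("GENE:1", ["TNF", "tumor necrosis factor", "TNF-alpha"]),
    ("GENE:2", ["TNF receptor", "TNFR"]),
])
print(candidates(lex, "Tnf alpha"))  # {'GENE:1'}
```

Because a normalized string can map to several gene IDs, the returned set naturally captures the ambiguity that the system's later disambiguation step must resolve.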

APA, Harvard, Vancouver, ISO and other styles
50

Viklund, Håkan. "Formalizing life : Towards an improved understanding of the sequence-structure relationship in alpha-helical transmembrane proteins". Doctoral thesis, Stockholm University, Department of Biochemistry and Biophysics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-7144.

Full text
Abstract (summary):

Genes coding for alpha-helical transmembrane proteins constitute roughly 25% of the total number of genes in a typical organism. As these proteins are vital parts of many biological processes, an improved understanding of them is important for achieving a better understanding of the mechanisms that constitute life.

Every protein consists of an amino acid sequence that folds into a three-dimensional structure in order to perform its biological function. The work presented in this thesis is directed towards improving the understanding of the relationship between sequence and structure for alpha-helical transmembrane proteins. Specifically, five original methods for predicting the topology of alpha-helical transmembrane proteins have been developed: PRO-TMHMM, PRODIV-TMHMM, OCTOPUS, Toppred III and SCAMPI.

A general conclusion from these studies is that approaches using multiple sequence information achieve the best prediction accuracy. Further, the properties of reentrant regions have been studied, with respect to both sequence and structure. One result of this study is an improved definition of the topological grammar of transmembrane proteins, which is used in OCTOPUS and shown to further improve topology prediction. Finally, Z-coordinates, an alternative representation of topological information for transmembrane proteins based on the distance to the membrane center, have been introduced, and a method for predicting Z-coordinates from amino acid sequence, Z-PRED, has been developed.
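The Z-coordinate idea can be illustrated as follows: instead of a discrete topology label, each residue carries its distance to the membrane center, from which coarse labels can still be recovered. The 15 Å and 25 Å thresholds below are illustrative assumptions for the sketch, not Z-PRED's actual definitions.

```python
# Toy sketch: recover a coarse topological label from a residue's
# absolute distance to the membrane center (in Angstroms).

def label_from_z(z_angstrom, core=15.0, interface=25.0):
    """Map |z| to 'membrane', 'interface' or 'water' (thresholds assumed)."""
    z = abs(z_angstrom)
    if z <= core:
        return "membrane"
    if z <= interface:
        return "interface"
    return "water"

zs = [2.0, -14.0, 18.0, -30.0]
print([label_from_z(z) for z in zs])
# ['membrane', 'membrane', 'interface', 'water']
```

The continuous representation is strictly richer than the labels: the labels can always be derived from the Z-coordinates, but not vice versa.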

APA, Harvard, Vancouver, ISO and other styles