Dissertations / Theses: 'Bioinformatic, Computational Biology, GPCR'

1

Poudel, Sagar. "GPCR-Directed Libraries for High Throughput Screening." Thesis, University of Skövde, School of Humanities and Informatics, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-29.

Abstract:

Guanine nucleotide-binding protein (G protein)-coupled receptors (GPCRs), the largest receptor family, are enormously important to the pharmaceutical industry, as they are the targets of 50-60% of all existing medicines. The discovery of many new GPCRs by the Human Genome Project has opened up new opportunities for developing novel therapeutics. High-throughput screening (HTS) of chemical libraries is a well-established method for finding new lead compounds in drug discovery. Despite some success, this approach has suffered from the near absence of more focused, specifically targeted libraries. To improve hit rates and to fully exploit the potential of current corporate screening collections, this thesis identifies and analyses the critical drug-binding positions within GPCRs, based on their overall sequences, their transmembrane regions and their drug-binding fingerprints. A classification based on drug-binding fingerprints was then made as the basis for pharmacophore modelling and virtual screening, which facilitates the development of more specific and focused targeted libraries for HTS.
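The fingerprint-based classification described in this abstract can be illustrated with a minimal Python sketch. It assumes the receptor sequences are already aligned and that the critical drug-binding positions are given as 0-based alignment column indices. The sequences, column indices, identity threshold and single-linkage grouping are all illustrative assumptions. They are not the thesis's actual data or classification method.

```python
from itertools import combinations


def binding_fingerprint(aligned_seq, positions):
    """Concatenate the residues found at the given 0-based alignment columns."""
    return "".join(aligned_seq[p] for p in positions)


def identity(fp_a, fp_b):
    """Fraction of fingerprint positions carrying the same residue."""
    if len(fp_a) != len(fp_b) or not fp_a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def group_by_fingerprint(aligned_seqs, positions, threshold):
    """Single-linkage grouping: receptors linked by >= threshold identity share a group."""
    fps = {name: binding_fingerprint(seq, positions) for name, seq in aligned_seqs.items()}
    parent = {name: name for name in fps}  # union-find forest over receptor names

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fps, 2):
        if identity(fps[a], fps[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in fps:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy, made-up aligned sequences and binding columns, for illustration only.
toy = {"R1": "ACDEFGHIK", "R2": "ACDEFGHIR", "R3": "WWWWWWWWW"}
print(group_by_fingerprint(toy, positions=[0, 2, 4, 8], threshold=0.7))
```

Here R1 and R2 share three of four fingerprint residues (identity 0.75), so they fall into one group, and R3 stays on its own.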

2

Bahena, Silvia. "Computational Methods for the structural and dynamical understanding of GPCR-RAMP interactions." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-416790.

Abstract:

Protein-protein interactions dominate all major biological processes in living cells. Recent studies suggest that the surface expression and activity of G protein-coupled receptors (GPCRs), the largest family of receptors in human cells, can be modulated by receptor activity-modifying proteins (RAMPs). Computational tools are essential to complement experimental approaches to understanding the molecular activity of living cells, and molecular dynamics simulations are well suited to providing molecular details of protein function and structure. Classical atom-level molecular modeling of biological systems is limited to small systems and short time scales. This complicates its application to systems such as protein-protein interactions in the cell-surface membrane. For this reason, coarse-grained (CG) models have become widely used, and they represent an important step in the study of large biomolecular systems. CG models are computationally more efficient because they simplify the complexity of the protein structure, allowing simulations to reach longer time scales. The aim of this degree project was to determine whether coarse-grained molecular simulations are suitable for understanding the dynamics and structural basis of GPCR-RAMP interactions in a membrane environment. The results indicate that the study of protein-protein interactions using CG models needs further improvement, with a more accurate parameterization that would allow the study of complex systems.
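The coarse-graining idea in this abstract replaces groups of atoms with single beads. One common way to do this is a mass-weighted mapping, sketched below. This is a hypothetical illustration: the atom-to-bead grouping is arbitrary. Real CG force fields define their own mappings and parameters, which the abstract does not specify. MARTINI, for example, typically maps about four heavy atoms to one bead.

```python
def map_to_beads(coords, masses, bead_groups):
    """Return one mass-weighted centre per coarse-grained bead.

    coords: list of (x, y, z) atom positions
    masses: list of atomic masses, in the same order as coords
    bead_groups: list of atom-index lists, one list per bead (hypothetical mapping)
    """
    beads = []
    for group in bead_groups:
        total_mass = sum(masses[i] for i in group)
        if total_mass <= 0:
            raise ValueError("each bead needs at least one atom with positive mass")
        # Centre of mass of the atoms assigned to this bead, axis by axis.
        centre = tuple(
            sum(masses[i] * coords[i][axis] for i in group) / total_mass
            for axis in range(3)
        )
        beads.append(centre)
    return beads


# Hypothetical 3-atom fragment mapped onto 2 beads.
atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
print(map_to_beads(atoms, [1.0, 1.0, 2.0], [[0, 1], [2]]))
# [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
```

Because each bead stands in for several atoms, a CG system has fewer particles and smoother interactions. This is what permits the longer time scales mentioned in the abstract.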

3

Kallberg, Yvonne. "Bioinformatic methods in protein characterization." Stockholm, 2002. http://diss.kib.ki.se/2002/91-7349-370-8/.

4

Brandström, Mikael. "Bioinformatic analysis of mutation and selection in the vertebrate non-coding genome." Uppsala: Acta Universitatis Upsaliensis, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-8240.

5

Lang, Tiange. "Evolution of transmembrane and gel-forming mucins studied with bioinformatic methods." Göteborg: The Sahlgrenska Academy at Göteborg University, Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, 2007. http://hdl.handle.net/2077/7502.

6

Palombo, Valentino. "Genomics, Transcriptomics and Computational Biology: new insights into bovine and swine breeding and genetics." Doctoral thesis, Università degli studi del Molise, 2019. http://hdl.handle.net/11695/91489.

Abstract:

Enormous progress has been made in the selection of animals for specific traits using traditional quantitative genetic approaches. Nevertheless, a considerable amount of phenotypic variation remains unexplained. Better knowledge of its genetic basis therefore represents a potential additional gain for animal production. In this regard, the recently developed high-throughput (HT) technologies based on microarray and next-generation sequencing (NGS) methods offer a powerful opportunity to prise open the ‘black box’ underlying complex biological processes. These technological advances have marked the beginning of the ‘omic era’. Broadly, ‘omic’ approaches adopt a holistic view of the molecules that make up a cell, tissue or organism. They are aimed primarily at the universal detection of genes (genomics), RNA (transcriptomics), proteins (proteomics) and metabolites (metabolomics) in a specific biological sample. The basic premise of these approaches is that a complex system can be understood more thoroughly if it is considered as a whole. At the same time, the large amount of data these approaches generate makes sense only if one is equipped with the necessary resources and tools to manage and explore it. For this reason, bioinformatics, often known as computational biology, is gaining immense importance alongside HT technical progress. Animal breeding is gaining new momentum from this scenario. In particular, the field is moving away from traditional approaches toward systems approaches that use integrative analysis of ‘omic’ data. These approaches better elucidate the genetic architecture controlling the traits of interest, and this knowledge can ultimately be used to select breeding candidates.
The aim of this thesis is to (1) investigate differences in the genetic basis of milk fatty acid profiles in two Italian dairy cattle breeds and (2) delineate the genes and transcription regulators involved in controlling the transition from colostrogenesis to lactogenesis in swine, using state-of-the-art genomic and transcriptomic analyses. To this end, two studies were conducted: a genome-wide association study (GWAS) on milk fatty acids in Italian Holstein and Italian Simmental cattle, and an RNA-Seq study of transcriptional profiles in the swine mammary gland. In addition, (3) an in-house bioinformatics tool performing an original pathway analysis is presented. The tool, built entirely in R and named PIA (Pathways Interaction Analysis), is designed for post-genomic and transcriptomic data mining.
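A single-marker GWAS test can be illustrated by regressing a phenotype on allele dosage (0, 1 or 2 copies), one SNP at a time. The sketch below is an assumption-laden simplification. Practical livestock GWAS commonly use mixed models to account for relatedness and population structure, and the thesis's actual model is not described here. The SNP identifiers and phenotype values are made up.

```python
import math


def snp_association(dosages, phenotype):
    """OLS fit of phenotype ~ a + b * dosage; returns (b, t statistic for b)."""
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        # Perfect fit: report an infinite statistic carrying the slope's sign.
        return slope, (math.copysign(math.inf, slope) if slope else 0.0)
    return slope, slope / se


def genome_scan(genotypes_by_snp, phenotype):
    """Test every SNP independently; maps SNP id -> (slope, t statistic)."""
    return {snp: snp_association(d, phenotype) for snp, d in genotypes_by_snp.items()}


# Hypothetical genotypes (allele dosages) and phenotype values.
scan = genome_scan(
    {"snp_a": [0, 1, 2, 0, 1, 2], "snp_b": [0, 0, 1, 1, 2, 2]},
    [1.0, 1.5, 2.0, 1.0, 1.5, 2.0],
)
print(scan)
```

A real scan would also convert each t statistic to a p-value and correct for the number of SNPs tested. That step is omitted here to keep the sketch dependency-free.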

7

Moss, Tiffanie. "Characterization of Structural Variants and Associated MicroRNAs in Flax Fiber and Linseed Genotypes by Bioinformatic Analysis and High-Throughput Sequencing." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1333648149.

8

Santaniello, F. "Changes of Replication Timing Induced by PML-RARA." Doctoral thesis, Università degli Studi di Milano, 2017. http://hdl.handle.net/2434/469739.

Abstract:

DNA replication is a cellular process that, starting from precise genomic loci, ensures the faithful inheritance of the genetic instructions contained in the double-stranded DNA molecule from a parental cell to each daughter cell. Because DNA replication is so complex and so crucially important, it must be tightly regulated in both space and time. To date, however, the temporal features of DNA replication, together with the factors that might affect this temporal dimension, remain poorly studied and described. Given the lack of standard methods able to detect differences in Replication Timing, we developed an innovative bioinformatic method (DART; Differential Analysis of Replication Timing) for this task. Applying this procedure to our Repli-seq data was instrumental in investigating whether PML-RARα may fulfil its tumorigenic potential by altering the normal pace of replication timing in cells. We found that, once expressed, PML-RARα indeed exerts a deregulating effect on Replication Timing. Relative to control cells, it causes some regions to replicate earlier (LtoE-shifted) and others later (EtoL-shifted). We observed a close association between these differentially replicated regions and both pre-existing and PML-RARα-related transcriptional status and chromatin structure. Regions with an EtoL-shifted replication coincide with ‘active’ chromatin foci enriched for direct down-regulated targets of PML-RARα. Conversely, regions with an LtoE-shifted Replication Timing show moderate epigenetic ‘active’ features and are enriched for indirect up-regulated targets of PML-RARα.
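The LtoE/EtoL terminology can be made concrete with a simple sketch. It assumes replication timing is summarized per genomic bin as log2(early/late) Repli-seq read counts, where a higher value means earlier replication. Bins whose timing moves by at least a fixed threshold are flagged. This is a hypothetical thresholding scheme, not the DART algorithm, whose statistics the abstract does not describe. The counts, pseudocount and threshold are made up.

```python
import math


def replication_timing(early_counts, late_counts, pseudocount=1.0):
    """Per-bin log2(early/late) ratio; higher values mean earlier replication."""
    if len(early_counts) != len(late_counts):
        raise ValueError("early and late count vectors must have the same length")
    return [
        math.log2((e + pseudocount) / (l + pseudocount))
        for e, l in zip(early_counts, late_counts)
    ]


def classify_shifts(rt_control, rt_treated, threshold=1.0):
    """Label each bin LtoE (became earlier), EtoL (became later) or unchanged."""
    if len(rt_control) != len(rt_treated):
        raise ValueError("timing profiles must cover the same bins")
    labels = []
    for control, treated in zip(rt_control, rt_treated):
        delta = treated - control
        if delta >= threshold:
            labels.append("LtoE")
        elif delta <= -threshold:
            labels.append("EtoL")
        else:
            labels.append("unchanged")
    return labels


# Hypothetical read counts for three bins in control and treated cells.
rt_ctrl = replication_timing([7, 7, 7], [7, 7, 7])
rt_trt = replication_timing([31, 7, 1], [7, 7, 31])
print(classify_shifts(rt_ctrl, rt_trt))
# ['LtoE', 'unchanged', 'EtoL']
```

A fixed threshold ignores replicate variability and neighbouring bins. A dedicated method such as DART would presumably model both, but that is an assumption, not something the abstract states.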

9

Coppe, Alessandro. "A bioinformatic and computational approach to regulation of genome function: integrated analysis of genome organization, promoter sequences and gene expression." Doctoral thesis, Università degli studi di Padova, 2008. http://hdl.handle.net/11577/3426395.

Abstract:

Although much is known about gene expression regulation in both Prokaryotes and Eukaryotes, this complex and fascinating mechanism remains to be fully elucidated. The relatively recent advent of high-throughput techniques for studying transcription has made available an invaluable amount of data that can be used for genome-wide analysis with bioinformatics approaches. These computational methods have now become an integral part of biological research. The topics of this thesis concern the development and application of computational methodologies to better understand the basis of gene expression regulation at different levels. A first level of investigation concerned the relationships among chromosomal structure, expression profile and functional characteristics, focusing on genomic organization and structure. For this task, the REEF (REgionally Enriched Features) software was developed to identify genomic regions enriched in specific features, such as a class or group of genes homogeneous for expression and/or functional characteristics. REEF can be used to detect density variations of specific features along the genome sequence. Examples are genomic regions significantly enriched in genes that are co-expressed, differentially expressed, or related to particular molecular functions. Local feature enrichment is calculated using a test statistic based on the hypergeometric distribution, applied genome-wide by sliding windows, and the false discovery rate is used to control for multiplicity. REEF has been applied to the study of the genomic distribution of tissue-specific genes and to the analysis of genes differentially expressed between different myeloid cell lines. These analyses identified clusters of tissue-specific genes in the human genome and positional enrichment of genes related to hemopoietic functional modules. The second level of investigation concerned gene expression regulation at the promoter level.
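REEF's core test is a hypergeometric enrichment computed over sliding windows, with false discovery rate control. It can be sketched as follows, under three assumptions. Windows are counted in consecutive genes rather than base pairs, the upper-tail hypergeometric p-value is used, and multiplicity is controlled with the Benjamini-Hochberg procedure. The REEF software itself may differ on any of these points, and the gene flags below are made up.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K flagged genes, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def enriched_windows(flags, window, step, alpha=0.05):
    """Return (start, end, q) for gene windows enriched in flagged genes.

    flags: 0/1 per gene in chromosomal order (1 = gene has the feature).
    """
    N, K = len(flags), sum(flags)
    starts = list(range(0, N - window + 1, step))
    pvals = [hypergeom_sf(sum(flags[s:s + window]), N, K, window) for s in starts]
    qvals = benjamini_hochberg(pvals)
    return [(s, s + window, q) for s, q in zip(starts, qvals) if q <= alpha]


# Hypothetical chromosome of 45 genes with a cluster of 5 flagged genes.
flags = [0] * 20 + [1] * 5 + [0] * 20
print(enriched_windows(flags, window=5, step=5))
```

Only the window covering the cluster (genes 20 to 24) survives FDR control in this toy case. The other windows contain no flagged genes and get p = 1.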
Unknown transcription factor binding sites might be detected by searching for shared sequence elements in upstream regulatory regions of genes with common biological function and/or similar expression profiles. Genes with similar expression are frequently co-regulated, and genes with related function are often similarly expressed. New methodologies for the identification of regulatory motifs in human promoters were developed and tested. A drawback of this approach is the exceedingly high number of results. Biological knowledge was therefore used both before and after automated pattern discovery to define a “sheltered environment” that enhances the specificity of the computational analysis. The COOP (Clustering of Overlapping Patterns) software for the extraction of sequence motifs was developed and used to analyze genomic sequences 1 kb upstream of 91 retina-specific genes. This analysis identified a set of putative regulatory motifs that occur frequently in retina promoter sequences. Most of them are localized in the proximal portion of promoters and tend to be less variable in the central region than in the lateral regions. Some are similar to known regulatory sequences. The performance of COOP was further evaluated by simulation and by applying it to a standard positive control dataset, proposed by Tompa and colleagues for the systematic evaluation and comparison of pattern discovery software. A webtool for the prediction of functional elements in promoter sequences, MOST (MOtif Searching web Tool), was applied to different datasets under various testing conditions to study the influence of specific search parameters on the results.
Two groups of promoter sequences containing known regulatory signals were used as positive control datasets: the public yeast benchmark dataset of Tompa and colleagues, and a custom dataset of 37 human promoter sequences, subgroups of which contained instances of one of nine different signals. Testing the method on different benchmark datasets gave quite positive results. Taking the concepts behind COOP to a new level, a more rigorous methodology was developed for identifying surprising and putatively regulatory motifs. It compares their frequency in promoter sequences of co-expressed genes with their frequency in a background set of sequences representative of all human gene promoters. Promoter sequences are divided into overlapping regions, considered independently, to identify positional bias in the arrangement of transcription factor binding sites along promoters. Because of the genome-wide nature of this approach, a new webtool for the automatic identification and retrieval of large numbers of promoters in the human genome was also developed. This motif discovery methodology has been used to investigate the structure of promoters of genes crucial during myeloid differentiation.
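The comparison of motif frequency between promoters of co-expressed genes and background promoters can be illustrated with a simple k-mer over-representation score. This is a hypothetical stand-in for the COOP/MOST methodology. It counts overlapping k-mers in both sets and ranks them by a pseudocount-smoothed log2 frequency ratio. It uses no positional windows and no significance testing, and the sequences are toy examples.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mers made only of A/C/G/T across all sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def enriched_kmers(foreground, background, k, pseudocount=1.0, top=5):
    """Rank foreground k-mers by smoothed log2(foreground freq / background freq)."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    vocab = 4 ** k  # number of possible DNA k-mers, used for smoothing
    scores = {}
    for kmer in fg:
        f = (fg[kmer] + pseudocount) / (fg_total + pseudocount * vocab)
        b = (bg[kmer] + pseudocount) / (bg_total + pseudocount * vocab)
        scores[kmer] = math.log2(f / b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy promoter fragments: a TATA-like element recurs in the foreground only.
foreground = ["TATAAAGG", "CCTATAAA"]
background = ["GGGGCCCC", "CCCCGGGG"]
print(enriched_kmers(foreground, background, k=4, top=3))
```

The pseudocount keeps k-mers absent from the background from producing an infinite ratio. A method aimed at real promoters would replace this score with a proper statistical test.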

10

Favara, David M. "The biology of ELTD1/ADGRL4 : a novel regulator of tumour angiogenesis." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:0d00af0a-bb43-44bc-ba0b-1f8acbe34bc5.

Abstract:

Background: Our laboratory identified ELTD1, an orphan GPCR belonging to the adhesion GPCR family (aGPCR), as a novel regulator of angiogenesis and a potential anti-cancer therapeutic target. ELTD1 is normally expressed in both endothelial cells and vascular smooth muscle cells, and its expression is significantly increased in the tumour vasculature. The aim of this project was to analyse ELTD1's function in endothelial cells and its role in breast cancer. Method: 62 sequenced vertebrate genomes were interrogated for ELTD1 conservation and domain alterations. A phylogenetic timetree was assembled to establish time estimates for ELTD1's evolution. After ELTD1 silencing, mRNA array profiling was performed on primary human umbilical vein endothelial cells (HUVECs) and validated with qPCR and confocal microscopy. ELTD1's signalling was investigated by applying the aGPCR ‘Stinger/tethered-agonist hypothesis’. For this, truncated forms of ELTD1 and peptides analogous to the proposed tethered agonist region were designed. FRET-based second-messenger assays (Cisbio IP-1; cAMP) and luciferase-reporter assays (NFAT; NFκB; SRE; SRF-RE; CREB) were performed to establish canonical GPCR activation. To further investigate ELTD1's role in endothelial cells, ELTD1 was stably overexpressed in HUVECs. Functional angiogenesis assays and mRNA array profiling were then performed. To investigate ELTD1 in breast cancer, a panel of cell lines representative of all molecular subtypes was screened using qPCR. Furthermore, an exploratory pilot study was performed on matched primary and regional nodal secondary breast cancers (n=43), which were stained for ELTD1 expression. Staining intensity was then scored and compared with relapse-free survival and overall survival. Results: ELTD1 arose 435 million years ago (mya) in bony fish and is present in all subsequent vertebrates. ELTD1 has 3 evolutionary variants, of which 2 are most common: one with 3 EGF domains and one with 2 EGF domains.
Additionally, ELTD1 may be ancestral to members of aGPCR family 2. HUVEC mRNA expression profiling after ELTD1 silencing showed upregulation of the mitochondrial citrate transporter SLC25A1. It also showed upregulation of ACLY, which converts cytoplasmic citrate to acetyl-CoA, feeding fatty acid and cholesterol synthesis and acetylation. Examination of lipid droplet (fatty acid and cholesterol) accumulation by confocal microscopy and flow cytometry (FACS) revealed no changes with ELTD1 silencing. Silencing was also shown to affect the Notch pathway, downregulating the Notch ligand JAG1 and the target gene HES2 and upregulating the Notch ligand DLL4. It also induced KIT, a mediator of haematopoietic stem cell (HSC) and endothelial stem cell (ESC) maintenance. Signalling experiments revealed that, unlike other aGPCRs, ELTD1 does not couple to any canonical GPCR pathway (Gαi, Gαs, Gαq, Gα12/13). ELTD1 overexpression in HUVECs revealed that ELTD1 induces an endothelial tip cell phenotype. It promotes sprouting and capillary formation, inhibits lumen anastomoses in mature vessels and lowers the proliferation rate. There was no effect on wound healing or on adhesion to angiogenesis-associated matrix components. Gene expression changes following ELTD1 overexpression included upregulation of the angiogenesis-associated ANTXR1 as well as JAG1. They also included downregulation of the migration-associated CCL15 as well as KIT and DLL4. In breast cancer, none of the representative cell lines screened expressed ELTD1. ELTD1 immunohistochemistry in breast cancer revealed higher vascular ELTD1 staining intensity within the tumour stroma compared with normal stroma and with expression within tumour epithelial cells. Additionally, ELTD1 was differentially expressed in tumour vessels between the primary breast cancer microenvironment and that of the matched regional node. Owing to the small size of the pilot study population, survival comparisons between the various subgroups did not yield significant results.
Conclusion: ELTD1 is a novel regulator of endothelial metabolism through its suppression of ACLY and the related citrate transporter SLC25A1. ELTD1 also represses KIT, which is known to mediate haematopoietic and endothelial progenitor stem cell maintenance. This is a possible mechanism through which endothelial cells maintain terminal endothelial differentiation. Unlike other adhesion GPCRs, ELTD1 does not signal canonically in either its CTF or its FL form. Additionally, ELTD1 regulates various aspects of endothelial cell behaviour and function, inducing an endothelial tip cell phenotype, and it is highly evolutionarily conserved. Lastly, ELTD1 is differentially expressed in tumour vessels between primary breast cancer and regional nodal metastases. It is also expressed in a small subset of breast cancer cells in vivo, despite no cancer cell lines expressing ELTD1. The pilot study investigating ELTD1 in primary breast cancer and involved regional nodes will be followed up with a larger study that includes the investigation of ELTD1 in distant metastases.

11

Notaro, Marco. "Hierarchical Ensemble Methods for Ontology-Based Predictions in Computational Biology." Doctoral thesis, Università degli Studi di Milano, 2019. http://hdl.handle.net/2434/606185.

Abstract:

The standardized annotation of biomedical objects, often organized in dedicated catalogues, has strongly promoted the organization of biological concepts into controlled vocabularies. These are ontologies in which related terms of the underlying biological domain are structured according to a predefined hierarchy. Indeed, the scientific community has developed large ontologies to structure and organize the gene and protein taxonomy of all living organisms from Archaea to Metazoa, namely the Gene Ontology. It has also developed human-specific ontologies such as the Human Phenotype Ontology, which provides a structured taxonomy of the abnormal human phenotypes associated with diseases. These ontologies offer a coded and well-defined classification space for biological entities such as genes and proteins. They thereby favor the development of machine learning methods that predict features of biological objects, such as the association between a human gene and a disease. The aim is to drive wet-lab research, reducing costs and making more effective use of the available research funds. Despite the soundness of these objectives, the resulting multi-label classification problems raise such complex machine learning issues that, until recently, by far the most common approach was “flat” prediction. This means simply training a classifier for each term in the controlled vocabulary and ignoring the relationships between terms. The approach was justified not only by the need to reduce the computational complexity of the learning task, but also by the somewhat “unstable” nature of the terms composing the controlled vocabularies. These terms were (and are) updated monthly in a process performed by expert curators and based on the biomedical literature and on wet and in-silico experiments. In this context, two main general classes of classifiers have been proposed in the literature.
On the one hand, “hierarchy-unaware” learning methods predict labels in a “flat” way without exploiting the inherent structure of the annotation space. On the other hand, “hierarchy-aware” learning methods can improve the accuracy and precision of the predictions by considering the hierarchical relationships between ontology terms. Moreover, these methods can guarantee the consistency of the predicted labels according to the “true path rule”, the biological and logical rule that governs the internal coherence of biological ontologies. To properly handle the hierarchical relationships linking ontology terms, two main classes of structured output methods have been proposed in the literature. The first is based on kernelized methods for structured output spaces, and the second on hierarchical ensemble methods for ontology-based predictions. However, both approaches suffer from significant drawbacks. The kernel-based methods for structured output spaces are computationally intensive and do not scale well when applied to complex multi-label bio-ontologies. Most hierarchical ensemble methods have been conceived for tree-structured taxonomies. The few developed specifically for prediction in DAG-structured output spaces are, in most cases, unable to improve prediction performance over flat methods. To overcome these limitations, this thesis develops novel “ontology-aware” ensemble methods able to handle DAG-structured ontologies, leveraging previous results obtained with “true-path-rule”-based hierarchical learning algorithms. These methods are highly modular in that they adopt a “two-step” learning strategy. In the first step, they learn each term of the ontology separately using flat methods; in the second, they combine the flat predictions according to the hierarchy of the classes. The main contributions of this thesis are both methodological and experimental.
From a methodological standpoint, novel hierarchical ensemble methods are proposed, including: a) HTD (a Hierarchical Top-Down algorithm for DAG-structured ontologies); b) TPR-DAG (True Path Rule ensemble for DAGs), with several variants; c) ISO-TPR, a novel ensemble method that combines the True Path Rule approach with isotonic regression. For all these methods a formal proof of consistency is provided, i.e. a guarantee that their predictions “respect” the hierarchical relationships between classes. From an experimental standpoint, extensive genome- and ontology-wide results show that the proposed methods: a) are competitive with state-of-the-art prediction algorithms; b) improve on flat machine learning classifiers, provided the base learners produce non-random predictions; c) predict new associations between genes and abnormal human phenotypes, a crucial step toward discovering novel genes associated with human diseases ranging from genetic disorders to cancer; d) scale well to large datasets and bio-ontologies. Finally, HEMDAG, a novel R library implementing the proposed hierarchical ensemble methods, has been developed and publicly released.
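
The two-step strategy described above can be illustrated with a minimal Python sketch of a top-down correction on a DAG, in the spirit of HTD. Flat scores are visited in topological order, and each term's score is capped by the smallest corrected score among its parents, so that no term outscores an ancestor and the true path rule holds. This is an illustrative assumption of how such a rule can be coded, not the HEMDAG implementation. The function names, the dictionary-based DAG representation and the toy terms are hypothetical.

```python
from graphlib import TopologicalSorter


def htd_dag(flat_scores, parents):
    """Top-down correction of flat per-term scores on a DAG (illustrative sketch).

    flat_scores: dict mapping each term to its flat score in [0, 1]
    parents: dict mapping each term to a list of its parent terms ([] for roots)
    Returns a new dict in which no term scores higher than any of its parents.
    """
    # TopologicalSorter takes node -> predecessors, i.e. exactly a parent map,
    # so static_order() yields every parent before its children.
    corrected = {}
    for term in TopologicalSorter(parents).static_order():
        pars = parents.get(term, [])
        if not pars:
            corrected[term] = flat_scores[term]  # roots keep their flat score
        else:
            # Cap the child by its least confident (already corrected) parent.
            corrected[term] = min(flat_scores[term],
                                  min(corrected[p] for p in pars))
    return corrected


def is_consistent(scores, parents):
    """True path rule check: every term scores at most as high as each parent."""
    return all(scores[t] <= scores[p] for t, ps in parents.items() for p in ps)
```

Consider parents {root: [], A: [root], B: [root], C: [A, B]} with flat scores 0.9, 0.7, 0.4 and 0.8. The flat scores violate the rule, because C outscores A. After correction, C is lowered to 0.4, the score of its least confident parent B.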

12

MASPERO, DAVIDE. "Computational strategies to dissect the heterogeneity of multicellular systems via multiscale modelling and omics data analysis." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2022. http://hdl.handle.net/10281/368331.

Abstract:

Heterogeneity pervades biological systems and manifests itself in the structural and functional differences observed both among individuals of the same group (e.g., organisms or diseases) and among the constituent elements of a single individual (e.g., cells). The study of the heterogeneity of biological systems, and of multicellular systems in particular, is fundamental for the mechanistic understanding of complex physiological and pathological phenomena (e.g., cancer). It is equally fundamental for the definition of effective prognostic, diagnostic, and therapeutic strategies. This work focuses on developing and applying computational methods and mathematical models to characterise the heterogeneity of multicellular systems and, especially, of the cancer cell subpopulations underlying the evolution of neoplastic disease. Similar methodologies have been developed to characterise viral evolution and heterogeneity. The research is divided into two complementary parts: the first aims to define methods for the analysis and integration of omics data generated by sequencing experiments, and the second concerns the multiscale modelling and simulation of multicellular systems. Regarding the first strand, next-generation sequencing technologies generate vast amounts of omics data, for example on the genome or transcriptome of a given individual, through bulk or single-cell sequencing experiments. One of the main challenges in computer science is to define computational methods that extract useful information from such data while accounting for high levels of data-specific errors, mainly due to technological limitations. In particular, this work focused on developing methods for the analysis of gene expression and genomic mutation data. In detail, an exhaustive comparison of machine-learning methods for denoising and imputation of single-cell RNA-sequencing data has been performed.
Moreover, methods for mapping expression profiles onto metabolic networks have been developed through an innovative framework that made it possible to stratify cancer patients according to their metabolism. A subsequent extension of the method made it possible to analyse the distribution of metabolic fluxes within a population of cells via a flux balance analysis approach. Regarding the analysis of mutational profiles, the first method for reconstructing phylogenomic models from longitudinal data at single-cell resolution has been designed and implemented. It exploits a framework that combines Markov Chain Monte Carlo with a novel weighted likelihood function. Similarly, by analysing sequencing data from viral samples, a framework has been developed that exploits low-frequency mutation profiles to reconstruct robust phylogenies and likely chains of infection. The same mutational profiles also make it possible to deconvolve the signal into signatures associated with the specific molecular mechanisms that generate these mutations, through an approach based on non-negative matrix factorisation. Regarding computational simulation, the research has led to the development of a multiscale model in which the simulation of cell population dynamics, represented through a Cellular Potts Model, is coupled to the optimisation of a metabolic model associated with each synthetic cell. With this model, hypotheses can be expressed in mathematical terms and the properties emerging from them can be observed. Finally, a first attempt to combine the two methodological approaches led to the integration of single-cell RNA-seq data within the multiscale model, allowing data-driven hypotheses to be formulated about the emergent properties of the system.
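
Mutational-signature deconvolution by non-negative matrix factorisation can be sketched as follows. The sketch assumes the common setup in which a matrix V of mutation counts (mutation categories × samples) is approximated by W @ H, with W holding the signatures and H the per-sample exposures. It uses the classic Lee-Seung multiplicative updates for the Frobenius loss. The thesis may use a different NMF variant, and the function name and defaults are hypothetical.

```python
import numpy as np


def nmf(V, k, n_iter=1000, eps=1e-10, seed=0):
    """Approximate non-negative V (categories x samples) as W @ H using
    Lee-Seung multiplicative updates for the squared Frobenius loss.
    Columns of W (signatures) are rescaled to sum to 1; H holds exposures."""
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random initialisation keeps the updates well defined.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity of both factors.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each signature is a probability vector; W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]
```

In practice the number of signatures k is not known in advance. It is usually chosen by comparing fits across several values of k and several random restarts.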

13

Suku, Eda. "G-protein coupled receptors activation mechanism: from ligand binding to the transmission of the signal inside the cell." Doctoral thesis, 2019. http://hdl.handle.net/11562/994620.

Abstract:

G-protein coupled receptors (GPCRs) are the largest family of pharmaceutical drug targets in the human genome and are modulated by a large variety of endogenous and synthetic ligands. GPCR activation usually depends on agonist binding (except for receptors with basal activity), which stabilizes receptor conformations that allow the recruitment and activation of intracellular transducers. GPCRs are unique and very well studied receptors, since they play an important role in a great number of diseases. They interact with different types of ligands (such as light, peptides, and proteins) and with different intracellular partners (such as G-proteins or β-arrestins). Based on homology and function, GPCRs are divided into five classes: Class A or Rhodopsin, Class B1 or Secretin, Class B2 or Adhesion, Class C or Glutamate, and Class F or Frizzled. What is still missing from the state of the art for these receptors, in particular for Class A, is a global study of binding cavities with divergent properties. Such a study would aim to discover common binding characteristics conserved through evolution. Gaining more knowledge of the ligand-recognition features shared among all the receptors may be crucial to a deep understanding of the mechanism used to transmit the signal into the cell. In the first part of this thesis, we used all solved Class A receptor structures to determine whether a common way of transmitting the signal inside the cell exists. We identified and validated ten positions shared by all the binding cavities and always involved in the interaction with ligands. We demonstrated that the residues at these positions are conserved and have co-evolved. In a second step, we used these positions to understand how ligands could be positioned in the binding cavities of three case studies: muscarinic receptors, kisspeptin receptors, and the GPR3 receptor. No experimental information was available a priori for any of them.
We used homology modeling and docking techniques for the first two cases, adding molecular dynamics simulations in the third. All the computational predictions and suggestions proved very successful. In particular, for the GPR3 receptor we were able to identify, and validate by alanine-scanning mutagenesis, the role of three functionally relevant residues. These residues were correlated with the constitutive and agonist-stimulated adenylate cyclase activity of the GPR3 receptor. Taken together, these results underline the important role of computational structural biology and pave the way for close collaboration between computational and experimental research.
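
Conservation and co-evolution of alignment positions, as mentioned above, are often quantified with two simple statistics. Per-column Shannon entropy measures conservation, with low entropy meaning a conserved position. Mutual information between pairs of columns measures co-variation, with high values suggesting correlated substitutions. The minimal sketch below assumes an ungapped multiple sequence alignment given as equal-length strings. It is not the analysis pipeline of the thesis, and real co-evolution studies typically apply further corrections such as sequence weighting.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Residues at position i of an ungapped alignment (list of equal-length strings)."""
    return [seq[i] for seq in alignment]


def shannon_entropy(col):
    """Shannon entropy in bits; 0 means the position is fully conserved."""
    n = len(col)
    return -sum((c / n) * log2(c / n) for c in Counter(col).values())


def mutual_information(col_a, col_b):
    """Mutual information in bits between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    # Sum over observed residue pairs of p(a,b) * log2(p(a,b) / (p(a) p(b))).
    return sum((c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())
```

Take the toy alignment ["ADKL", "ADRL", "AERI", "AEKI"]. Column 0 is fully conserved (entropy 0), while columns 1 and 3 vary in lockstep, giving a mutual information of 1 bit.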

14

VALASATAVA, YANA. "NEW COMPUTATIONAL APPROACHES TO THE STUDY OF METALS IN BIOLOGY." Doctoral thesis, 2015. http://hdl.handle.net/2158/998429.

15

Da Silva, Melissa Elizabeth. "A bioinformatic exploration of poxviruses." Thesis, 2007. http://hdl.handle.net/1828/262.

Abstract:

The overall theme of this dissertation is the genomic analysis of poxviruses using bioinformatics. The first analysis presented (Chapter 2) focuses on a new method for predicting which open reading frames (ORFs) in poxviruses are likely to be expressed. A measure that takes into account the amino acid and purine content of all predicted ORFs in the genome was developed. When applied to the genome of vaccinia virus (VACV) strain Copenhagen (training case), the measure had a success rate of 94%. Applied to an extremely adenine- and thymine-rich entomopoxvirus (test case), it identified 241 ORFs as potentially expressed and 51 ORFs as likely not expressed, although further biochemical experiments will be required to confirm this result. The second analysis (Chapter 3) focuses on determining the nature of an interesting background pattern, resembling a set of stripes, that was observed in a self-dotplot of the molluscum contagiosum virus genome. Further analysis showed that these stripe regions have a nucleotide composition and amino acid usage different from the remainder of the genome. Given this differing usage, the genes contained in these stripe regions are thought to have been recently acquired from the host or from another virus, making these regions similar to bacterial pathogenicity islands. The third analysis (Chapter 4) focuses on predicting the function of “unknown” poxvirus proteins. A hidden Markov model (HMM) comparison search tool was used to scan all “unknown” proteins in the VACV genome for database matches that may have been missed by conventional approaches (BLASTp and PSI-BLAST). One protein from this scan, VACV G5R, showed a promising hit (96% probability) to an archaeal flap endonuclease (FEN-1) protein.
A structural model of the G5R protein was built and compared with the crystal structure of the human FEN-1 protein. The two were highly conserved in both secondary and tertiary structure, and G5R shared three of the five main features of FEN-1, including the active site, suggesting that G5R should be classified as a flap endonuclease. Building on the analysis in Chapter 4, Chapter 5 presents results on locating a VACV-encoded protein similar to proliferating cell nuclear antigen (PCNA). Since the FEN-1 protein requires PCNA as an intermediary to contact DNA, the VACV genome was scanned using InterProScan to identify any proteins similar to PCNA. One protein (VACV G8R) was identified, modeled, and compared with the crystal structure of the human PCNA protein. Secondary and tertiary structure were highly conserved between the two proteins, suggesting that G8R should be classified as a sliding clamp similar to human PCNA.
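
The compositional comparison used to characterise the stripe regions can be approximated by a generic sliding-window scan. The scan computes the fraction of selected bases in each window (for example G+C, or A+G for purine content) and flags windows that deviate from the genome-wide value. This is a hypothetical sketch, not the measure developed in the dissertation. The window size, step and threshold are illustrative assumptions.

```python
def window_fractions(seq, bases="GC", window=1000, step=500):
    """Fraction of the given bases in each sliding window, as (start, fraction) pairs."""
    seq, bases = seq.upper(), bases.upper()
    out = []
    # Only full-length windows are scanned (or the whole sequence if it is
    # shorter than one window), so a short tail may be left uncovered.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        win = seq[start:start + window]
        out.append((start, sum(win.count(b) for b in bases) / len(win)))
    return out


def deviating_windows(seq, bases="GC", window=1000, step=500, threshold=0.1):
    """Windows whose base fraction differs from the genome-wide value by more than threshold."""
    seq, bases = seq.upper(), bases.upper()
    if not seq:
        return []
    overall = sum(seq.count(b) for b in bases) / len(seq)
    return [(start, frac)
            for start, frac in window_fractions(seq, bases, window, step)
            if abs(frac - overall) > threshold]
```

On a toy sequence made of an AT-only stretch, a GC-only island and another AT-only stretch, only the window covering the island is flagged when the threshold is set high enough.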

16

Kerstiens, Emily A. "NEW BIOINFORMATIC METHODS OF BACTERIOPHAGE PROTEIN STUDY." Thesis, 2021.

Abstract:

Bacteriophages are viruses that infect and kill bacteria. They are the most abundant organisms on the planet and the largest source of untapped genetic information. Every year, more bacteriophages are isolated from the environment, purified, and sequenced. Once sequenced, their genomes are annotated to determine the location and putative function of each gene expressed by the phage. Phages have been used in the past for genetic engineering, and new research is investigating how they can be used for the treatment of disease, water safety, agriculture, and food safety.

Despite the influx of sequenced bacteriophages, the majority of annotated genes encode hypothetical proteins, also known as No Known Function (NKF) proteins: they are expressed by the phages, but research has not identified a possible function. Wet-lab research into the functions of hundreds of NKF phage genes would be costly and could take years. Bioinformatics methods could instead be used to determine putative functions and functional categories for these hypothetical proteins. A new bioinformatics method combining domain assignments, hidden Markov models, structure prediction, subcellular localization, and iterative algorithms is proposed here. Tested on the bacteriophage genome PotatoSplit, the method reduced the number of NKF genes from 57 to 40, so that a total of 17 new functions were found. The functional class was identified for an additional six proteins, though no specific functions were named. Structure prediction and simulations were tested on two NKF proteins from lytic phages, and both returned possible functional categories with high confidence.

Additionally, this research examines the possibility of phage therapy and FDA regulation. A database of phage proteins was built and analyzed with statistical methods in R. The aim was to determine which proteins are significant for phages infecting M. tuberculosis and for the lytic cycle of phages. The statistical methods were also applied to two further datasets. The first was pharmaceutical products recalled by the FDA between 2012 and 2018, used to determine ingredients and manufacturing steps that could affect product quality. The second was FDA Adverse Event Reporting System (FAERS) data, used to determine whether adverse event reports (AERs) could be used to judge the quality of a product. Many significant excipients and manufacturing steps were identified and used to score products on their quality. The AERs were evaluated in two case studies, with mixed results.
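
The thesis performed its statistical analyses in R. As a language-neutral illustration, one common way to ask whether a protein (or protein family) is over-represented among phages that infect M. tuberculosis is a one-sided Fisher's exact test on a 2×2 table of counts. The Python sketch below computes the hypergeometric tail directly with math.comb. Whether the thesis used this particular test is not stated, and the function name and table layout are assumptions.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test (over-representation) for the 2x2 table

                     has protein   lacks protein
        group             a              b
        other phages      c              d

    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    n = a + b + c + d
    k = a + c        # phages carrying the protein
    n1 = a + b       # phages in the group of interest
    # Sum the probabilities of tables at least as extreme as the observed one.
    tail = sum(comb(k, x) * comb(n - k, n1 - x)
               for x in range(a, min(n1, k) + 1))
    return tail / comb(n, n1)
```

Take hypothetical counts where 3 of 4 Mycobacterium phages carry the protein versus 1 of 4 other phages. The one-sided p-value is 17/70, about 0.24. When many proteins are tested, the p-values would also need a multiple-testing correction.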

17

BACCI, GIOVANNI. "Mining Microbiomes. Computational Biology approaches to uncover the complexity of bacterial communities." Doctoral thesis, 2015. http://hdl.handle.net/2158/986409.

18

Ding, Ziyun. "Computational methods for protein-protein interaction identification." Thesis, 2019.

Abstract:

Understanding protein-protein interactions (PPIs) in a cell is essential for learning about protein functions, pathways, and disease mechanisms. This dissertation introduces computational methods to predict PPIs. The first chapter introduces the history of identifying protein interactions and some experimental methods. Because interacting proteins tend to share similar functions, protein function similarity can be used as a feature to predict PPIs. The NaviGO server was developed to help biologists and bioinformaticians visualize Gene Ontology (GO) relationships and quantify GO-based similarity scores. Furthermore, the computational features used to predict PPIs are summarized. This summary will help researchers from the computational field understand the rationale for extracting biological features, and will also help researchers with a biology background understand the computational work. Building on these features, a computational method to identify large-scale PPIs was developed and applied to Arabidopsis, maize, and soybean on a whole-genome scale. The novel predicted PPIs are provided and grouped by prediction confidence level, so that they can serve as testable hypotheses to guide biologists’ experiments. Affinity chromatography combined with mass spectrometry introduces many false-positive PPIs. For this reason, the computational method was combined with mass spectrometry data to help identify high-confidence PPIs at large scale. Lastly, the remaining challenges of computational PPI prediction methods and future work are discussed.
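
As an illustration of how GO-based functional similarity can be quantified, the sketch below implements Resnik similarity, a widely used measure. The information content (IC) of a term is -log of the fraction of annotations falling on that term or its descendants. The similarity of two terms is the IC of their most informative common ancestor. Whether NaviGO uses exactly this score is not stated in the abstract. The toy ontology, term names and counts are hypothetical, and a single root term is assumed.

```python
from math import log


def ancestors(term, parents):
    """The term itself plus all of its ancestors, given a term -> parents map."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def information_content(annotation_counts, parents):
    """IC(t) = -log p(t); p(t) counts annotations to t or any of its descendants.
    Assumes a single root, which therefore receives every annotation."""
    totals = {}
    for term, count in annotation_counts.items():
        # Each annotation also counts for every ancestor (true path rule).
        for anc in ancestors(term, parents):
            totals[anc] = totals.get(anc, 0) + count
    root_total = max(totals.values())
    return {t: -log(c / root_total) for t, c in totals.items()}


def resnik(t1, t2, parents, ic):
    """IC of the most informative common ancestor of t1 and t2."""
    common = ancestors(t1, parents) & ancestors(t2, parents)
    return max((ic[t] for t in common if t in ic), default=0.0)
```

In a toy ontology, two metabolic sub-terms share the informative ancestor "metab", so their similarity is positive. A metabolic term and a signalling term share only the root, so their similarity is 0.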

19

MADEDDU, LORENZO. "Machine learning methods for extracting medical knowledge from the human interactome." Doctoral thesis, 2022. http://hdl.handle.net/11573/1639572.

Abstract:

Life on Earth is regulated by a complex system of interactions. Network Medicine models biological organisms through network paradigms, allowing researchers to discover and understand the molecular mechanisms that govern biological processes and human diseases. Computational methodologies based on the analysis of molecular connections may help researchers, for example, by reducing the time and cost of lab experiments and by supporting biomedical advances in diseases such as cancer, diabetes, and Alzheimer's disease. This thesis focuses on the development of machine learning models that extract information from the human interactome to address crucial problems in biology, medicine, and pharmacology. Four fundamental aspects are explored: protein-protein interactions, gene-disease associations, disease-disease associations, and drug repositioning. In the first study, presented in Chapter 6, we conducted a large-scale comparative evaluation of algorithms that predict protein-protein interactions, with the support of a large team of researchers from the Network Medicine Alliance. The aim was to extend the human interactome, the fundamental network of Network Medicine. In Chapter 7, we developed RW², a deep learning model applied to the human interactome to identify new gene-disease associations. In Chapter 8, we define a methodology that induces a new taxonomy of diseases from effective molecules, integrating existing taxonomies, in order to identify unexplored relationships between pathologies. Finally, to complete the thesis and support research on the recent COVID-19 pandemic, Chapter 9 presents two approaches developed for drug repositioning. The first combines knowledge of the interactome with pharmacological molecular graphs to predict potential therapeutic targets. The second, conducted in the laboratory of Dr. Loscalzo, professor at Harvard Medical School, aims to understand which biological mechanisms link viruses and drugs.
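
RW² itself is a deep learning model and is not reproduced here. For context, a classic network-propagation baseline for ranking candidate disease genes on an interactome is random walk with restart (RWR). Probability mass starts on known disease genes (seeds), diffuses along edges, and with probability `restart` jumps back to the seeds. The sketch assumes a small dense adjacency matrix, and the function name and default restart probability are illustrative choices.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk that restarts at the seeds."""
    A = np.asarray(adj, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated nodes: avoid division by zero
    W = A / col_sums                        # column-stochastic transition matrix
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)      # uniform restart distribution over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p
```

On a path graph 0-1-2-3 seeded at node 0, the scores decrease with distance from the seed. Candidate genes would then be ranked by these scores.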

20

MARTINO, ALESSIO. "Pattern recognition techniques for modelling complex systems in non-metric domains." Doctoral thesis, 2020. http://hdl.handle.net/11573/1364044.

Abstract:

Pattern recognition and machine learning problems are often conceived to work on metric vector spaces, where patterns are described by multi-dimensional feature vectors. However, many real-world complex systems are more conveniently modelled by complex data structures such as graphs, which can capture topological and semantic information among entities. This thesis helps bridge the gap between pattern recognition and graphs, with major emphasis on the hypergraph domain. Six different strategies for solving graph-based pattern recognition problems are proposed, spanning several paradigms including kernel methods, embedding spaces, and feature generation. The first two techniques map a graph into a vector space by means of the spectral density of the Laplacian matrix and by means of topological invariants called Betti numbers, respectively. Two further techniques, following the Granular Computing paradigm, map a graph into a vector space by means of symbolic histograms. In the first case, simplices extracted from the simplicial complexes evaluated over the underlying graph are considered as candidate pivotal substructures for synthesising the symbolic histograms. In the second case, each path along a graph is assigned a score that considers its specificity and sensitivity with respect to one of the problem-related classes, and its inclusion among the candidate pivotal substructures depends strictly on its score. The final two techniques fall under the kernel methods umbrella: the first defines novel hypergraph kernels on top of the simplicial complexes, while the second adopts a multiple kernel paradigm to exploit multiple graph representations simultaneously. These techniques are tested on real-world problems related to three biological case studies: solubility prediction and enzymatic property discrimination in protein networks, and the analysis of metabolic networks.
Furthermore, the most advanced techniques are also tested on well-known benchmark datasets for graph classification and compared against current approaches in graph-based pattern recognition.
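
The first embedding strategy mentioned above maps a graph to a vector through the spectral density of its Laplacian. It can be sketched as follows, assuming the symmetric normalised Laplacian (whose eigenvalues lie in [0, 2]) and a plain histogram as the density estimate. Because the histogram has a fixed number of bins, graphs of different sizes map to vectors of the same length. The thesis may use a different Laplacian or density estimator, and the names and bin counts here are hypothetical.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalised Laplacian L = I - D^-1/2 A D^-1/2.
    Isolated nodes get a zero row and column."""
    A = np.asarray(adj, dtype=float)
    deg = A.sum(axis=1)
    nz = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return np.diag(nz.astype(float)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]


def spectral_density_embedding(adj, bins=10):
    """Fixed-length vector: normalised histogram of Laplacian eigenvalues on [0, 2]."""
    eig = np.linalg.eigvalsh(normalized_laplacian(adj))
    eig = np.clip(eig, 0.0, 2.0)   # remove tiny round-off outside [0, 2]
    hist, _ = np.histogram(eig, bins=bins, range=(0.0, 2.0))
    return hist / len(eig)
```

For a triangle, the normalised Laplacian has eigenvalues 0, 1.5 and 1.5, so with two bins the embedding is [1/3, 2/3]. A four-node cycle yields a vector of the same length, which is what allows standard vector-space classifiers to be applied to graphs.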
