Dissertations / Theses on the topic 'Model annotation'

Consult the top 50 dissertations / theses for your research on the topic 'Model annotation.'


1

Hu, Rong (RongRong). "Image annotation with discriminative model and annotation refinement by visual similarity matching." Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/61311.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 65-67).
A large percentage of photos on the Internet cannot be reached by search engines because of the absence of textual metadata. Such metadata comes from the descriptions and tags that uploaders attach to their photos. Despite decades of research, neither model-based nor model-free approaches can provide quality annotations for images. In this thesis, I present a hybrid annotation pipeline that combines both approaches in the hope of increasing the accuracy of the resulting annotations. Given an unlabeled image, the first step is to suggest some words via a trained model optimized for retrieval of images from text. Though the trained model cannot always suggest highly relevant words, these can be used as initial keywords to query a large web image repository and obtain the text associated with the retrieved images. We then use perceptual features (e.g., color, texture, shape, and local characteristics) to match the retrieved images with the query photo and use visual similarity to rank the relevance of the suggested annotations for the query photo.
by Rong Hu.
M.Eng.
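The re-ranking step described in the abstract can be sketched as follows. The function, its inputs, and the similarity scores are hypothetical illustrations, not code from the thesis:

```python
from collections import defaultdict

def rerank_tags(candidate_tags, retrieved):
    """Re-rank candidate tags for a query image.

    `retrieved` is a list of (visual_similarity, tags) pairs, one per
    image returned from a web repository when queried with the initial
    model-suggested keywords.  Each candidate tag's score is the summed
    visual similarity of the retrieved images that carry it, so tags
    attached to visually close neighbours rise to the top.
    """
    scores = defaultdict(float)
    for similarity, tags in retrieved:
        for tag in tags:
            if tag in candidate_tags:
                scores[tag] += similarity
    return sorted(candidate_tags, key=lambda t: -scores[t])
```

A tag suggested by the model but absent from all visually similar neighbours keeps a score of zero and falls to the bottom of the ranking.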
2

Balachandar, Shreerekha. "Back annotation for conceptual structures." Thesis, This resource online, 1995. http://scholar.lib.vt.edu/theses/available/etd-06112009-063732/.

Full text
3

Elias, Mturi. "Design of Business Process Model Repositories : Requirements, Semantic Annotation Model and Relationship Meta-model." Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-117035.

Full text
Abstract:
Business process management is fast becoming one of the most important approaches for designing contemporary organizations and information systems. A critical component of business process management is business process modelling. It is widely accepted that modelling business processes from scratch is a complex, time-consuming and error-prone task. However, the effort invested in modelling these processes is seldom reused beyond its original purpose. Reuse of business process models has the potential to overcome the challenges of modelling business processes from scratch. Process model repositories, properly populated, are certainly a step toward supporting reuse of process models. This thesis starts with the observation that existing process model repositories for supporting process model reuse suffer from several shortcomings that affect their usability in practice. Firstly, most of the existing repositories are proprietary, so they can only be enhanced or extended with new models by their owners. Secondly, it is difficult to locate and retrieve relevant process models from a large collection. Thirdly, process models are not goal-related, making it difficult to understand the business goals that a certain model realizes. Finally, process model repositories lack a clear mechanism for identifying and defining the relationships between business processes, and as a result it is difficult to identify related processes. Following a design science research paradigm, this thesis proposes an open and language-independent process model repository with an efficient retrieval system to support process model reuse.
The proposed repository is grounded on four original and interrelated contributions: (1) a set of requirements that a process model repository should satisfy to increase the probability of process model reuse; (2) a context-based process semantic annotation model for semantically annotating process models to facilitate their effective retrieval; (3) a business process relationship meta-model for identifying and defining the relationships of process models in the repository; and (4) an architecture of a process model repository for process model reuse. The models and architecture produced in this thesis were evaluated to test their utility, quality and efficacy. The semantic annotation model was evaluated through two empirical studies using controlled experiments. The conclusion drawn from the two studies is that the annotation model improves searching, navigation and understanding of process models. The process relationship meta-model was evaluated using an informed argument to determine the extent to which it meets the established requirements; the analysis revealed that it does. Likewise, the analysis of the architecture against the requirements indicates that the architecture meets them.
4

Jones, Martin Robert. "Deep metabolome annotation of the freshwater model species, Daphnia magna." Thesis, University of Birmingham, 2017. http://etheses.bham.ac.uk//id/eprint/7984/.

Full text
Abstract:
In the 21st century, the era of big data science, chemical risk assessment procedures remain woefully dependent upon a suite of basic toxicological assays that offer little, if any, biochemical information pertaining to the underlying mechanism of toxicity. Metabolomics, defined as the holistic study of all naturally occurring, low molecular weight metabolites present within a biological system, holds huge potential as a tool to fill this knowledge gap and thereby to revolutionise the chemical risk assessment process through the provision of rich molecular information. Owing to ongoing challenges in the area of metabolite identification, however, which ultimately impede the derivation of biological knowledge from metabolomics data sets, the full potential of the metabolomics platform has yet to be realised in the context of (eco-)toxicological research. In this thesis, I present the experiments undertaken in establishing a bespoke bioanalytical workflow specifically designed and optimised to resolve this bottleneck. Ultimately, I demonstrate the application of select components of this workflow in characterising the metabolome of D. magna, a model organism for eco-toxicological research.
5

Khalili, Ali. "A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-159956.

Full text
Abstract:
The Semantic Web and Linked Data movements, which aim to create, publish and interconnect machine-readable information, have gained traction in recent years. However, the majority of information is still contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos. Nor can this be expected to change, since text, images and videos are the natural way in which humans interact with information. Semantic structuring of content, on the other hand, provides a wide range of advantages over unstructured information. Semantically enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization. Looking at the life-cycle of semantic content on the Web of Data, we see considerable progress on the backend side in storing structured content and in linking data and schemata. Nevertheless, the least developed aspect of the semantic content life-cycle is, from our point of view, the user-friendly manual and semi-automatic creation of rich semantic content. In this thesis, we propose a semantics-based user interface model, which aims to reduce the complexity of the underlying technologies for semantic enrichment of content by Web users. By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces. We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean), which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content. To assess the applicability of the proposed WYSIWYM model, we incorporated it into four real-world use cases comprising two general and two domain-specific applications.
These use cases address four aspects of the WYSIWYM implementation: 1) Its integration into existing user interfaces, 2) Utilizing it for lightweight text analytics to incentivize users, 3) Dealing with crowdsourcing of semi-structured e-learning content, 4) Incorporating it for authoring of semantic medical prescriptions.
6

Banks, Russell K. "Annotation Tools for Multivariate Gene Set Testing of Non-Model Organisms." DigitalCommons@USU, 2015. https://digitalcommons.usu.edu/etd/4515.

Full text
Abstract:
Many researchers across a wide range of disciplines have turned to gene expression analysis to aid in predicting and understanding biological outcomes and mechanisms. Because genes are known to work in a dependent manner, it is common for researchers to first group genes into biologically meaningful sets and then test each gene set for differential expression. Comparisons are made across different treatment/condition groups. The meta-analytic method for testing differential activity of gene sets, termed multivariate gene set testing (mvGST), will be used to provide context for two persistent and problematic issues in gene set testing. These are: 1) gathering organism-specific annotation for non-model organisms and 2) handling gene annotation ambiguities. The primary purpose of this thesis is to explore different annotation gathering methods in building gene set lists and to address the problem of gene annotation ambiguity. Using an example study, three different annotation gathering methods are proposed to construct GO gene set lists. These lists are directly compared, as are the subsequent results from the mvGST analysis. In a separate study, an optimization algorithm is proposed as a solution for handling gene annotation ambiguities.
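The two issues named above, building gene sets from gathered annotation and then testing each set, can be illustrated with a minimal sketch. The gene and GO identifiers, the permutation test, and all function names are illustrative assumptions, not the thesis's mvGST implementation:

```python
import random
from collections import defaultdict
from statistics import mean

def build_gene_sets(annotations):
    """Group genes into GO-term sets from (gene, go_term) annotation pairs."""
    sets = defaultdict(set)
    for gene, term in annotations:
        sets[term].add(gene)
    return sets

def gene_set_pvalue(gene_set, scores, n_perm=1000, seed=0):
    """Permutation p-value for whether a set's mean differential-expression
    score exceeds that of random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(scores)
    observed = mean(scores[g] for g in gene_set)
    hits = sum(
        mean(scores[g] for g in rng.sample(genes, len(gene_set))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

For a non-model organism, the `annotations` pairs would come from whichever annotation gathering method is in use; the set test itself is agnostic to that choice.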
7

Harmon, Trev R. "On-Line Electronic Document Collaboration and Annotation." Diss., CLICK HERE for online access, 2006. http://contentdm.lib.byu.edu/ETD/image/etd1589.pdf.

Full text
8

Basu, Roy Sreeya. "Automated Annotation of Simulink Generated C Code Based on the Simulink Model." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284564.

Full text
Abstract:
There has been a wave of transformation in the automotive industry in recent years, with most vehicular functions being controlled electronically instead of mechanically. This has led to an exponential increase in the complexity of software functions in vehicles, making it essential for manufacturers to guarantee their correctness. Traditional software testing is reaching its limits, consequently pushing the automotive industry to explore other forms of quality assurance. One such technique that has been gaining momentum over the years is a set of verification techniques based on mathematical logic, called formal verification techniques. Although formal techniques have not yet been adopted on a large scale, these methods offer systematic and possibly more exhaustive verification of the software under test, since their fundamentals are based on the principles of mathematics. In order to apply formal verification, the system under test must be transformed into a formal model together with a set of properties over such models, which can then be verified using well-established formal verification techniques such as model checking or deductive verification. This thesis focuses on formal verification of automatically generated C code based on Simulink models using deductive verification techniques. More specifically, the aim is to explore whether the generated code can be automatically annotated using the underlying Simulink model as an executable specification, thereby making it suitable for verification using state-of-the-art tools. Our investigation of Simulink-generated C code shows that it can indeed be annotated using the corresponding Simulink model as an executable specification. Consequently, we propose an algorithm that automates the generation of annotations and their injection into C code for a specific class of Simulink models and code generated under specific conditions.
Successful verification would mean that the code satisfies all functional properties of the model irrespective of the code generator used. We validate our approach on a prototype implementation of a Brake-by-Wire (BBW) functionality for heavy load vehicles. Most of the functional properties of the generated code were satisfied.
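The annotation-injection idea can be sketched roughly as follows. This assumes ACSL-style contract comments of the kind consumed by deductive verifiers such as Frama-C; the thesis does not necessarily use this exact format, and the function, its inputs, and the clause syntax here are illustrative only. In the thesis's setting, the requires/ensures clauses would be derived from the Simulink model's block semantics rather than passed in directly:

```python
def annotate_c_function(c_source, func_name, requires, ensures):
    """Insert an ACSL-style contract comment immediately above the first
    line that looks like a declaration or definition of `func_name` in
    the generated C source, and return the annotated source."""
    contract = (
        "/*@\n"
        + "".join(f"  requires {r};\n" for r in requires)
        + "".join(f"  ensures {e};\n" for e in ensures)
        + "*/"
    )
    out, inserted = [], False
    for line in c_source.splitlines():
        # crude match: the function name with an argument list, not a comment
        if (not inserted and func_name in line and "(" in line
                and not line.lstrip().startswith("/")):
            out.append(contract)
            inserted = True
        out.append(line)
    return "\n".join(out)
```

A real injection pass would work on a parsed AST of the generated code rather than on raw text, but the placement principle, contract above the function it specifies, is the same.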
9

Dorribo, Camba Jorge. "ANNOTATION MECHANISMS TO MANAGE DESIGN KNOWLEDGE IN COMPLEX PARAMETRIC MODELS AND THEIR EFFECTS ON ALTERATION AND REUSABILITY." Doctoral thesis, Universitat Politècnica de València, 2015. http://hdl.handle.net/10251/45997.

Full text
Abstract:
The proposed research project falls within the area of product design with CAD/CAM (Computer Aided Design/Computer Aided Manufacturing) solid modelling applications. Specifically, it studies the associative annotation tools available in commercial CAD modelling applications in order to analyse their use, viability, efficiency and effects on the modification and reuse of 3D digital models, as well as on the management and communication of design-related technical knowledge. The main idea of this doctoral research is to establish a method for representing and evaluating design engineers' implicit knowledge about a digital model, and for dynamically integrating that knowledge into the CAD model itself through annotations, with the goal of efficiently storing and communicating as much useful information about the model as possible and reducing the time and effort required to alter and/or reuse it.
Dorribo Camba, J. (2014). ANNOTATION MECHANISMS TO MANAGE DESIGN KNOWLEDGE IN COMPLEX PARAMETRIC MODELS AND THEIR EFFECTS ON ALTERATION AND REUSABILITY [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/45997
10

Li, Honglin. "Hierarchical video semantic annotation: the vision and techniques." Connect to this title online, 2003. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1071863899.

Full text
Abstract:
Thesis (Ph. D.)--Ohio State University, 2003.
Title from first page of PDF file. Document formatted into pages; contains xv, 146 p.; also includes graphics. Includes bibliographical references (p. 136-146).
11

Park, Gyoungju Nah. "Comparative Genomics in Two Dicot Model Systems." Diss., The University of Arizona, 2008. http://hdl.handle.net/10150/194279.

Full text
Abstract:
Comparative sequence analyses were performed with members of the Solanaceae and the Brassicaceae. These studies investigated genomic organization, determined levels of microcolinearity, identified orthologous genes and investigated the molecular basis of trait differences. The first analysis was performed by comparison of tomato (Solanum lycopersicum) genomic sequence (119 kb) containing the JOINTLESS1 (J1) locus with orthologous sequences from two potato species, a diploid, Solanum bulbocastanum (800-900 Mb, 2N=2X=24), and a hexaploid, Solanum demissum (2,700 Mb, 2N=6X=72). Gene colinearity was well maintained across all three regions. Twelve orthologous open reading frames were identified in identical order and orientation and included three putative J1 orthologs with 93-96% amino acid sequence identity in both potato species. Although these regions were highly conserved, several local disruptions were detected and included small-scale expansion/contraction regions with intergenic sequences, non-colinear genes and transposable elements. Three putative Solanaceous-specific genes were also identified in this analysis. The second analysis was performed by comparison of a Thellungiella halophila (T. halophila) genomic sequence (193 kb) containing the SALT OVERLY SENSITIVE1 (SOS1) locus with the orthologous sequence (146 kb) in Arabidopsis thaliana (Arabidopsis). T. halophila is a halophytic relative of Arabidopsis thaliana that exhibits extreme salt tolerance. Twenty-five genes, including the putative T. halophila SOS1 (ThSOS1), showed a high degree of colinearity with Arabidopsis genes in the corresponding region. Although the two sequences were significantly colinear, several local rearrangements were detected which were caused by tandem duplications and inversions. Three major expansion/contraction regions in T. halophila contained five LTR retrotransposons which contributed to genomic size variation in this region. 
ThSOS1 shares similar gene structure and sequence with Arabidopsis SOS1 (AtSOS1), including 11 transmembrane domains and a cyclic nucleotide-binding domain. Three Simple Sequence Repeats (SSRs) were detected within a 540 bp region upstream of the putative translational start site in ThSOS1. The (CTT)n repeat is present in different copy numbers in ThSOS1 (18 repeats) and AtSOS1 (3 repeats). When present in the 5' UTRs of some Arabidopsis genes, (CTT)n serves as a putative salicylic acid responsive element. These SSRs may serve as cis-acting elements affecting differential mRNA accumulation of SOS1 in the two species.
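Detecting a simple sequence repeat such as the (CTT)n run described above is straightforward with a regular expression. A minimal sketch (the function name and default parameters are ours, not the dissertation's) that returns the position and copy number of each run:

```python
import re

def find_ssr(seq, motif="CTT", min_repeats=3):
    """Locate runs of a simple sequence repeat (SSR) such as (CTT)n.

    Returns (start, repeat_count) pairs for every run of at least
    `min_repeats` consecutive copies of `motif` in `seq`.
    """
    pattern = re.compile(f"(?:{motif}){{{min_repeats},}}")
    return [
        (m.start(), (m.end() - m.start()) // len(motif))
        for m in pattern.finditer(seq.upper())
    ]
```

Applied to the 540 bp upstream regions of the two SOS1 orthologs, a scan like this would report the differing copy numbers (18 vs. 3) directly.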
12

Khan, Hamza. "De novo annotation of non-model organisms using whole genome and transcriptome shotgun sequencing." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/60152.

Full text
13

Tian, Tian. "Domain Adaptation and Model Combination for the Annotation of Multi-source, Multi-domain Texts." Thesis, Paris 3, 2019. http://www.theses.fr/2019PA030003.

Full text
Abstract:
The increasing mass of User-Generated Content (UGC) on the Internet means that people are now willing to comment, edit or share their opinions on different topics. This content is now the main resource for sentiment analysis on the Internet. Due to abbreviations, noise, spelling errors and all the other problems with UGC, traditional Natural Language Processing (NLP) tools, including Named Entity Recognizers and part-of-speech (POS) taggers, perform poorly when compared to their usual results on canonical text (Ritter et al., 2011). This thesis deals with Named Entity Recognition (NER) on User-Generated Content. We created an evaluation dataset including multi-domain and multi-source texts. We then developed a Conditional Random Fields (CRF) model trained on UGC. In order to improve NER results in this context, we first developed a POS tagger for UGC and used the predicted POS tags as a feature in the CRF model. To turn UGC into canonical text, we also developed a normalization model using neural networks to propose a correct form for Non-Standard Words (NSW) in the UGC.
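Using predicted POS tags as a CRF feature, as the thesis does, typically amounts to adding them to each token's feature dictionary. A minimal sketch, with a feature set of our own choosing rather than the one used in the thesis:

```python
def token_features(tokens, pos_tags, i):
    """Feature dict for token i, combining surface cues that matter for
    noisy user-generated text with the predicted POS tag of the current
    and previous token as extra features for the CRF."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "has_digit": any(c.isdigit() for c in tok),
        "pos": pos_tags[i],           # the POS-tagger-derived feature
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True           # beginning of sentence
    return feats
```

Feature dictionaries in this shape are what CRF toolkits such as CRFsuite-style libraries consume, one dict per token per sentence.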
14

Rissanen, Mikko Juhani. "Virtual reality based teaching of psychomotor skills : annotation model for asynchronous communication in dynamic virtual environments." 京都大学 (Kyoto University), 2007. http://hdl.handle.net/2433/135987.

Full text
15

Dessaigne, Nicolas. "Le modèle DOAN (DOcument ANnotation Model) : modélisation de l'information complexe appliquée à la plateforme Arisem Kaliwatch Server." Phd thesis, Université de Nantes, 2005. http://tel.archives-ouvertes.fr/tel-00465962.

Full text
Abstract:
In this thesis we present the DOAN model (DOcument ANnotation Model), designed to meet the modelling needs of the company Arisem. Arisem is a software vendor in the knowledge-management domain. The platform the company offers covers the collect / analyse / disseminate information cycle. Starting from data of heterogeneous nature and diverse origins (e.g. Internet, intranet, databases), it performs various analyses (e.g. automatic classification, extraction of emerging concepts) in order to provide the user with synthetic, useful information. From this problem setting we identified three main requirements for the model: expressiveness, flexibility and performance. In this thesis we developed a model based on the facet-aggregation paradigm, which lets designers describe complex, heterogeneous and evolving data. Beyond the simple notion of a document, it makes it possible to represent business objects such as annotations or categorisation trees. Complemented by a rich type system and the ability to express constraints between facets, this model meets the expressiveness and flexibility requirements. We also propose an algorithm that translates the elements of the DOAN model into a relational implementation. Once the model is instantiated, write accesses are controlled through stored procedures in order to guarantee data consistency. Read accesses, by contrast, are performed directly with SQL queries. Designers can thus write queries that are both complex and efficient, taking maximal advantage of the capabilities of the database management system. This approach scales well and meets the performance requirements.
16

Kimbung, Stanley Mbandi. "A computational framework for transcriptome assembly and annotation in non-model organisms: the case of Venturia inaequalis." Thesis, University of the Western Cape, 2014. http://hdl.handle.net/11394/4022.

Full text
Abstract:
Philosophiae Doctor - PhD
In this dissertation three computational approaches are presented that enable optimization of reference-free transcriptome reconstruction. The first addresses the selection of bona fide reconstructed transcribed fragments (transfrags) from de novo transcriptome assemblies and annotation with a multiple domain co-occurrence framework. We showed that selected transfrags are functionally relevant and represented over 94% of the information derived from annotation by transference. The second approach relates to quality score based RNA-seq sub-sampling and the description of a novel sequence similarity-derived metric for quality assessment of de novo transcriptome assemblies. A detail systematic analysis of the side effects induced by quality score based trimming and or filtering on artefact removal and transcriptome quality is describe. Aggressive trimming produced incomplete reconstructed and missing transfrags. This approach was applied in generating an optimal transcriptome assembly for a South African isolate of V. inaequalis. The third approach deals with the computational partitioning of transfrags assembled from RNA-Seq of mixed host and pathogen reads. We used this strategy to correct a publicly available transcriptome assembly for V. inaequalis (Indian isolate). We binned 50% of the latter to Apple transfrags and identified putative immunity transcript models. Comparative transcriptomic analysis between fungi transfrags from the Indian and South African isolates reveal effectors or transcripts that may be expressed in planta upon morphogenic differentiation. These studies have successfully identified V. inaequalis specific transfrags that can facilitate gene discovery. The unique access to an in-house draft genome assembly allowed us to provide preliminary description of genes that are implicated in pathogenesis. Gene prediction with bona fide transfrags produced 11,692 protein-coding genes. 
We identified two hydrophobin-like genes and six accessory genes of the melanin biosynthetic pathway that are implicated in the invasive action of the appressorium. The CAZyome reveals an impressive repertoire of carbohydrate-degrading enzymes and carbohydrate-binding modules, amongst which are six polysaccharide lyases and the largest number of carbohydrate esterases (twenty-eight) known in any fungus sequenced to date.
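The quality-score-based trimming whose side effects this thesis analyzes can be illustrated with a minimal sketch. The Phred offset (33), the threshold, and the reads below are illustrative assumptions, not the thesis's actual pipeline:

```python
# Sketch of quality-score-based 3' end trimming of a sequencing read.
# Phred offset 33 and threshold 20 are common defaults, assumed here.

def phred_scores(quality_string, offset=33):
    """Convert a FASTQ quality string to numeric Phred scores."""
    return [ord(c) - offset for c in quality_string]

def trim_read(seq, qual, threshold=20, offset=33):
    """Remove bases from the 3' end whose quality is below the threshold.

    Aggressive thresholds shorten reads, which (as the thesis reports)
    can produce incompletely reconstructed or missing transfrags.
    """
    scores = phred_scores(qual, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]

# '#' encodes Phred 2 with offset 33, so the last two bases are trimmed
trimmed_seq, trimmed_qual = trim_read("ACGTACGT", "IIIIII##")
```

Real trimmers also scan the 5' end and use sliding windows; this sketch keeps only the core idea of dropping low-confidence bases.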
APA, Harvard, Vancouver, ISO, and other styles
17

Cleynen, Alice. "Approches statistiques en segmentation : application à la ré-annotation de génome." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00913851.

Full text
Abstract:
We propose to model data from transcriptome sequencing technologies (RNA-Seq) using the negative binomial distribution, and we construct segmentation models suited to their analysis at different biological scales, in a context where these technologies have become a valuable tool for genome annotation, gene expression analysis, and the detection of new transcripts. We develop a fast segmentation algorithm to analyze series at the chromosome scale, and we propose two methods for estimating the number of segments, which is directly related to the number of genes expressed in the cell, whether previously annotated or detected on this same occasion. The goal of precise gene annotation, and in particular of comparing transcription start and end sites between individuals, naturally leads us to the comparison of change-point locations in independent series. Within a Bayesian segmentation framework, we thus construct tools to answer these questions, for which we are able to provide uncertainty measures. We illustrate our models, all implemented in R packages, on RNA-Seq data from yeast experiments, and show for example that intron boundaries are conserved across conditions whereas transcription starts and ends are subject to differential splicing.
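The segmentation idea can be sketched as an optimal-partitioning dynamic program over a negative binomial likelihood. The fixed dispersion, the per-segment penalty, and the toy counts are assumptions for illustration; the thesis's algorithm is far faster and estimates the number of segments more carefully:

```python
import math

def nb_segment_cost(counts, r=1.0):
    """Negative log-likelihood of a segment under a negative binomial
    with fixed dispersion r and mean fitted to the segment (a sketch;
    the thesis treats dispersion estimation more rigorously)."""
    n = len(counts)
    m = sum(counts) / n
    p = r / (r + m) if m > 0 else 1.0 - 1e-9
    ll = 0.0
    for k in counts:
        ll += (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return -ll

def segment(counts, penalty=5.0, r=1.0):
    """Optimal partitioning: minimise total segment cost plus a fixed
    penalty per segment, then backtrack to recover change-points."""
    n = len(counts)
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            c = best[s] + nb_segment_cost(counts[s:t], r) + penalty
            if c < best[t]:
                best[t], back[t] = c, s
    cps, t = [], n
    while t > 0:
        cps.append(t)
        t = back[t]
    return sorted(cps)
```

On a toy series such as `[0, 0, 0, 0, 10, 11, 9, 10]`, the program recovers the change-point at position 4, mimicking the detection of a transcribed region against an unexpressed background.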
APA, Harvard, Vancouver, ISO, and other styles
18

Hacid, Kahina. "Handling domain knowledge in system design models. An ontology based approach." Phd thesis, Toulouse, INPT, 2018. http://oatao.univ-toulouse.fr/20157/7/HACID_kahina.pdf.

Full text
Abstract:
Complex systems models are designed in heterogeneous domains, and this heterogeneity is rarely considered explicitly when describing and validating processes. Moreover, these systems usually involve several domain experts and several design models corresponding to different analyses (views) of the same system. However, no explicit information is given regarding the characteristics of either the domain or the performed system analyses. In our thesis, we propose a general framework offering, first, the formalization of domain knowledge using ontologies and, second, the capability to strengthen design models by making explicit references to the domain knowledge formalized in these ontologies. This framework also provides resources for making explicit the features of an analysis by formalizing them within models qualified as ''points of view''. We have set up two deployments of our approach: a Model Driven Engineering (MDE) based deployment and a formal methods one based on proof and refinement. This general framework has been validated on several non-trivial case studies drawn from system engineering.
APA, Harvard, Vancouver, ISO, and other styles
19

Khalili, Ali [Verfasser], Klaus-Peter [Gutachter] Fähnrich, and Roberto [Gutachter] García. "A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration / Ali Khalili ; Gutachter: Klaus-Peter Fähnrich, Roberto García." Leipzig : Universitätsbibliothek Leipzig, 2015. http://d-nb.info/123942311X/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Andrade, Guidson Coelho de. "Semantic enrichment of American English corpora through automatic semantic annotation based on top-level ontologies using the CRF classification model." Universidade Federal de Viçosa, 2018. http://www.locus.ufv.br/handle/123456789/21639.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
O significado de bases de dados textuais é de fácil percepção para as pessoas, mas de difícil interpretação por parte dos computadores. Para que as máquinas possam compreender a semântica associada aos textos e não somente a sintaxe, é necessário a adição de informações extras a esses corpora. A anotação semântica é a tarefa que incorpora essas informações por meio da adição de metadados aos itens lexicais. Essas informações podem ser conceitos ontológicos que ajudam a definir a natureza da palavra a fim de atribuir-lhe algum significado. No entanto, anotar textos segundo uma determinada ontologia ainda é uma tarefa que demanda tempo e esforço de anotadores treinados para esse fim. Outra abordagem a ser considerada é o desenvolvimento de ferramentas de anotação semântica automática que utilizem técnicas de aprendizado de máquina para classificar os termos anotados. Essa abordagem demanda uma base de dados para treinamento dos algoritmos que nesse caso são corpora pré-anotados segundo a dimensão semântica a ser explorada. Entretanto, essa linhagem metodológica dispõe de recursos limitados para suprir as necessidades dos métodos de aprendizado. Existe uma grande carência de corpora anotados semanticamente e, particularmente, uma ausência ainda maior de corpora ontologicamente anotados, dificultando o avanço da área de anotação semântica automática. O objetivo do presente trabalho é auxiliar no enriquecimento semântico de textos do Inglês americano, anotando-os de forma automática baseando-se em ontologia de nível topo através do modelo de aprendizagem supervisionada Conditional Random Fields (CRF). Após a seleção do Open American National Corpus como base de dados linguística e da Schema.org como ontologia, o trabalho teve sua estrutura dividida em duas etapas. Primeiramente, o corpus pré-processado e corrigido foi submetido a uma anotação híbrida, com um anotador baseado em regras e, posteriormente, uma anotação complementar manual. 
Ambas as tarefas de anotação foram dirigidas pelos conceitos e definições das oito classes provenientes do nível topo da ontologia selecionada. De posse do corpus anotado ontologicamente, iniciou-se o processo de anotação automática via uso do método de aprendizagem CRF. O modelo de predição levou em consideração as características linguísticas e estruturais dos termos para classificá-los sob os oito tipos ontológicos. Os resultados obtidos durante a avaliação do modelo foram muito satisfatórios e atingiram o objetivo da pesquisa. O trabalho, embora seja uma nova abordagem de anotação semântica e com pouca margem de comparação, apresentou resultados promissores para o avanço da pesquisa na área de enriquecimento semântico automático baseado em ontologias de nível topo.
Textual databases carry with them human-perceived meanings, but those meanings are difficult to be interpreted by computers. In order for the machines to understand the semantics attached to texts, and not only their syntax, it is necessary to add extra information to these corpora. Semantic annotation is the task of incorporating this information by adding metadata to lexical items. This information can be ontological concepts that help define the nature of the word in order to give it some meaning. However, annotating texts according to an ontology is still a task that requires time and effort from annotators trained for this purpose. Another approach to be considered is the use of automatic semantic annotation tools that use machine learning techniques to classify annotated terms. This approach demands a database for training the algorithms that in this case are corpora pre-annotated according to the semantic dimension to be explored. However, this methodological lineage has limited resources to meet the needs of learning methods. There is a large lack of semantically annotated corpora and an even larger absence of ontologically annotated corpora, hindering the advance of the area of automatic semantic annotation. The purpose of the present work is to assist in the semantic enrichment of American English texts by automatically annotating them based on top-level ontology through the Conditional Random Fields (CRF) supervised learning model. After the selection of the Open American National Corpus as a linguistic database and Schema.org as an ontology, the work had its structure divided into two stages. First, the pre-processed and corrected corpus was submitted to a hybrid annotation, with a rule-based annotator, and later manually. 
Once the corpus was ontologically annotated, the automatic annotation process was started using the CRF learning method. The prediction model took into account the linguistic and structural features of the terms to classify them under the eight ontological types. The results obtained during the evaluation of the model were very satisfactory and reached the objective of the research. The work, although it is a new approach of semantic annotation and with little margin of comparison, presented promising results for the advance of the research in the area of automatic semantic enrichment based on top-level ontologies.
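The decoding step of a linear-chain CRF like the one trained here can be sketched as a Viterbi search over per-token label scores and transition scores. The labels, scores, and two-token example below are illustrative, not taken from the thesis's trained model:

```python
# A sketch of Viterbi decoding, the inference step of a linear-chain CRF.
# Scores are log-potentials; the label set and weights are invented.

def viterbi(obs_scores, trans_scores, labels):
    """obs_scores: one {label: score} dict per token.
    trans_scores: {(prev_label, label): score} transition weights.
    Returns the highest-scoring label sequence."""
    best = {y: obs_scores[0][y] for y in labels}
    backptr = []
    for scores in obs_scores[1:]:
        new_best, ptr = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[p] + trans_scores.get((p, y), 0.0))
            new_best[y] = best[prev] + trans_scores.get((prev, y), 0.0) + scores[y]
            ptr[y] = prev
        backptr.append(ptr)
        best = new_best
    # backtrack from the best final label
    path = [max(labels, key=lambda y: best[y])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return list(reversed(path))

labels = ["Person", "CreativeWork"]  # two of Schema.org's top-level types
obs = [{"Person": 2.0, "CreativeWork": 0.0},
       {"Person": 0.0, "CreativeWork": 1.0}]
trans = {("Person", "CreativeWork"): 0.5}
decoded = viterbi(obs, trans, labels)
```

In a real CRF the observation scores come from learned feature weights over the linguistic and structural features the abstract mentions; here they are fixed numbers so the decoding logic stands alone.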
APA, Harvard, Vancouver, ISO, and other styles
21

Tayari, Meftah Imen. "Modélisation, détection et annotation des états émotionnels à l'aide d'un espace vectoriel multidimensionnel." Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00838803.

Full text
Abstract:
Our work falls within the field of affective computing, and more precisely the modeling, detection, and annotation of emotions. The objective is to study, identify, and model emotions in order to support their exchange between multimodal applications. Our contribution therefore centers on three points. First, we present a new vision of the modeling of emotional states based on a generic model for representing and exchanging emotions between multimodal applications. It is a hierarchical representation model composed of three distinct layers: the psychological layer, the formal computation layer, and the language layer. This model allows the representation of an infinity of emotions and the modeling of basic emotions such as anger, sadness, and fear, as well as complex emotions such as simulated and masked emotions. The second point of our contribution concerns a monomodal approach to emotion recognition based on the analysis of physiological signals. The emotion recognition algorithm relies on signal processing techniques, on nearest-neighbour classification, and on our multidimensional model of emotion representation. Our third contribution concerns a multimodal approach to emotion recognition. This data processing approach yields information of better quality and higher reliability than that obtained from a single modality. The experimental results show a significant improvement in the recognition rates of the eight emotions compared with the results obtained with the monomodal approach. Finally, we integrated our work into an application for detecting depression in elderly people in a smart home. We used the physiological signals collected from various sensors installed in the home to estimate the affective state of the person concerned.
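The nearest-neighbour classification step the abstract mentions can be sketched in a few lines. The feature vectors and emotion labels below are invented placeholders, not the thesis's physiological features:

```python
import math

def nearest_neighbour(sample, references):
    """Classify a physiological feature vector by its nearest labelled
    neighbour in Euclidean distance (a sketch of the classification
    scheme combined with the multidimensional emotion space).
    references: list of (feature_vector, emotion_label) pairs."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(references, key=lambda rv: dist(sample, rv[0]))[1]

# hypothetical normalized features, e.g. (heart-rate level, skin conductance)
refs = [((0.9, 0.2), "anger"),
        ((0.1, 0.8), "sadness"),
        ((0.5, 0.9), "fear")]
label = nearest_neighbour((0.8, 0.3), refs)
```

A production system would extract the features from the raw sensor signals first; the sketch isolates only the classification rule.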
APA, Harvard, Vancouver, ISO, and other styles
22

Riviere, Peter. "Génération automatique d’obligations de preuves paramétrée par des théories de domaine dans Event-B : Le cadre de travail EB4EB." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP052.

Full text
Abstract:
De nos jours, nous sommes entourés de systèmes critiques complexes tels que les microprocesseurs, les trains, les appareils intelligents, les robots, les avions, etc. Ces systèmes sont extrêmement complexes et critiques en termes de sûreté, et doivent donc être vérifiés et validés. L'utilisation de méthodes formelles à états s'est avérée efficace pour concevoir des systèmes complexes. Event-B a joué un rôle clé dans le développement de tels systèmes. Event-B est une méthode formelle de conception de systèmes à états avec une approche correcte par construction, qui met l'accent sur la preuve et le raffinement. Event-B facilite la vérification de propriétés telles que la préservation des invariants, la convergence et le raffinement en générant des obligations de preuve et en permettant de les décharger.Certaines propriétés additionnelles du système, telles que l'absence d'inter-blocage, l'atteignabilité ou encore la vivacité, doivent être explicitement encodées et vérifiées par le concepteur, ou formalisées à l'aide d'une autre méthode formelle. Une telle approche pénalise la réutilisabilité des modèles et des techniques, et peut introduire des erreurs, en particulier dans les systèmes complexes.Pour pallier cela, nous avons introduit un "framework" réflexif EB4EB, formalisé au sein de Event-B. Dans ce cadre, chacun des concepts d'Event-B est formalisé comme un objet de première classe en utilisant la logique du premier ordre (FOL) et la théorie des ensembles. EB4EB permet la manipulation et l'analyse de modèles Event-B, et permet la définition d'extensions afin de réaliser des analyses supplémentaires non intrusives sur des modèles, telles que la validation de propriétés temporelles, l'analyse de la couverture d'un invariant, ou encore l'absence de blocage. 
Ce framework est réalisé grâce aux théories d'Event-B, qui étendent le langage d'Event-B avec des éléments définis dans des théories, et aussi en formalisant de nouvelles obligations de preuves, qui ne sont pas présentes initialement dans Event-B. De plus, la sémantique opérationnelle d'Event-B (basée sur les traces) a été formalisée, de même qu'un cadre qui sert à garantir la correction des théorèmes définis, y compris les opérateurs et les obligations de preuve. Enfin, le cadre proposé et ses extensions ont été validés dans de multiples études de cas, notamment l'horloge de Lamport, le problème du lecteur/rédacteur, l'algorithme de Peterson, les distributeurs automatiques de billets (DAB), les véhicules autonomes, etc.
Nowadays, we are surrounded by complex critical systems such as microprocessors, railways, home appliances, robots, aeroplanes, and so on. These systems are extremely complex and are safety-critical, and they must be verified and validated. The use of state-based formal methods has proven to be effective in designing complex systems. Event-B has played a key role in the development of such systems. Event-B is a formal system design method that is state-based and correct-by-construction, with a focus on proof and refinement. Event-B facilitates verification of properties such as invariant preservation, convergence, and refinement by generating and discharging proof obligations.Additional properties for system verification, such as deadlock-freeness, reachability, and liveness, must be explicitly defined and verified by the designer or formalised using another formal method. Such an approach reduces re-usability and may introduce errors, particularly in complex systems.To tackle these challenges, we introduced the reflexive EB4EB framework in Event-B. In this framework, each Event-B concept is formalised as a first-class object using First Order Logic (FOL) and set theory. This framework allows for the manipulation and analysis of Event-B models, with extensions for additional, non-intrusive analyses such as temporal properties, weak invariants, deadlock freeness, and so on. This is accomplished through Event-B Theories, which extend the Event-B language with the theory's defined elements, and also by formalising and articulating new proof obligations that are not present in traditional Event-B. Furthermore, Event-B's operational semantics (based on traces) have been formalised, along with a framework for guaranteeing the soundness of the defined theorems, including operators and proof obligations. 
Finally, the proposed framework and its extensions have been validated across multiple case studies, including Lamport's clock case study, read/write processes, the Peterson algorithm, Automated Teller Machine (ATM), autonomous vehicles, and so on.
APA, Harvard, Vancouver, ISO, and other styles
23

Haertel, Robbie A. "Practical Cost-Conscious Active Learning for Data Annotation in Annotator-Initiated Environments." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/4242.

Full text
Abstract:
Many projects exist whose purpose is to augment raw data with annotations that increase the usefulness of the data. The number of these projects is rapidly growing, and in the age of “big data” the amount of data to be annotated is likewise growing within each project. One common use of such data is in supervised machine learning, which requires labeled data to train a predictive model. Annotation is often a very expensive proposition, particularly for structured data. The purpose of this dissertation is to explore methods of reducing the cost of creating such data sets, including annotated text corpora. We focus on active learning to address the annotation problem. Active learning employs models trained using machine learning to identify instances in the data that are most informative and least costly. We introduce novel techniques for adapting vanilla active learning to situations wherein data instances are of varying benefit and cost, annotators request work “on-demand,” and there are multiple, fallible annotators of differing levels of accuracy and cost. In order to account for data instances of varying cost, we build a model of cost from real annotation data based on a user study. We also introduce a novel cost-conscious active learning algorithm, which we call return-on-investment, that selects instances for annotation that contain the most benefit per unit cost. To address the issue of annotators that request instances “on-demand,” we develop a parallel, “no-wait” framework that performs computation while the annotator is annotating. As a result, annotators need not wait for the computer to determine the best instance for them to annotate—a common problem with existing approaches. Finally, we introduce a Bayesian model designed to simultaneously infer ground truth annotations from noisy annotations, infer each individual annotator's accuracy, and predict its own accuracy on unseen data, without the use of a held-out set. 
We extend ROI-based active learning and our annotation framework to handle multiple annotators using this model. As a whole, our work shows that the techniques introduced in this dissertation reduce the cost of annotation in scenarios that are more true-to-life than previous research.
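The return-on-investment selection rule described above can be sketched directly: pick the candidate with the highest expected benefit per unit annotation cost. The benefit and cost figures below are invented; in the dissertation both are estimated by learned models:

```python
# A sketch of ROI-based active learning selection.
# benefit(x) and cost(x) stand in for the dissertation's learned estimators.

def roi_select(candidates, benefit, cost):
    """Return the candidate maximising benefit(x) / cost(x)."""
    return max(candidates, key=lambda x: benefit(x) / cost(x))

# hypothetical sentences with (estimated benefit, estimated cost in seconds)
items = {"sent-a": (3.0, 1.0), "sent-b": (10.0, 5.0), "sent-c": (4.0, 4.0)}
pick = roi_select(items, lambda x: items[x][0], lambda x: items[x][1])
# sent-b has the largest raw benefit, but sent-a has the best benefit per cost
```

The point of the rule is visible in the toy data: a cheaper, moderately informative instance can beat a more informative but expensive one.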
APA, Harvard, Vancouver, ISO, and other styles
24

Zhu, Bingyao [Verfasser], Jörg [Akademischer Betreuer] Stülke, Jörg [Gutachter] Stülke, Fabian Moritz [Gutachter] Commichau, and Burkhard [Gutachter] Morgenstern. "SubtiWiki 3.0: A relational database for the functional genome annotation of the model organism Bacillus subtilis / Bingyao Zhu ; Gutachter: Jörg Stülke, Fabian Moritz Commichau, Burkhard Morgenstern ; Betreuer: Jörg Stülke." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2018. http://d-nb.info/1152437682/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Wodke, Judith. "Organization and integration of large-scale datasets for designing a metabolic model and re-annotating the genome of mycoplasma pneumoniae." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät I, 2013. http://dx.doi.org/10.18452/16699.

Full text
Abstract:
Mycoplasma pneumoniae, einer der kleinsten lebenden Organismen, ist ein erfolgversprechender Modellorganismus der Systembiologie um eine komplette lebende Zelle zu verstehen. Wichtig dahingehend ist die Konstruktion mathematischer Modelle, die zelluläre Prozesse beschreiben, indem sie beteiligte Komponenten vernetzen und zugrundeliegende Mechanismen entschlüsseln. Für Mycoplasma pneumoniae wurden genomweite Datensätze für Genomics, Transcriptomics, Proteomics und Metabolomics produziert. Allerdings fehlten ein effizientes Informationsaustauschsystem und mathematische Modelle zur Datenintegration. Zudem waren verschiedene Beobachtungen im metabolischen Verhalten ungeklärt. Diese Dissertation präsentiert einen kombinatorischen Ansatz zur Entwicklung eines metabolischen Modells für Mycoplasma pneumoniae. Zuerst haben wir eine Datenbank, MyMpn, entwickelt, um Zugang zu strukturierten, organisierten Daten zu schaffen. Danach haben wir ein genomweites, Constraint-basiertes metabolisches Modell mit Vorhersagekapazitäten konstruiert und parallel dazu das Metabolome experimentell charakterisiert. Wir haben die Biomasse einer Mycoplasma pneumoniae Zelle definiert, das Netzwerk korrigiert, gezeigt, dass ein Grossteil der produzierten Energie auf zelluläre Homeostase verwendet wird, und das Verhalten unter verschiedenen Wachstumsbedingungen analysiert. Schließlich haben wir manuell das Genom reannotiert. Die Datenbank, obwohl noch nicht öffentlich zugänglich, wird bereits intern für die Analyse experimenteller Daten und die Modellierung genutzt. Die Entdeckung von Kontrollprinzipien des Energiemetabolismus und der Anpassungsfähigkeiten bei Genausfall heben den Einfluss der reduktiven Genomevolution hervor und erleichtert die Entwicklung von Manipulationstechniken und dynamischen Modellen. Überdies haben wir gezeigt, dass die Genomorganisation in Mycoplasma pneumoniae komplexer ist als bisher für möglich gehalten, und 32 neue, noch nicht annotierte Gene entdeckt.
Mycoplasma pneumoniae, one of the smallest known self-replicating organisms, is a promising model organism in systems biology when aiming to assess understanding of an entire living cell. One key step towards this goal is the design of mathematical models that describe cellular processes by connecting the involved components to unravel underlying mechanisms. For Mycoplasma pneumoniae, a wealth of genome-wide datasets on genomics, transcriptomics, proteomics, and metabolism had been produced. However, a proper system facilitating information exchange and mathematical models to integrate the different datasets were lacking. Also, different in vivo observations of metabolic behavior remained unexplained. This thesis presents a combinatorial approach to design a metabolic model for Mycoplasma pneumoniae. First, we developed a database, MyMpn, in order to provide access to structured and organized data. Second, we built a predictive, genome-scale, constraint-based metabolic model and, in parallel, we explored the metabolome in vivo. We defined the biomass composition of a Mycoplasma pneumoniae cell, corrected the wiring diagram, showed that a large proportion of energy is dedicated to cellular homeostasis, and analyzed the metabolic behavior under different growth conditions. Finally, we manually re-annotated the genome of Mycoplasma pneumoniae. The database, despite not yet being released to the public, is internally already used for data analysis and for mathematical modeling. Unraveling the principles governing energy metabolism and adaptive capabilities upon gene deletion highlights the impact of the reductive genome evolution and facilitates the development of engineering tools and dynamic models for metabolic sub-systems. Furthermore, we revealed that the degree of complexity in which the genome of Mycoplasma pneumoniae is organized far exceeds what has been considered possible so far, and we identified 32 new, previously unannotated genes.
APA, Harvard, Vancouver, ISO, and other styles
26

Hak, Jean-Luc. "Engineering annotations for supporting the design process of interactive systems : a model based approach and a tool suite." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30062.

Full text
Abstract:
Au cours d'un processus de développement de système interactifs, différents acteurs collaborent lors des activités de ce processus et plusieurs choix de conceptions sont effectués afin de converger vers une solution répondant à la fois aux besoins des utilisateurs et aux exigences. Pour atteindre cette solution, de nombreux artefacts sont produits, utilisés et révisés par les différents intervenants du processus. Afin de communiquer sur des points particuliers d'un artefact, collaborer dans son élaboration ou tout simplement rajouter des informations complémentaires, des annotations peuvent être créés sur ces artefacts. En fonction des annotations et de leurs contenus, certains artefacts peuvent par la suite être amenés à évoluer, reflétant ainsi l'influence des annotations sur ces artefacts et donc leurs influences sur le projet de manière globale. Il est donc possible de considérer les annotations comme un outil versatile jouant un rôle non négligeable dans le processus de conception. Néanmoins, plusieurs problèmes peuvent être identifiés concernant l'intégration des annotations au sein des activités d'un processus de conception de système interactifs. Premièrement, le rôle des annotations n'est pas clairement défini dans les différents processus de conceptions. En effet, bien qu'on observe l'usage omniprésent des annotations lors de la conception de systèmes interactif, les processus de conception actuels n'expliquent pas comment les relier aux tâches à accomplir et les artefacts à produire. Deuxièmement, une annotation peut concerner plusieurs artefacts car chacun modélise des points de vue complémentaires du système interactif. Néanmoins, la multiplicité des types d'artefacts et des outils pour la création de ces artefacts pose un problème car chaque outil qui offre la possibilité de créer des annotations propose son propre modèle d'annotation. Ce modèle est généralement restreint à un type d'artefact donné : celui manipulé par l'outil. 
Ceci implique que les annotations d'un projet sont éparpillées par lot et que chaque lot d'annotations est fermé à un seul type d'artefact. Cette thèse s'appuie sur une analyse des annotations et des pratiques liées aux annotations ainsi que sur la recommandation "Web Annotation Data Model" du W3C pour proposer un modèle d'annotation et une architecture logicielle permettant de centraliser les annotations d'un projet et d'intégrer ces annotations dans divers types d'outils et d'artefacts. Ce modèle d'annotation et cette architecture logicielle a été appliquée dans trois études de cas différents afin d'explorer différentes intégrations possibles au sein d'un processus de conception. La première étude de cas démontre l'intégration et la personnalisation d'annotations au sein d'un outil de prototypage. La seconde étude de cas s'attarde sur la présentation d'un outil permettant de consulter dans une vue unique l'ensemble des annotations créés sur différents artefacts et sur les différents modèles d'un projet. La troisième étude de cas illustre une intégration des annotations dans un environnement industriel comprenant des outils et un processus de conception existant. Ainsi, ces contributions autour des annotations servent de base pour la réalisation de travaux complémentaires tels que l'utilisation d'annotations pour structurer et connecter les différents modèles d'un système interactif, l'utilisation d'annotations en tant que ressource pour les processus de prises de décisions, et l'utilisation d'annotations pour étudier la traçabilité de l'évolution d'un système interactif. En effet, en reliant les artefacts entre eux en utilisant les annotations et en justifiant les choix de conceptions avec des annotations, il serait possible d'assurer la traçabilité des différents choix de design effectués au cours d'un projet ainsi que la traçabilité de l'impact de ces différents choix sur les artefacts
During the development process of an interactive system, different actors collaborate in the activities of this process, and several design choices are made to converge to a solution that meets both user needs and requirements. To achieve this solution, many artifacts are produced, used and reviewed by the various stakeholders of the process. In order to communicate on particular points of an artifact, to collaborate in its elaboration or simply to add additional information, annotations can be created on these artifacts. Depending on the annotations and their contents, some artifacts may subsequently evolve, thus reflecting the influence of annotations on these artifacts and therefore on the project as a whole. Thus, it is possible to consider annotations as a versatile tool playing a significant role in the design process. Nevertheless, several issues can be identified regarding the integration of annotations within the activities of the design process of interactive systems. First, the role of annotations is not clearly defined in the different design processes. While annotations are used ubiquitously in the design of interactive systems, current design processes do not address how to relate them to the tasks to be performed and the artifacts to be produced. Secondly, an annotation can be related to several artifacts, as each model gives a complementary representation of the interactive system. However, the multiplicity of artifact types and of tools for creating these artifacts is a problem, since each tool that provides annotation features implements its own annotation model. These models are usually restricted to one type of artifact: the one handled by the tool. This implies that the annotations produced within a project are scattered across sets and that each of these annotation sets is tied to a single type of artifact. 
This PhD thesis builds on an analysis of annotations and their uses, as well as on the W3C "Web Annotation Data Model" recommendation, to propose an annotation model and an architecture that centralize the annotations of a project. This architecture also allows annotation support to be added to various tools and types of artifacts. This contribution has been applied to three different case studies to explore the possible integrations of annotations within a design process. The first case study demonstrates the integration and customization of annotations within a prototyping tool. The second case study presents a tool for consulting, in a single view, all the annotations created on the different artifacts and models of a project. The third case study illustrates an integration of annotations into an industrial environment that includes existing tools and an existing design process. These contributions around annotations thus serve as a basis for complementary work, such as the use of annotations to structure and connect the different models of an interactive system, the use of annotations as a resource for decision-making processes, and the use of annotations to study the traceability of the evolution of an interactive system. Indeed, by linking artifacts to each other using annotations and justifying design choices with annotations, it would be possible to ensure the traceability of the different design choices made during a project, as well as the traceability of the impact of these choices on the artifacts.
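The W3C Web Annotation Data Model the thesis builds on prescribes a JSON-LD structure with a body (the annotation content) and a target (the annotated artifact). A minimal sketch follows; the IRIs, the comment text, and the fragment value are illustrative placeholders, not examples from the thesis:

```python
import json

# A minimal annotation following the W3C Web Annotation Data Model:
# an Annotation node with a TextualBody and a target identified by a
# source IRI plus a FragmentSelector. All identifiers here are invented.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/1",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "This transition needs a confirmation dialog.",
        "purpose": "commenting",
    },
    "target": {
        "source": "http://example.org/artifacts/prototype-v2",
        "selector": {"type": "FragmentSelector", "value": "screen3"},
    },
}
print(json.dumps(annotation, indent=2))
```

Because the target is just an IRI plus a selector, the same annotation structure can point at any artifact type, which is what makes the model suitable for the centralized, tool-independent architecture the thesis proposes.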
APA, Harvard, Vancouver, ISO, and other styles
27

Daoust, François. "Modélisation informatique de structures dynamiques de segments textuels pour l'analyse de corpus." Phd thesis, Université de Franche-Comté, 2011. http://tel.archives-ouvertes.fr/tel-00870410.

Full text
Abstract:
The objective of this thesis is to propose a computational model for representing, building, and exploiting textual structures. The proposed model relies on a representation of the text as a lexicon/occurrences plane augmented with systems of lexical and contextual annotations; an implementation of this model was realized in the SATO software, whose features and internal organization are presented. A number of works are presented that illustrate the development and use of the software in various contexts. The formal handling of textual and discursive structures finds an ally in the XML markup language and in the proposals of the Text Encoding Initiative (TEI). Formally, the structures built on textual segments correspond to graphs. In the context of an evolving textual analysis, these graphs are multiple and partially deployed. The resolution of these graphs, in the sense of attaching nodes to textual segments or to nodes of other graphs, is a dynamic process that can be supported by various computational mechanisms. Examples drawn from text linguistics illustrate the principles of structural annotation. Prospective considerations on a computer implementation of a structural annotation management system are also presented.
APA, Harvard, Vancouver, ISO, and other styles
28

Di, Francescomarino Chiara. "Semantic annotation of business process models." Doctoral thesis, Università degli studi di Trento, 2011. https://hdl.handle.net/11572/367849.

Full text
Abstract:
In the last decades, business process models have increasingly been used by companies with different purposes, such as documenting enacted processes or enabling and improving the communication among stakeholders (e.g., designers and implementers). Aside from the differences, all the roles played by process models involve human actors (e.g., business designers, business analysts, re-engineers) and hence demand for readability and ease of use, beyond correctness and reasonable completeness. It often happens, however, that process models are large and intricate, thus resulting potentially difficult to understand and to manage. In this thesis we propose some techniques aimed at supporting business designers and analysts in the management of business process models. The core of the proposal is the enrichment of process models with semantic annotations from domain ontologies and the formalization of both structural and domain information in a shared knowledge base, thus opening to the possibility of exploiting reasoning for supporting business experts in their work. In detail, this thesis investigates some of the services that can be provided on top of the process semantic annotation, as for example, the automatic verification of process constraints, the automated querying of process models or the semi-automatic mining, documentation and modularization of crosscutting concerns. Moreover, special care is devoted to support designers and analysts when process models are not available or they have to be semantically annotated. Specifically, an approach for recovering process models from (Web) applications and some metrics for evaluating the understandability of the recovered models are investigated. Techniques for suggesting candidate semantic annotations are also proposed. The results obtained by applying the presented techniques have been validated by means of case studies, performance evaluations and empirical investigations.
APA, Harvard, Vancouver, ISO, and other styles
29

Di, Francescomarino Chiara. "Semantic annotation of business process models." Doctoral thesis, University of Trento, 2011. http://eprints-phd.biblio.unitn.it/547/1/DiFrancescomarino_Chiara.pdf.

Full text
Abstract:
In the last decades, business process models have increasingly been used by companies with different purposes, such as documenting enacted processes or enabling and improving the communication among stakeholders (e.g., designers and implementers). Aside from the differences, all the roles played by process models involve human actors (e.g., business designers, business analysts, re-engineers) and hence demand for readability and ease of use, beyond correctness and reasonable completeness. It often happens, however, that process models are large and intricate, thus resulting potentially difficult to understand and to manage. In this thesis we propose some techniques aimed at supporting business designers and analysts in the management of business process models. The core of the proposal is the enrichment of process models with semantic annotations from domain ontologies and the formalization of both structural and domain information in a shared knowledge base, thus opening to the possibility of exploiting reasoning for supporting business experts in their work. In detail, this thesis investigates some of the services that can be provided on top of the process semantic annotation, as for example, the automatic verification of process constraints, the automated querying of process models or the semi-automatic mining, documentation and modularization of crosscutting concerns. Moreover, special care is devoted to support designers and analysts when process models are not available or they have to be semantically annotated. Specifically, an approach for recovering process models from (Web) applications and some metrics for evaluating the understandability of the recovered models are investigated. Techniques for suggesting candidate semantic annotations are also proposed. The results obtained by applying the presented techniques have been validated by means of case studies, performance evaluations and empirical investigations.
APA, Harvard, Vancouver, ISO, and other styles
30

Wagner, Darlene Darlington. "Comparative genomics reveal ecophysiological adaptations of organohalide-respiring bacteria." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45916.

Full text
Abstract:
Organohalide-respiring Bacteria (OHRB) play key roles in the reductive dehalogenation of natural organohalides and anthropogenic chlorinated contaminants. Reductive dehalogenases (RDases) catalyze the cleavage of carbon-halogen bonds, enabling respiratory energy conservation and growth. Large numbers of RDase genes, a majority lacking experimental characterization of function, are found on the genomes of OHRB. In silico genomics tools were employed to identify shared sequence features among RDase genes and proteins, predict RDase functionality, and elucidate RDase evolutionary history. These analyses showed that the RDase superfamily could be divided into proteins exported to the membrane and cytoplasmic proteins, indicating that not all RDases function in respiration. Further, Hidden Markov models (HMMs) and multiple sequence alignments (MSAs) based upon biochemically characterized RDases identified previously uncharacterized members of an RDase superfamily, delineated protein domains and amino acid motifs serving to distinguish RDases from unrelated iron-sulfur proteins. Such conserved and discriminatory features among RDases may facilitate monitoring of organohalide-degrading microbial communities or improve accuracy of genome annotation. Phylogenetic analyses of RDase superfamily sequences provided evidence of convergent evolution and horizontal gene transfer (HGT) across distinct OHRB genera. Yet, the low frequency of RDase transfer outside the genus level and the absence of RDase transfer between phyla indicate that RDases evolve primarily by vertical evolution or HGT is restricted among related OHRB strains. Polyphyletic evolutionary lineages within the RDase superfamily comprise distantly-related RDases, some exhibiting activities towards the same substrates, suggesting a longstanding history of OHRB adaptation to natural organohalides. 
Similar functional and phylogenetic analyses provided evidence that nitrous oxide (N₂O, a potent greenhouse gas) reductase (nosZ) genes from versatile OHRB members of the Anaeromyxobacter and Desulfomonile genera comprised a nosZ sub-family evolutionarily distinct from nosZ found in non-OHRB denitrifiers. Hence, elucidation of RDase and NosZ sequence diversity may enhance the mitigation of anthropogenic organohalides and greenhouse gases (i.e., N₂O), respectively. The tetrachloroethene-respiring bacterium Geobacter lovleyi strain SZ exhibited genomic features distinguishing it from non-organohalide-respiring members of the Geobacter genus, including a conjugative pilus transfer gene cluster, a chromosomal genomic island harboring two RDase genes, and a diminished set of c-type cytochrome genes. The G. lovleyi strain SZ genome also harbored a 77 kbp plasmid carrying 15 out of the 24 genes involved in biosynthesis of corrinoid, likely related to this strain's ability to degrade PCE to cis-DCE in the absence of supplied corrinoid (i.e., vitamin B₁₂). Although corrinoids are essential cofactors to RDases, the strictly organohalide-respiring Dehalococcoides mccartyi strains are corrinoid auxotrophs and depend upon uptake of extracellular corrinoids via Archaeal and Bacterial salvage pathways. A key corrinoid salvage gene in D. mccartyi, cbiZ, occurs at duplicated loci adjacent to RDase genes and appears to have been horizontally acquired from Archaea. These comparative genome analyses highlight RDase dependencies upon corrinoids and also suggest mobile genomic elements (e.g., plasmids) are associated with organohalide respiration and corrinoid acquisition among OHRB. In summary, analyses of OHRB genomes promise to enable more complete modeling of metabolic and evolutionary processes associated with the turnover of organohalides in anoxic environments.
These efforts also expand knowledge of biomarkers for monitoring OHRB activity in anoxic environments, and will improve our understanding of the fate of chlorinated contaminants.
APA, Harvard, Vancouver, ISO, and other styles
31

Petkova, Desislava I. "Cluster-based relevance models for automatic image annotation /." Connect to online version, 2005. http://ada.mtholyoke.edu/setr/websrc/pdfs/www/2005/124.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Bannour, Hichem. "Building and Using Knowledge Models for Semantic Image Annotation." Phd thesis, Ecole Centrale Paris, 2013. http://tel.archives-ouvertes.fr/tel-00905953.

Full text
Abstract:
This dissertation proposes a new methodology for building and using structured knowledge models for automatic image annotation. Specifically, our first proposals deal with the automatic building of explicit and structured knowledge models, such as semantic hierarchies and multimedia ontologies, dedicated to image annotation. Thereby, we propose a new approach for building semantic hierarchies faithful to image semantics. Our approach is based on a new image-semantic similarity measure between concepts and on a set of rules that iteratively connect the most closely related concepts until the final hierarchy is built. Afterwards, we propose to go further in the modeling of image semantics through the building of explicit knowledge models that incorporate richer semantic relationships between image concepts. We therefore propose a new approach for automatically building multimedia ontologies consisting of subsumption relationships between concepts, as well as other semantic relationships such as contextual and spatial relations. Fuzzy description logics are used as a formalism to represent our ontology and to deal with the uncertainty and imprecision of concept relationships. In order to assess the effectiveness of the built structured knowledge models, we subsequently use them in a framework for image annotation. We propose an approach, based on the structure of semantic hierarchies, to effectively perform hierarchical image classification. Furthermore, we propose a generic approach for image annotation combining machine learning techniques, such as hierarchical image classification, with fuzzy ontological reasoning in order to achieve a semantically relevant image annotation. Empirical evaluations of our approaches have shown significant improvement in image annotation accuracy.
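The hierarchy-building idea described, repeatedly connecting the most related concepts until a single root remains, can be sketched as greedy agglomerative merging over a similarity function. The concepts, scores, and average-linkage rule below are invented for illustration; the thesis's actual similarity measure and connection rules differ:

```python
def build_hierarchy(concepts, sim):
    """Greedy agglomerative sketch: repeatedly merge the two most
    related clusters into a parent node until one root remains.
    `sim(a, b)` scores the relatedness of two concept names;
    clusters are compared by average linkage."""
    clusters = [(c, [c]) for c in concepts]  # (subtree, leaf names)
    while len(clusters) > 1:
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: sum(sim(a, b)
                               for a in clusters[ij[0]][1]
                               for b in clusters[ij[1]][1])
                           / (len(clusters[ij[0]][1]) * len(clusters[ij[1]][1])),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Toy image-semantic similarities (invented): related concepts score high.
scores = {frozenset({"cat", "dog"}): 0.9, frozenset({"car", "bus"}): 0.8}
sim = lambda a, b: scores.get(frozenset({a, b}), 0.1)
print(build_hierarchy(["cat", "dog", "car", "bus"], sim))
# → (('cat', 'dog'), ('car', 'bus'))
```

The nested tuples read as a hierarchy: "cat" and "dog" join first under one parent, "car" and "bus" under another, and the two parents join at the root.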
APA, Harvard, Vancouver, ISO, and other styles
33

Levy, Mark. "Retrieval and annotation of music using latent semantic models." Thesis, Queen Mary, University of London, 2012. http://qmro.qmul.ac.uk/xmlui/handle/123456789/2969.

Full text
Abstract:
This thesis investigates the use of latent semantic models for annotation and retrieval from collections of musical audio tracks. In particular latent semantic analysis (LSA) and aspect models (or probabilistic latent semantic analysis, pLSA) are used to index words in descriptions of music drawn from hundreds of thousands of social tags. A new discrete audio feature representation is introduced to encode musical characteristics of automatically-identified regions of interest within each track, using a vocabulary of audio muswords. Finally a joint aspect model is developed that can learn from both tagged and untagged tracks by indexing both conventional words and muswords. This model is used as the basis of a music search system that supports query by example and by keyword, and of a simple probabilistic machine annotation system. The models are evaluated by their performance in a variety of realistic retrieval and annotation tasks, motivated by applications including playlist generation, internet radio streaming, music recommendation and catalogue search.
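The latent-indexing step described above can be illustrated with a plain truncated SVD over a toy tag-by-track co-occurrence matrix (the matrix, tags, and dimensionality below are invented for illustration; the thesis additionally uses aspect models and audio muswords):

```python
import numpy as np

# Toy tag-by-track matrix: rows = social tags, columns = tracks,
# entries = how often a tag was applied to a track.
X = np.array([
    [3, 2, 0, 0],   # "rock"
    [2, 3, 0, 1],   # "guitar"
    [0, 0, 4, 2],   # "jazz"
    [0, 1, 2, 3],   # "saxophone"
], dtype=float)

# Truncated SVD keeps the k strongest latent "semantic" dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
track_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional vector per track

# Query by example: rank all tracks by cosine similarity to track 0.
q = track_vecs[0]
sims = track_vecs @ q / (np.linalg.norm(track_vecs, axis=1) * np.linalg.norm(q))
print(np.argsort(-sims))   # track indices, most similar first
```

Because tags and tracks share the latent space, the same machinery supports query by keyword (embed the tag, rank tracks) and machine annotation (embed the track, rank tags).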
APA, Harvard, Vancouver, ISO, and other styles
34

Mahadevan, Gayatri P. "Automatic back annotation of timing into VHDL behavioral models." Thesis, This resource online, 1995. http://scholar.lib.vt.edu/theses/available/etd-06102009-063425/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Amir, Mohammad. "Semantically-enriched and semi-Autonomous collaboration framework for the Web of Things. Design, implementation and evaluation of a multi-party collaboration framework with semantic annotation and representation of sensors in the Web of Things and a case study on disaster management." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14363.

Full text
Abstract:
This thesis proposes a collaboration framework for the Web of Things based on the concepts of Service-oriented Architecture and integrated with semantic web technologies to offer new possibilities in terms of efficient asset management during operations requiring multi-actor collaboration. The motivation for the project comes from the rise in disasters where effective cross-organisation collaboration can increase the efficiency of critical information dissemination. The organisational boundaries of participants, as well as their IT capabilities and trust issues, hinder the deployment of a multi-party collaboration framework, thereby preventing timely dissemination of critical data. In order to tackle some of these issues, this thesis proposes a new collaboration framework consisting of a resource-based data model, a resource-oriented access control mechanism, and semantic technologies utilising the Semantic Sensor Network Ontology, which can be used simultaneously by multiple actors without impacting each other's networks and thus increase the efficiency of disaster management and relief operations. The generic design of the framework enables future extensions, thus enabling its exploitation across many application domains. The performance of the framework is evaluated in two areas: the capability of the access control mechanism to scale with an increasing number of devices, and the capability of the semantic annotation process to increase in efficiency as more information is provided. The results demonstrate that the proposed framework is fit for purpose.
APA, Harvard, Vancouver, ISO, and other styles
36

MANGONE, IOLANDA. "Structure-based functional annotation methods: development and assessment on homology models." Doctoral thesis, Università degli Studi di Roma "Tor Vergata", 2012. http://hdl.handle.net/2108/202203.

Full text
Abstract:
One of the main goals in bioinformatics is the development of tools for protein functional annotation. To infer the function of a protein, two fundamentally different approaches can be used: one based on the comparison of protein sequences (Altschul et al., 1997; Bairoch, 1991; Pearson, 1990) and another based on the comparison of protein structures (Ausiello et al., 2007; Pearl et al., 2001; Thornton et al., 2000; Whisstock and Lesk, 2003). The three-dimensional structure is more informative than the amino acid sequence alone for assigning a molecular function to a new protein (Watson et al., 2007). For this reason, many automated methods have been developed to infer the function of a protein of known structure using comparative approaches; for a review see (Gherardini and Helmer-Citterich, 2008). Because of the importance of automatic structure-based protein annotation methods for biologists and researchers, these methods have to become user-friendly and easily usable even in laboratories without a bioinformatician. The evolutionary information of protein families (Punta et al., 2012) can be used to improve the performance of structure-based methods. It is well known that the more a residue is conserved during the evolution of homologous proteins, the more important that residue is for preserving the protein's function. The amino acids in enzymatic catalytic sites, for example, remain unchanged during evolution even when the proteins share very low sequence identity. Large-scale sequencing projects (Abecasis et al., 2010; Sawicki et al., 1993) have made protein sequence databases grow much faster than the set of experimentally solved protein structures: there are more than 20 million entries in the protein sequence databases (UniProtConsortium, 2013) but only 79 thousand entries in the database of protein structures (Rose et al., 2013).
In the absence of an experimentally determined structure, different bioinformatics tools can be used to predict protein structure. If one or more proteins of known structure share more than 30% sequence identity with our protein, the homology modeling technique can be used to transfer the structural information from a template to the target protein (Bork et al., 1994). Obviously, because structural information is transferred, the quality of a model depends on its sequence identity with the template (Chothia and Lesk, 1986). The aims of our work are: to explore the possibility of using sequence conservation as a parameter to improve the performance of structure-based functional prediction methods; to develop webservers providing full and easy access to bioinformatics structure-based functional prediction methods; and to analyse whether, and within which limits, structure-based methods can be applied to protein models, and how the prediction performance of different structure-based functional annotation methods decreases as model quality decreases. We developed a new method to determine how evolutionarily conserved the residues of a protein structure are. The method, called PFAMer, derives conservation scores from the PFAM multiple alignment of protein sequences. This procedure has been successfully applied to PDBinder and Pfinder, two structure-based methods (developed in our laboratory) that identify binding sites on protein structures: Pfinder identifies phosphate-binding sites, and PDBinder identifies binding pockets independently of the specific ligand they are able to bind. The introduction of this parameter into Pfinder reduces the number of false positive (FP) predictions, improving performance by 3%; applying PFAMer to PDBinder improves performance by 5%.
In order to make the structure-based methods developed in our laboratory more accessible to the scientific community, we developed two webservers, Phosfinder and webPDBinder, based on the improved versions of Pfinder and PDBinder. In the last part of this work we analyzed the degradation in performance of structure-based annotation methods when they work on models instead of X-ray-solved structures. To achieve this goal, we developed an automated procedure to compare the performance of different structure-based functional prediction methods when used on a set of homology models of different quality or on an experimentally solved structure. Each method is tested on the same dataset of proteins proposed by the authors in the method's original publication and on a set of homology models built for each structure in the dataset. To obtain models of different quality, only templates are used whose sequence similarity with the solved structures falls under a set of fixed thresholds. We selected different methods for each category of functional annotation. The performance of the tested methods has been measured using the F-score or the Matthews correlation coefficient (MCC) where applicable. The applicability of functional prediction methods to protein models has never been explored so far, even though most of the structural information now available is stored in 3D models. Sensitivity to model quality should become an evaluation parameter when comparing structure-based methods for functional annotation, giving precious information about their applicability to real-world cases. The analysis of the features of the different methods can give hints about the reasons that determine their sensitivity to model quality.
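The idea of scoring residue conservation from a multiple alignment, as PFAMer does from PFAM alignments, can be sketched as a per-column score. The scoring rule below (fraction of non-gap sequences sharing the modal residue) is a deliberate simplification; the abstract does not specify PFAMer's actual formula, and the toy alignment is invented:

```python
from collections import Counter

def column_conservation(alignment):
    """Per-column conservation of a multiple sequence alignment:
    the fraction of (non-gap) sequences sharing the most common
    residue in that column. 1.0 = fully conserved."""
    scores = []
    for col in zip(*alignment):
        residues = [r for r in col if r != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no signal
            continue
        modal_count = Counter(residues).most_common(1)[0][1]
        scores.append(modal_count / len(residues))
    return scores

# Toy alignment: columns 0 and 3 are fully conserved.
aln = ["HKDE", "HKGE", "HR-E"]
print(column_conservation(aln))  # → [1.0, 0.6666666666666666, 0.5, 1.0]
```

Mapping such per-column scores onto the aligned residues of a structure is what lets a binding-site predictor weight conserved residues more heavily, as described for Pfinder and PDBinder.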
APA, Harvard, Vancouver, ISO, and other styles
37

Ong, Wai, Trang Vu, Klaus Lovendahl, Jenna Llull, Margrethe Serres, Margaret Romine, and Jennifer Reed. "Comparisons of Shewanella strains based on genome annotations, modeling, and experiments." BioMed Central, 2014. http://hdl.handle.net/10150/610105.

Full text
Abstract:
BACKGROUND: Shewanella is a genus of facultatively anaerobic, Gram-negative bacteria with a highly adaptable metabolism that allows them to thrive in diverse environments. This quality makes them an attractive bacterial target for research in bioremediation and microbial fuel cell applications. Constraint-based modeling is a useful tool for helping researchers gain insights into the metabolic capabilities of these bacteria. However, of the 21 sequenced Shewanella strains, Shewanella oneidensis MR-1 is the only one with a genome-scale metabolic model. RESULTS: In this work, we updated the model for Shewanella oneidensis MR-1 and constructed metabolic models for three other strains, namely Shewanella sp. MR-4, Shewanella sp. W3-18-1, and Shewanella denitrificans OS217, which span the genus based on the number of genes lost in comparison to MR-1. We also constructed a Shewanella core model that contains the genes shared by all 21 sequenced strains and a few non-conserved genes associated with essential reactions. Model comparisons between the five constructed models were done at two levels: for wild-type strains under different growth conditions, and for knockout mutants under the same growth condition. At the first level, growth/no-growth phenotypes were predicted by the models on various carbon sources and electron acceptors. Cluster analysis of these results revealed that the MR-1 model is most similar to the W3-18-1 model, followed by the MR-4 and OS217 models, when considering predicted growth phenotypes. However, a cluster analysis based on metabolic gene content revealed that the MR-4 and W3-18-1 models are the most similar, with the MR-1 and OS217 models being more distinct from these latter two strains. As a second level of comparison, we identified differences in reaction and gene content which give rise to different functional predictions for single and double gene knockout mutants, using Comparison of Networks by Gene Alignment (CONGA).
Here, we showed how CONGA can be used to find biomass, metabolic, and genetic differences between models. CONCLUSIONS: We developed four strain-specific models and a general core model that can be used for various in silico studies of Shewanella metabolism. The developed models provide a platform for a systematic investigation of Shewanella metabolism to aid researchers using Shewanella in various biotechnology applications.
APA, Harvard, Vancouver, ISO, and other styles
38

Lin, Yun. "Semantic Annotation for Process Models : Facilitating Process Knowledge Management via Semantic Interoperability." Doctoral thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-2119.

Full text
Abstract:

Business process models representing process knowledge about doing business are necessary for designing Information Systems (IS) solutions in enterprises. Interoperability of business process knowledge in legacy systems is crucial for enterprise systems interoperation and integration due to increased enterprise cooperation and business exchange. Many modern technologies and approaches are deployed to support business process interoperability either at the instance level or the protocol level, such as BPML, WSDL and SOAP. However, we argue that a holistic approach is necessary for semantic interoperability of business process models at the conceptual level when considering the process models as reusable process knowledge for other (new or integrated) IS solutions. This brings requirements to manage semantic heterogeneity of process knowledge in process models which are distributed across different enterprise systems. Semantic annotation is an approach to achieve semantic interoperability of heterogeneous resources. However, such an approach has usually been applied to enhance the semantics of unstructured and structured artifacts (e.g. textual resources [72] [49], and Web services [166] [201]).

The aim of the research is to introduce an ontology-based semantic annotation approach to enrich and reconcile semantics of process models — a kind of semi-structured artifact, for managing process knowledge. The approach brings together techniques in process modeling, ontology building, semantic matching, and Description Logic inference in order to provide a comprehensive semantic annotation framework. Furthermore, a prototype system that supports the process of ontology-based semantic annotation of heterogeneous process models is described. The applicational goal of our approach is to facilitate process knowledge management activities (e.g. discovery, reuse, and integration of process knowledge/models) by enhanced semantic interoperability.

A survey has been performed by identifying semantic heterogeneity in process modeling and investigating semantic technology from theoretical and practical views. Based on the results of the survey, a comprehensive semantic annotation framework has been developed, which provides a method to manage the semantic heterogeneity of process models from the following perspectives: first, basic descriptions of process models (profile annotation); second, process modeling languages (meta-model annotation); third, contents of process models (model annotation); and finally, intentions of process model owners (goal annotation). Applying the semantic annotation framework, an ontology-based annotation method has been elaborated, which results in two categories of research activity: ontology building and semantic mapping. In ontology building, we use the Web Ontology Language (OWL), a Semantic Web technology for modeling ontologies. GPO (General Process Ontology), comprising the core concepts of most process modeling languages, is proposed; domain concepts are classified in the corresponding categories of GPO as a domain ontology; and design principles for building a goal ontology are introduced in order to serve the annotation of process models pragmatically. In semantic mapping, a set of mapping strategies is developed to conduct the annotation by considering the semantic relationships between model artifacts and ontology references, as well as the semantic inference mechanism supported by OWL DL (Description Logic). The annotation method is finally formalized into a process semantic annotation model, PSAM.

The proposed approach has been implemented in a prototype annotation tool, ProSEAT, to facilitate the annotation process. Procedures for applying the semantic annotation approach with the tool are described through an exemplar study. The annotation approach and the prototype tool are evaluated using a quality framework. Furthermore, the applicability of the annotation results is validated through a process knowledge management application. The Semantic Web Rule Language (SWRL) is applied in the application demonstration. We argue that the ontology-based annotation approach, combined with Semantic Web technology, is a feasible approach to reconciling semantic heterogeneity in process knowledge management. Limitations and future work are discussed after concluding this research work.

The contributions of this thesis are summarized as follows. First, a general process ontology is proposed for unifying process representations at a high level of abstraction. Second, a semantic annotation framework is introduced to describe process knowledge systematically. Third, ontology-based annotation methods are elaborated and formalized. Fourth, an annotation system, utilizing the developed formal methods, is designed and implemented. Fifth, a process knowledge management system is outlined as the platform for manipulating the annotation results. Moreover, the application of the approach's results is demonstrated through a process model integration example.

APA, Harvard, Vancouver, ISO, and other styles
39

Hammarkvist, Tom. "Automatic Annotation of Models for Object Classification in Real Time Object Detection." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-86061.

Full text
Abstract:
The era of manual labour is changing as automation grows by the day. Self-driving vehicles are one of the better-known examples of automation (the vehicles in this thesis being those found in the construction industry); such a vehicle relies on a machine learning network to recognize its surroundings. To achieve this, the network needs a dataset. A dataset consists of two things: data, which usually come in the form of images, and annotated labels that allow the network to learn what it sees. A label is a descriptor that states which objects exist in an image, where in the image those objects are located, and the area they occupy. As data is collected, it needs to be manually annotated, which can take several months to finish. With this in mind, is it possible to set up some form of semi-automatic annotation step that does a majority of the work? If so, what techniques can be used to achieve this? How does the result compare to a dataset that has been annotated by a human? And is it even worth implementing in the first place? For this research, a dataset was collected in which a remote-controlled wheel loader approached a stationary dump truck at various angles and under different conditions. Four videos were used in the training set, containing 679 images and their respective labels. Two other videos were used for the validation set, consisting of 120 images and their respective labels. The chosen object detector was YOLOv3, which has a low inference time and high accuracy. This helped with gathering results at a faster rate than would have been possible with an older version. The method chosen for the automatic annotations was linear interpolation, implemented to work in conjunction with the labels of the training set to approximate the corresponding values. The interpolation was done at different frame gaps: a gap of 10 frames, a gap of 20 frames, and so on, up to a gap of 60 frames.
This was done in order to help locate a sweet spot where the model had similar performance to one trained on the manually annotated dataset. The results showed that the fully manually annotated dataset approached a precision of 0.8, a recall of 0.96, and a mean average precision (mAP) of 0.95. Some of the models trained with interpolated frames between a set gap achieved similar results: interpolating between every 10th frame, every 20th frame, and every 30th frame showed the most promise. They all approached precision values of around 0.8, a recall of around 0.94, and an mAP of around 0.9.
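The interpolation scheme this abstract describes admits a compact sketch. The (x, y, w, h) box format, function names, and numbers below are illustrative assumptions, not details taken from the thesis:

```python
def interpolate_boxes(box_a, box_b, num_gaps):
    """Linearly interpolate bounding boxes between two annotated keyframes.

    box_a, box_b: (x, y, w, h) boxes at the first and last frame of the gap.
    num_gaps: number of unannotated frames between the two keyframes.
    Returns one approximated box per intermediate frame.
    """
    boxes = []
    for i in range(1, num_gaps + 1):
        t = i / (num_gaps + 1)  # fraction of the way from box_a to box_b
        boxes.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return boxes

# Annotate only every 10th frame; approximate the 9 frames in between.
start = (100.0, 50.0, 40.0, 30.0)   # box in frame 0
end = (200.0, 60.0, 50.0, 30.0)     # box in frame 10
intermediate = interpolate_boxes(start, end, 9)
```

Larger gaps (up to 60 frames, as in the thesis) simply mean more interpolated frames per pair of keyframes, at the cost of a cruder approximation of the object's true motion.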
APA, Harvard, Vancouver, ISO, and other styles
40

Wimalaratne, Sarala M. "A framework for annotating and visualizing cellML models." Thesis, University of Auckland, 2009. http://hdl.handle.net/2292/5606.

Full text
Abstract:
The Physiome Project was established to develop tools for international collaboration and sharing physiological knowledge in the form of biological models and experimental data. The CellML language was developed in response to the need for a high-level language to represent and exchange mathematical models of biological processes. The language provides a flexible framework for describing the dynamics of biological processes but does not explicitly lend itself to capturing the underlying biological concepts, such as the entities and processes that these models represent. The relationship between the biological process and the mathematical model describing it is also often complex. This makes it difficult to see the biological concepts which the CellML structures represent. A framework which supports visualizing the biological concepts and their relationship to the underlying CellML model would provide a very useful toolset for understanding the biological concepts modeled in CellML. The CellML models need to be annotated with biological concepts in order to provide the machine-interpretable data for generating a visual representation. We have developed an ontological framework which can be used to explicitly annotate CellML models with physical and biological concepts, a method to derive a simplified biological view from the annotations, a visual language for representing all biophysical processes captured in the CellML models, and a method to map the visual language to the ontological framework in order to automate the generation of visual representations of a model. The proposed method of model visualization produces a result that is dependent on the structure of the CellML models, which requires modelers to structure the model in a way that best describes the biophysical concepts and abstractions they wish to demonstrate. Our argument is that this leads to a best-practice approach to building and organizing models.
As a part of this research, a software tool for visualizing CellML models was developed. This tool combines the visual language and the ontologies to generate visualizations that depict the physical and biological concepts captured in CellML models, and enables different communities in diverse disciplines to more easily understand CellML models within the biological domain they represent. As research continues and the framework is further improved, it will become possible to visually construct composite CellML models by selecting high-level biological concepts.
APA, Harvard, Vancouver, ISO, and other styles
41

Soavi, Michele. "From Legal Contracts to Formal Specifications." Doctoral thesis, Università degli studi di Trento, 2022. https://hdl.handle.net/11572/355741.

Full text
Abstract:
The challenge of implementing and executing a legal contract in a machine has been gaining significant interest recently with the advent of blockchain, smart contracts, LegalTech and IoT technologies. Popular software engineering methods, including agile ones, are unsuitable for such outcome-critical software. Instead, formal specifications are crucial for implementing smart contracts to ensure that they capture the intentions of stakeholders and that their execution is compliant with the terms and conditions of the original natural-language legal contract. This thesis concerns supporting the semi-automatic generation of formal specifications of legal contracts written in Natural Language (NL). The main contribution is a framework, named Contratto, where the transformation process from NL to a formal specification is subdivided into 5 steps: (1) identification of ambiguous terms in the contract and manual disambiguation; (2) structural and semantic annotation of the legal contract; (3) discovery of relationships among the concepts identified in step (2); (4) formalization of the terms used in the NL text into a domain model; (5) generation of formal expressions that describe what should be implemented by programmers in a smart contract. A systematic literature review on the main topic of the thesis was performed to support the definition of the framework. Requirements were derived from standard business contracts for a preliminary implementation of tools that support the transformation process, particularly concerning step (2). A prototype environment was proposed to semi-automate the transformation process, although significant manual intervention is required. The preliminary evaluation confirms that the annotation tool can perform the annotation as well as human annotators, albeit novice ones.
APA, Harvard, Vancouver, ISO, and other styles
42

Martins, Diogo Santana. "Models and operators for extension of active multimedia documents via annotations." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-30012014-082907/.

Full text
Abstract:
Multimedia production is an elaborate activity composed of multiple information management and transformation tasks that support an underlying creative goal. Examples of these activities are structuring, organization, modification and versioning of media elements, all of which depend on the maintenance of supporting documentation and metadata. In professional productions, which can count on proper human and material resources, such documentation is maintained by the production crew, being key to securing a high quality in the final content. In less resourceful settings, such as amateur-oriented productions, at least reasonable quality standards are still desirable in most cases; however, the perceived difficulty in managing and transforming content can inhibit amateurs from producing content with acceptable quality. This problem has been tackled on many fronts, for instance via annotation methods, smart browsing methods and authoring techniques, just to name a few. In this dissertation, the primary objective is to take advantage of user-created annotations in order to aid amateur-oriented multimedia authoring. In order to support this objective, the contributions are built around an authoring approach based on structured multimedia documents. First, a custom language for Web-based multimedia documents is defined, based on SMIL (Synchronized Multimedia Integration Language). This language brings several contributions, such as the formalization of an extended graph-based temporal layout model, live editing of document elements and extended reuse features. Second, a model for document annotation and an algebra for document transformations are defined, both of which allow composition and extraction of multimedia document fragments based on annotations. Third, the previous contributions are integrated into a Web-based authoring tool, which allows manipulating a document while it is active.
Such manipulations encompass several interaction techniques for enriching, editing, publishing and extending multimedia documents. The contributions have been instantiated with multimedia sessions obtained from synchronous collaboration tools, in scenarios of video-based lectures, meetings and video-based qualitative research. Such instantiations demonstrate the applicability and utility of the contributions.
Produção multimídia é uma atividade complexa composta por múltiplas atividades de gerência e transformação de informação, as quais suportam um objetivo de criar conteúdo. Exemplos dessas atividades são estruturação, organização, modificação e versionamento de elementos de mídia, os quais dependem da manutenção de documentos auxiliares e metadados. Em produções profissionais, as quais podem contar com recursos humanos e materiais adequados, tal documentação é mantida pela equipe de produção, sendo instrumental para garantir a uma alta qualidade no produto final. Em configurações com menos recursos, como produções amadoras, ao menos padrões razoáveis de qualidade são desejados na maioria dos casos, contudo a dificuldade em gerenciar e transformar conteúdo pode inibir amadores a produzir conteúdo com qualidade aceitável. Esse problema tem sido atacado em várias frentes, por exemplo via métodos de anotação, métodos de navegação e técnicas de autoria, apenas para nomear algumas. Nesta tese, o objetivo principal é tirar proveito de anotações criadas pelo usuário com o intuito de apoiar autoria multimídia por amadores. De modo a subsidiar esse objetivo, as contribuições são construídas em torno uma abordagem de autoria baseada em documentos multimídia estruturados. Primeiramente, uma linguagem customizada para documentos multimídia baseados na Web é definida, baseada na linguagem SMIL (Synchronized Multimedia Integration Language). Esta linguagem traz diversas contribuições, como a formalização de um modelo estendido para formatação temporal baseado em grafos, edição ao vivo de elementos de um documento e funcionalidades de reúso. Em segundo, um modelo para anotação de documentos e uma álgebra para transformação de documentos são definidos, ambos permitindo composição e extração de fragmentos de documentos multimídia com base em anotações. 
Em terceiro, as contribuições anteriores são integradas em uma ferramenta de autoria baseada na Web, a qual permite manipular um documento enquanto o mesmo está ativo. Tais manipulações envolvem diferentes técnicas de interação com o objetivo de enriquecer, editar, publicar e estender documentos multimídia interativos. As contribuições são instanciadas com sessões multimídia obtidas de ferramentas de colaboração síncrona, em cenários de aulas baseadas em vídeos, reuniões e pesquisa qualitativa baseada em vídeos. Tais instanciações demonstram a aplicabilidade e utilidade das contribuições
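The extraction side of an annotation-based document algebra, as summarized in the abstract above, can be illustrated with a minimal sketch. The document structure, field names, and `compose`/`extract_fragments` operators below are invented for illustration and do not reflect the actual tool:

```python
def extract_fragments(doc, tag):
    """Extract the fragments of a document whose annotations carry a given tag."""
    return [frag for frag in doc["fragments"] if tag in frag["annotations"]]

def compose(fragments):
    """Compose previously extracted fragments into a new document."""
    return {"fragments": list(fragments)}

# Toy multimedia session: each fragment carries a set of user-created annotations.
doc = {"fragments": [
    {"id": "intro", "annotations": {"lecture", "slides"}},
    {"id": "q1",    "annotations": {"question"}},
    {"id": "demo",  "annotations": {"lecture", "video"}},
]}

# Derive a new document containing only the lecture-related fragments.
lecture_doc = compose(extract_fragments(doc, "lecture"))
```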
APA, Harvard, Vancouver, ISO, and other styles
43

Ugarte, Ari. "Combining machine learning and evolution for the annotation of metagenomics data." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066732/document.

Full text
Abstract:
La métagénomique sert à étudier les communautés microbiennes en analysant de l’ADN extrait directement d’échantillons pris dans la nature, elle permet également d’établir un catalogue très étendu des gènes présents dans les communautés microbiennes. Ce catalogue doit être comparé contre les gènes déjà référencés dans les bases des données afin de retrouver des séquences similaires et ainsi déterminer la fonction des séquences qui le composent. Au cours de cette thèse, nous avons développé MetaCLADE, une nouvelle méthodologie qui améliore la détection des domaines protéiques déjà référencés pour des séquences issues des données métagénomiques et métatranscriptomiques. Pour le développement de MetaCLADE, nous avons modifié un système d’annotations de domaines protéiques qui a été développé au sein du Laboratoire de Biologie Computationnelle et Quantitative appelé CLADE (CLoser sequences for Annotations Directed by Evolution) [17]. En général les méthodes pour l’annotation de domaines protéiques caractérisent les domaines connus avec des modèles probabilistes. Ces modèles probabilistes, appelés Sequence Consensus Models (SCMs) sont construits à partir d’un alignement des séquences homologues appartenant à différents clades phylogénétiques et ils représentent le consensus à chaque position de l’alignement. Cependant, quand les séquences qui forment l’ensemble des homologues sont très divergentes, les signaux des SCMs deviennent trop faibles pour être identifiés et donc l’annotation échoue. Afin de résoudre ce problème d’annotation de domaines très divergents, nous avons utilisé une approche fondée sur l’observation que beaucoup de contraintes fonctionnelles et structurelles d’une protéine ne sont pas globalement conservées parmi toutes les espèces, mais elles peuvent être conservées localement dans des clades. 
L’approche consiste donc à élargir le catalogue de modèles probabilistes en créant de nouveaux modèles qui mettent l’accent sur les caractéristiques propres à chaque clade. MetaCLADE, un outil conçu dans l’objectif d’annoter avec précision des séquences issues des expériences métagénomiques et métatranscriptomiques utilise cette libraire afin de trouver des correspondances entre les modèles et une base de données de séquences métagénomiques ou métatranscriptomiques. En suite, il se sert d’une étape pré-calculée pour le filtrage des séquences qui permet de déterminer la probabilité qu’une prédiction soit considérée vraie. Cette étape pré-calculée est un processus d’apprentissage qui prend en compte la fragmentation de séquences métagénomiques pour les classer.Nous avons montré que l’approche multi source en combinaison avec une stratégie de méta apprentissage prenant en compte la fragmentation atteint une très haute performance
Metagenomics is used to study microbial communities by analyzing DNA extracted directly from environmental samples. It makes it possible to establish a very extensive catalog of the genes present in microbial communities. This catalog must be compared against the genes already referenced in databases in order to find similar sequences and thus determine their function. In the course of this thesis, we developed MetaCLADE, a new methodology that improves the detection of already-referenced protein domains in metagenomic and metatranscriptomic sequences. For the development of MetaCLADE, we modified a protein-domain annotation system developed within the Laboratory of Computational and Quantitative Biology, called CLADE (CLoser sequences for Annotations Directed by Evolution) [17]. In general, methods for the annotation of protein domains characterize known domains with probabilistic models. These probabilistic models, called Sequence Consensus Models (SCMs), are built from an alignment of homologous sequences belonging to different phylogenetic clades, and they represent the consensus at each position of the alignment. However, when the sequences that form the set of homologs are very divergent, the signals of the SCMs become too weak to be identified and the annotation therefore fails. In order to solve this problem of annotating very divergent domains, we used an approach based on the observation that many of the functional and structural constraints in a protein are not broadly conserved among all species, but can be conserved locally within clades.
The approach is therefore to expand the catalog of probabilistic models by creating new models that focus on the specific characteristics of each clade. MetaCLADE, a tool designed to annotate with precision sequences coming from metagenomic and metatranscriptomic studies, uses this library to find matches between the models and a database of metagenomic or metatranscriptomic sequences. It then applies a pre-computed filtering step which determines the probability that a prediction is a true hit. This pre-computed step is a learning process that takes the fragmentation of metagenomic sequences into account in order to classify them. We have shown that the multi-source approach, in combination with a fragmentation-aware meta-learning strategy, outperforms current methods.
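The core idea of scanning a sequence against a library of clade-specific models and keeping the best hit can be sketched as follows. The k-mer scoring function and the toy model library are stand-ins for the SCMs and scoring machinery the thesis actually uses:

```python
def kmer_score(seq, model_kmers, k=3):
    """Toy stand-in for an SCM match score: count of shared k-mers."""
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmers & model_kmers)

def best_domain_hit(sequence, model_library, score_fn):
    """Scan a sequence against every clade-specific model; keep the best-scoring hit."""
    best = None
    for domain, clade_models in model_library.items():
        for clade, model in clade_models.items():
            s = score_fn(sequence, model)
            if best is None or s > best[2]:
                best = (domain, clade, s)
    return best

# One domain, two clade-specific models, each represented as a k-mer set.
library = {
    "kinase": {
        "metazoa": {"GKT", "KTL", "TLG"},
        "fungi":   {"GKS", "KSL"},
    },
}
hit = best_domain_hit("AGKTLGA", library, kmer_score)
```

The point of the clade-expanded library is visible even in this toy: a sequence too divergent to match the consensus of all homologs can still match the model of its own clade.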
APA, Harvard, Vancouver, ISO, and other styles
44

Ugarte, Ari. "Combining machine learning and evolution for the annotation of metagenomics data." Electronic Thesis or Diss., Paris 6, 2016. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2016PA066732.pdf.

Full text
Abstract:
La métagénomique sert à étudier les communautés microbiennes en analysant de l’ADN extrait directement d’échantillons pris dans la nature, elle permet également d’établir un catalogue très étendu des gènes présents dans les communautés microbiennes. Ce catalogue doit être comparé contre les gènes déjà référencés dans les bases des données afin de retrouver des séquences similaires et ainsi déterminer la fonction des séquences qui le composent. Au cours de cette thèse, nous avons développé MetaCLADE, une nouvelle méthodologie qui améliore la détection des domaines protéiques déjà référencés pour des séquences issues des données métagénomiques et métatranscriptomiques. Pour le développement de MetaCLADE, nous avons modifié un système d’annotations de domaines protéiques qui a été développé au sein du Laboratoire de Biologie Computationnelle et Quantitative appelé CLADE (CLoser sequences for Annotations Directed by Evolution) [17]. En général les méthodes pour l’annotation de domaines protéiques caractérisent les domaines connus avec des modèles probabilistes. Ces modèles probabilistes, appelés Sequence Consensus Models (SCMs) sont construits à partir d’un alignement des séquences homologues appartenant à différents clades phylogénétiques et ils représentent le consensus à chaque position de l’alignement. Cependant, quand les séquences qui forment l’ensemble des homologues sont très divergentes, les signaux des SCMs deviennent trop faibles pour être identifiés et donc l’annotation échoue. Afin de résoudre ce problème d’annotation de domaines très divergents, nous avons utilisé une approche fondée sur l’observation que beaucoup de contraintes fonctionnelles et structurelles d’une protéine ne sont pas globalement conservées parmi toutes les espèces, mais elles peuvent être conservées localement dans des clades. 
L’approche consiste donc à élargir le catalogue de modèles probabilistes en créant de nouveaux modèles qui mettent l’accent sur les caractéristiques propres à chaque clade. MetaCLADE, un outil conçu dans l’objectif d’annoter avec précision des séquences issues des expériences métagénomiques et métatranscriptomiques utilise cette libraire afin de trouver des correspondances entre les modèles et une base de données de séquences métagénomiques ou métatranscriptomiques. En suite, il se sert d’une étape pré-calculée pour le filtrage des séquences qui permet de déterminer la probabilité qu’une prédiction soit considérée vraie. Cette étape pré-calculée est un processus d’apprentissage qui prend en compte la fragmentation de séquences métagénomiques pour les classer.Nous avons montré que l’approche multi source en combinaison avec une stratégie de méta apprentissage prenant en compte la fragmentation atteint une très haute performance
Metagenomics is used to study microbial communities by analyzing DNA extracted directly from environmental samples. It makes it possible to establish a very extensive catalog of the genes present in microbial communities. This catalog must be compared against the genes already referenced in databases in order to find similar sequences and thus determine their function. In the course of this thesis, we developed MetaCLADE, a new methodology that improves the detection of already-referenced protein domains in metagenomic and metatranscriptomic sequences. For the development of MetaCLADE, we modified a protein-domain annotation system developed within the Laboratory of Computational and Quantitative Biology, called CLADE (CLoser sequences for Annotations Directed by Evolution) [17]. In general, methods for the annotation of protein domains characterize known domains with probabilistic models. These probabilistic models, called Sequence Consensus Models (SCMs), are built from an alignment of homologous sequences belonging to different phylogenetic clades, and they represent the consensus at each position of the alignment. However, when the sequences that form the set of homologs are very divergent, the signals of the SCMs become too weak to be identified and the annotation therefore fails. In order to solve this problem of annotating very divergent domains, we used an approach based on the observation that many of the functional and structural constraints in a protein are not broadly conserved among all species, but can be conserved locally within clades.
The approach is therefore to expand the catalog of probabilistic models by creating new models that focus on the specific characteristics of each clade. MetaCLADE, a tool designed to annotate with precision sequences coming from metagenomic and metatranscriptomic studies, uses this library to find matches between the models and a database of metagenomic or metatranscriptomic sequences. It then applies a pre-computed filtering step which determines the probability that a prediction is a true hit. This pre-computed step is a learning process that takes the fragmentation of metagenomic sequences into account in order to classify them. We have shown that the multi-source approach, in combination with a fragmentation-aware meta-learning strategy, outperforms current methods.
APA, Harvard, Vancouver, ISO, and other styles
45

Miu, Tudor Alin. "Online learning of personalised human activity recognition models from user-provided annotations." Thesis, University of Newcastle upon Tyne, 2017. http://hdl.handle.net/10443/3635.

Full text
Abstract:
In Human Activity Recognition (HAR), supervised and semi-supervised training are important tools for devising parametric activity models. For the best modelling performance, large amounts of annotated personalised sample data are typically required. Annotation often represents the bottleneck in the overall modelling process, as it usually involves retrospective analysis of experimental ground truth, like video footage. These approaches typically neglect that prospective users of HAR systems are themselves key sources of ground truth for their own activities. This research therefore involves the users of HAR monitors in the annotation process. The process relies solely on users' short-term memory and engages with them to parsimoniously provide annotations for their own activities as they unfold. Effects of user input are optimised by using Online Active Learning (OAL) to identify the most critical annotations, which are expected to lead to highly optimal HAR model performance gains. Personalised HAR models are trained from user-provided annotations as part of the evaluation, focusing mainly on objective model accuracy. The OAL approach is contrasted with Random Selection (RS), a naive method which makes uninformed annotation requests. A range of simulation-based annotation scenarios demonstrate that using OAL brings benefits in terms of HAR model performance over RS. Additionally, a mobile application is implemented and deployed in a naturalistic context to collect annotations from a panel of human participants. The deployment is proof that the method can truly run in online mode, and it also shows that considerable HAR model performance gains can be registered even under realistic conditions. The findings from this research point to the conclusion that online learning from user-provided annotations is a valid solution to the problem of constructing personalised HAR models.
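The contrast between OAL and RS can be sketched in a few lines. The confidence threshold, budget, and toy item stream below are assumptions, and a real OAL system would also retrain the model after each annotation arrives:

```python
import random

def online_active_learning(stream, confidence_fn, budget, threshold=0.6):
    """Query an annotation only when the current model is uncertain about an item."""
    queried = []
    for item in stream:
        if len(queried) >= budget:
            break
        if confidence_fn(item) < threshold:  # model unsure -> ask the user now
            queried.append(item)
    return queried

def random_selection(stream, budget, seed=0):
    """Naive RS baseline: uninformed annotation requests."""
    return random.Random(seed).sample(list(stream), budget)

# Toy stream: each item carries the model's confidence in its own prediction.
stream = [{"id": i, "conf": c} for i, c in
          enumerate([0.9, 0.3, 0.8, 0.5, 0.95, 0.2, 0.7, 0.4])]
oal_queries = online_active_learning(stream, lambda x: x["conf"], budget=3)
rs_queries = random_selection(stream, budget=3)
```

Under the same annotation budget, OAL spends its queries on the low-confidence items, while RS spreads them blindly over the stream.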
APA, Harvard, Vancouver, ISO, and other styles
46

Shmeleva, Nataliya V. "Making sense of cDNA : automated annotation, storing in an interactive database, mapping to genomic DNA." Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/25178.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Scheeff, Eric David. "Multiple alignments of protein structures and their application to sequence annotation with hidden Markov models /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3112860.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Alili, Hiba. "Intégration de données basée sur la qualité pour l'enrichissement des sources de données locales dans le Service Lake." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED019.

Full text
Abstract:
De nos jours, d’énormes volumes de données sont créés en continu et les utilisateurs s’attendent à ce que ceux-ci soient collectés, stockés et traités quasiment en temps réel. Ainsi, les lacs de données sont devenus une solution attractive par rapport aux entrepôts de données classiques coûteux et fastidieux (nécessitant une démarche ETL), pour les entreprises qui souhaitent stocker leurs données. Malgré leurs volumes, les données stockées dans les lacs de données des entreprises sont souvent incomplètes voire non mises à jour vis-à-vis des besoins (requêtes) des utilisateurs.Les sources de données locales ont donc besoin d’être enrichies. Par ailleurs, la diversité et l’expansion du nombre de sources d’information disponibles sur le web a rendu possible l’extraction des données en temps réel. Ainsi, afin de permettre d’accéder et de récupérer l’information de manière simple et interopérable, les sources de données sont de plus en plus intégrées dans les services Web. Il s’agit plus précisément des services de données, y compris les services DaaS du Cloud Computing. L’enrichissement manuel des sources locales implique plusieurs tâches fastidieuses telles que l’identification des services pertinents, l’extraction et l’intégration de données hétérogènes, la définition des mappings service-source, etc. Dans un tel contexte, nous proposons une nouvelle approche d’intégration de données centrée utilisateur. Le but principal est d’enrichir les sources de données locales avec des données extraites à partir du web via les services de données. Cela permettrait de satisfaire les requêtes des utilisateurs tout en respectant leurs préférences en terme de coût d’exécution et de temps de réponse et en garantissant la qualité des résultats obtenus
In the Big Data era, companies are moving away from traditional data-warehouse solutions, whereby expensive and time-consuming ETL (Extract, Transform, Load) processes are used, towards data lakes in order to manage their increasingly growing data. Yet the knowledge stored in companies' databases, even in the constructed data lakes, can never be complete and up-to-date, because of the continuous production of data. Local data sources often need to be augmented and enriched with information coming from external data sources. Unfortunately, data enrichment is one of the manual labors undertaken by experts, who enrich data by adding information based on their expertise or select relevant data sources to complete missing information. Such work can be tedious, expensive and time-consuming, making it very promising for automation. We present in this work an active user-centric data integration approach to automatically enrich local data sources, in which the missing information is leveraged on the fly from web sources using data services. Accordingly, our approach enables users to query for information about concepts that are not defined in the data source schema. In doing so, we take into consideration a set of user preferences, such as the cost threshold and the response time necessary to compute the desired answers, while ensuring a good quality of the obtained results.
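Selecting data services under user preferences, as described above, can be sketched as a greedy coverage problem. The service records with `cost` and `latency` fields are invented for illustration; the actual approach reasons over service-source mappings and result quality, which this toy omits:

```python
def select_services(services, needed_attrs, max_cost, max_latency):
    """Greedily pick data services that cover missing attributes within user limits."""
    chosen, covered, cost, latency = [], set(), 0.0, 0.0
    # Try services covering many attributes first.
    for svc in sorted(services, key=lambda s: -len(s["attrs"])):
        gain = set(svc["attrs"]) - covered
        if not gain:
            continue  # adds nothing new
        if cost + svc["cost"] > max_cost or latency + svc["latency"] > max_latency:
            continue  # violates the user's preferences
        chosen.append(svc["name"])
        covered |= gain
        cost += svc["cost"]
        latency += svc["latency"]
        if covered >= set(needed_attrs):
            break  # the query is fully answerable
    return chosen, covered

# Hypothetical services that could fill attributes missing from a local source.
services = [
    {"name": "geo", "attrs": ["city", "country"], "cost": 2.0, "latency": 0.1},
    {"name": "fin", "attrs": ["revenue"], "cost": 5.0, "latency": 0.3},
    {"name": "all", "attrs": ["city", "country", "revenue"], "cost": 20.0, "latency": 1.0},
]
picked, covered = select_services(services, ["city", "revenue"],
                                  max_cost=10.0, max_latency=1.0)
```

Here the expensive all-in-one service is rejected for exceeding the cost threshold, and the query is answered by combining two cheaper services instead.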
APA, Harvard, Vancouver, ISO, and other styles
49

Campos, Martin Rafael [Verfasser], Achim [Gutachter] Tresch, and Andreas [Gutachter] Beyer. "Hidden Markov Models for Genomic Segmentation and Annotation / Rafael Campos Martin ; Gutachter: Achim Tresch, Andreas Beyer." Köln : Universitäts- und Stadtbibliothek Köln, 2019. http://d-nb.info/1232912719/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Okuda, Nozomu. "The Annotation Cost of Context Switching: How Topic Models and Active Learning [May Not] Work Together." BYU ScholarsArchive, 2017. https://scholarsarchive.byu.edu/etd/6906.

Full text
Abstract:
The labeling of language resources is a time-consuming task, whether aided by machine learning or not. Much of the prior work in this area has focused on accelerating human annotation in the context of machine learning, yielding a variety of active learning approaches. Most of these attempt to lead an annotator to label the items which are most likely to improve the quality of an automated, machine learning-based model. These active learning approaches seek to understand the effect of item selection on the machine learning model, but give significantly less emphasis to the effect of item selection on the human annotator. In this work, we consider a sentiment labeling task where existing, traditional active learning seems to have little or no value. We focus instead on the human annotator by ordering the items for better annotator efficiency.
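Ordering items for annotator efficiency can be illustrated by grouping a labeling queue by topic so that the annotator handles similar items consecutively, minimising context switches. The items and topic function below are toy assumptions, not data from the thesis:

```python
def order_by_topic(items, topic_fn):
    """Group items by topic so similar items are labelled consecutively."""
    by_topic = {}
    for item in items:
        by_topic.setdefault(topic_fn(item), []).append(item)
    ordered = []
    for topic in sorted(by_topic):
        ordered.extend(by_topic[topic])
    return ordered

def count_switches(items, topic_fn):
    """Number of times consecutive items change topic (context switches)."""
    topics = [topic_fn(i) for i in items]
    return sum(1 for a, b in zip(topics, topics[1:]) if a != b)

# A queue as an uncertainty-driven active learner might order it.
items = [("d1", "sports"), ("d2", "politics"), ("d3", "sports"),
         ("d4", "politics"), ("d5", "sports")]
topic = lambda it: it[1]
reordered = order_by_topic(items, topic)
```

The tension the thesis studies is visible here: an ordering chosen for the model's benefit can force many topic changes on the annotator, while a topic-grouped ordering reduces them.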
APA, Harvard, Vancouver, ISO, and other styles
