Doctoral dissertations on the topic "Annotations de données"
Consult the 45 best scholarly doctoral dissertations on the topic "Annotations de données".
Alec, Céline. "Enrichissement et peuplement d’ontologie à partir de textes et de données du LOD : Application à l’annotation automatique de documents". Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS228/document.
This thesis deals with an ontology-guided approach designed to annotate documents from a corpus in which each document describes an entity of the same type. In our context, all documents have to be annotated with concepts that are usually too specific to be explicitly mentioned in the texts. In addition, the annotation concepts are initially represented only by their name, without any semantic information attached to them. Finally, the characteristics of the entities described in the documents are incomplete. To accomplish this particular document annotation process, we propose an approach called SAUPODOC (Semantic Annotation of Population Using Ontology and Definitions of Concepts), which combines several tasks to (1) populate and (2) enrich a domain ontology. The population step (1) adds to the ontology information from the documents in the corpus, but also from the Web of Data (Linked Open Data, or LOD). The LOD represents today a promising source for many Semantic Web applications, provided that appropriate data acquisition techniques are developed. In the setting of SAUPODOC, ontology population has to take into account the diversity of the data in the LOD: multiple, equivalent, multi-valued or absent properties. The correspondences to be established between the vocabulary of the ontology to be populated and that of the LOD are complex, so we propose a model to facilitate their specification. We then show how this model is used to automatically generate SPARQL queries, facilitating the querying of the LOD and the population of the ontology. The latter, once populated, is then enriched (2) with the annotation concepts and definitions that are learned from examples of annotated documents. Reasoning on these definitions finally provides the desired annotations.
Experiments have been conducted in two application areas, and the results, compared with the annotations obtained with classifiers, demonstrate the interest of the approach.
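The query-generation step mentioned in this abstract can be illustrated with a small sketch. This is not the SAUPODOC implementation: the `build_query` helper, the mapping structure and the DBpedia predicates are illustrative assumptions showing how a declared correspondence between one ontology property and several possibly equivalent LOD predicates could be turned into a single SPARQL query, with each predicate wrapped in OPTIONAL so that absent properties do not make the whole query fail.

```python
# Hedged sketch, not the SAUPODOC code: names and predicates are assumptions.

def build_query(entity_uri, mappings):
    """Generate a SPARQL SELECT fetching every LOD predicate mapped to an
    ontology property, each in an OPTIONAL block (absent/equivalent-safe)."""
    clauses = []
    for onto_prop, lod_predicates in mappings.items():
        for i, pred in enumerate(lod_predicates):
            var = f"?{onto_prop}_{i}"
            clauses.append(f"  OPTIONAL {{ <{entity_uri}> <{pred}> {var} . }}")
    return "SELECT * WHERE {\n" + "\n".join(clauses) + "\n}"

# Two equivalent DBpedia predicates mapped to one ontology property "budget".
mappings = {
    "budget": [
        "http://dbpedia.org/ontology/budget",
        "http://dbpedia.org/property/budget",
    ]
}
query = build_query("http://dbpedia.org/resource/Avatar_(2009_film)", mappings)
print(query)
```

Such a generated query can then be sent to a public SPARQL endpoint; the point of the sketch is only the mapping-to-query translation, not endpoint access.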
Liu, Jixiong. "Semantic Annotations for Tabular Data Using Embeddings : Application to Datasets Indexing and Table Augmentation". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS529.
With the development of Open Data, a large number of data sources are made available to communities (including data scientists and data analysts). This data is a treasure trove for digital services, provided it is cleaned, unbiased, and combined with explicit, machine-processable semantics that foster exploitation. In particular, structured data sources (CSV, JSON, XML, etc.) are the raw material for many data science processes. However, this data derives from domains with which consumers are not always familiar (a knowledge gap), which complicates its appropriation, even though this is a critical step in creating machine learning models. Semantic models (in particular, ontologies) make it possible to explicitly represent the implicit meaning of data by specifying the concepts and relationships present in the data. Providing semantic labels on datasets facilitates the understanding and reuse of data by supplying documentation that can easily be used by a non-expert. Moreover, semantic annotation opens the way to search modes that go beyond simple keywords, allowing high-level conceptual queries over both the content and the structure of datasets, while overcoming the syntactic heterogeneity encountered in tabular data. This thesis introduces a complete pipeline for the extraction, interpretation, and application of tables in the wild with the help of knowledge graphs. We first revisit the existing definition of tables from the perspective of table interpretation and develop systems for collecting and extracting tables from the Web and from local files. Three table interpretation systems are then proposed, based either on heuristic rules or on graph representation models, addressing the challenges observed in the literature. Finally, we introduce and evaluate two table augmentation applications based on semantic annotations, namely data imputation and schema augmentation.
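To give a flavor of the column-type annotation task this abstract describes, here is a minimal, hypothetical sketch (not the thesis systems): each cell of a column is looked up for candidate knowledge-graph types, and the column is annotated with the majority type. The `type_lookup` dictionary stands in for a real entity-linking service.

```python
from collections import Counter

def annotate_column(cells, type_lookup):
    """Majority-vote column type annotation: each cell votes for the
    candidate types of the entities it could link to."""
    votes = Counter()
    for cell in cells:
        for t in type_lookup.get(cell, []):
            votes[t] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical lookup table standing in for an entity-linking service.
lookup = {
    "Paris": ["dbo:City", "dbo:Person"],   # ambiguous cell
    "Berlin": ["dbo:City"],
    "Tokyo": ["dbo:City"],
}
col_type = annotate_column(["Paris", "Berlin", "Tokyo"], lookup)
print(col_type)  # dbo:City
```

The majority vote resolves cell-level ambiguity ("Paris" could be a person) at the column level, which is the basic intuition behind knowledge-graph-based table interpretation.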
Lutz, Quentin. "Graph-based contributions to machine-learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT010.
A graph is a mathematical object that makes it possible to represent relationships (called edges) between entities (called nodes). Graphs have long been a focal point in a number of problems, ranging from the work of Euler to PageRank and shortest-path problems. More recently, graphs have been used for machine learning. With the advent of social networks and the World Wide Web, more and more datasets can be represented using graphs. Those graphs are ever larger, sometimes with billions of edges and billions of nodes. Designing efficient algorithms for analyzing those datasets has thus proven necessary. This thesis reviews the state of the art and introduces new algorithms for the clustering and the embedding of the nodes of massive graphs. Furthermore, in order to facilitate the handling of large graphs and to apply the techniques under study, we introduce Scikit-network, a free and open-source Python library developed during the thesis. Many tasks, such as the classification or the ranking of nodes using centrality measures, can be carried out with Scikit-network. We also tackle the problem of labeling data. Supervised machine learning techniques require labeled data for training, and the quality of this labeled data heavily influences the quality of the predictions of the trained models. However, building this data cannot be achieved through the sole use of machines and requires human intervention. We study the data labeling problem in a graph-based setting, aiming to describe solutions that require as little human intervention as possible. We characterize those solutions and illustrate how they can be applied in real use cases.
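Since the abstract cites PageRank among classic graph problems, a compact sketch of its textbook power-iteration form may help. This is a generic implementation, not code from the thesis or from Scikit-network.

```python
def pagerank(adjacency, damping=0.85, iters=50):
    """Power-iteration PageRank on an adjacency list {node: [successors]}."""
    nodes = list(adjacency)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, succs in adjacency.items():
            if succs:
                share = damping * rank[u] / len(succs)
                for v in succs:
                    new[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

# Tiny directed graph: c links to both a and b, so b collects the most rank.
g = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
r = pagerank(g)
```

On graphs with billions of nodes, the same iteration is implemented with sparse matrices and careful memory layout, which is the kind of engineering a library such as Scikit-network addresses.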
Savonnet, Marinette. "Systèmes d'Information Scientifique : des modèles conceptuels aux annotations sémantiques Application au domaine de l'archéologie et des sciences du vivant". Habilitation à diriger des recherches, Université de Bourgogne, 2013. http://tel.archives-ouvertes.fr/tel-00917782.
Mefteh, Wafa. "Approche ontologique pour la modélisation et le raisonnement sur les trajectoires : prise en compte des aspects thématiques, temporels et spatiaux". Thesis, La Rochelle, 2013. http://www.theses.fr/2013LAROS405/document.
The evolution of systems that capture data on moving objects has given rise to new generations of applications in various fields. The captured data, commonly called "trajectories", are at the heart of applications that analyze and monitor road, maritime and air traffic, or that optimize public transport. They are also used in video games, movies, sports and field biology, where motion capture systems serve to study animal behavior. Today, the data produced by these sensors are raw spatio-temporal records that hide semantically rich information, meaningful to a domain expert. The objective of this thesis is therefore to automatically associate spatio-temporal data with descriptions or concepts related to the behavior of moving objects, interpretable by humans but also by machines. Based on this observation, we propose a process, grounded in experience with real-world moving objects (notably vessels and planes), leading to a generic trajectory ontology model. We present some applications of interest to experts in the field and show that trajectories cannot be exploited in their raw state. Indeed, the analysis of expert queries identifies three types of semantic components: thematic, spatial and temporal. These components must be attached to the trajectory data, which leads to an annotation process transforming raw trajectories into semantic trajectories. To exploit semantic trajectories, we build a high-level ontology for the trajectory domain that models both the raw data and their annotations. Given the need to reason fully with spatial and temporal concepts and operators, we propose a solution based on the reuse of existing time and space ontologies. In this thesis, we also present results from a collaboration with a research team that focuses on analyzing and understanding the behavior of marine mammals in their natural environment. We describe the process applied, from raw data representing the movement of seals to an ontological trajectory model for seals. We pay particular attention to the contribution of the upper ontology, defined within a contextual framework, to the application ontology. Finally, this thesis discusses the difficulty of implementation on real data of large size (hundreds of thousands of records) when reasoning through inference mechanisms using business rules.
Tran, Hoang Tung. "Automatic tag correction in videos : an approach based on frequent pattern mining". Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4028/document.
This thesis presents a new system for video auto-tagging which aims at correcting the tags provided by users for videos uploaded on the Internet. Most existing auto-tagging systems rely mainly on textual information and learn a great number of classifiers (one per possible tag) to tag new videos. However, the existing user-provided video annotations are often incorrect and incomplete. Indeed, users uploading videos may want to rapidly increase their videos' view counts by tagging them with popular tags that are irrelevant to the video. They may also forget an obvious tag which might greatly help an indexing process. In this thesis, we limit the use of this questionable textual information and do not build a supervised model to perform the tag propagation. We propose to compare directly the visual content of the videos, described by different sets of features such as SIFT-based Bags-Of-Visual-Words or frequent patterns built from them. We then propose an original tag correction strategy based on the frequency of the tags in the visual neighborhood of the videos. We have also introduced a number of strategies and datasets to evaluate our system. The experiments show that our method can effectively improve the existing tags and that frequent patterns built from Bags-Of-Visual-Words are useful for constructing accurate visual features.
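The tag correction strategy, based on tag frequency in a video's visual neighborhood, can be sketched as follows. This is a hedged illustration of the general idea, not the thesis system; the voting rule and thresholds are assumptions.

```python
from collections import Counter

def correct_tags(video_tags, neighbor_tags, add_thresh=0.5, del_thresh=0.2):
    """Vote tags among the k visual nearest neighbors: add tags present in
    at least add_thresh of the neighbors, drop existing tags seen in fewer
    than del_thresh of them. Thresholds are illustrative assumptions."""
    k = len(neighbor_tags)
    freq = Counter(t for tags in neighbor_tags for t in set(tags))
    kept = {t for t in video_tags if freq[t] / k >= del_thresh}
    added = {t for t, c in freq.items() if c / k >= add_thresh}
    return kept | added

# Tags of the 4 visually nearest videos (hypothetical retrieval result).
neighbors = [{"cat", "pet"}, {"cat", "funny"}, {"cat", "pet"}, {"dog"}]
fixed = correct_tags({"cat", "music"}, neighbors)
print(fixed)
```

Here the spurious tag "music" is dropped (no visual neighbor carries it) and the missing tag "pet" is added, without any supervised per-tag classifier.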
Kellou-Menouer, Kenza. "Découverte de schéma pour les données du Web sémantique". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV047/document.
An increasing number of linked data sources are published on the Web. However, their schema may be incomplete or missing. In addition, the data do not necessarily follow their schema. This flexibility in describing the data eases their evolution, but makes their exploitation more complex. In our work, we have proposed an automatic and incremental approach enabling schema discovery from the implicit structure of the data. To complement the description of the types in a schema, we have also proposed an approach for finding the possible versions (patterns) of each of them. It proceeds online, without having to download or browse the source, which can be expensive or even impossible because sources may impose access limitations, either on query execution time or on the number of queries. We have also addressed the problem of annotating the types in a schema, which consists in finding a set of labels capturing their meaning. We have proposed annotation algorithms which provide meaningful labels using external knowledge bases. Our approach can be used to find meaningful type labels during schema discovery, and also to enrich the description of existing types. Finally, we have proposed an approach to evaluate the gap between a data source and its schema. To this end, we have proposed a set of quality factors and the associated metrics, as well as a schema extension allowing to reflect the heterogeneity among instances of the same type. Both the factors and the schema extension are used to analyze and improve the conformity between a schema and the instances it describes.
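A minimal sketch of incremental schema discovery from implicit structure, in the spirit of the approach described (not the actual algorithm): entities are greedily grouped into types by the Jaccard overlap of their property sets; the `threshold` value is an illustrative assumption.

```python
def discover_types(entities, threshold=0.6):
    """Greedy, incremental schema discovery sketch: assign each entity to
    the first discovered type whose property profile overlaps enough
    (Jaccard), otherwise start a new type. One pass over the data."""
    types = []       # each type is the set of properties seen so far
    assignment = []  # type index per entity
    for props in entities:
        for i, profile in enumerate(types):
            inter = len(props & profile)
            union = len(props | profile)
            if union and inter / union >= threshold:
                profile |= props  # refine the type with new properties
                assignment.append(i)
                break
        else:
            types.append(set(props))
            assignment.append(len(types) - 1)
    return types, assignment

# Hypothetical entities described only by their property sets.
people = [{"name", "birthDate"}, {"name", "birthDate", "spouse"}]
cities = [{"name", "population", "country"}]
types, labels = discover_types(people + cities)
```

The two person-like entities fall into one type despite the optional "spouse" property, while the city-like entity starts a new type; the per-entity property sets also hint at the "versions" (patterns) of each type.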
Paganini, Julien. "L'analyse de données génomiques et l'annotation à l'heure des NGS : la bioinformatique 2.0". Thesis, Aix-Marseille, 2015. http://www.theses.fr/2015AIXM4105.
Recent technological advances in genomic sequencing have led to a strong growth of available data and the emergence of new needs. Initially limited to the analysis of simple sequences or limited amounts of data, bioinformatics has had to adapt to this new technological and scientific context in order to meet the new challenges it offers. Through different projects spanning different genomic eras, this thesis fits into this changing context, in which bioinformatics is no longer limited to the use of tools with unitary goals and human-dependent steps. Focused on the development of complex analysis strategies, fully automated tools and high-value data, this work introduces the important role of bioinformatics 2.0. We show how it is able to answer precise biological questions through specific strategies that integrate the relevant biological concepts, existing bioinformatics tools and domain expertise. To conclude, we discuss the role and impact of bioinformatics 2.0, which requires an expert vision at both the biological and computational levels, adapted to NGS data.
Reverdy, Clément. "Annotation et synthèse basée données des expressions faciales de la Langue des Signes Française". Thesis, Lorient, 2019. http://www.theses.fr/2019LORIS550.
French Sign Language (LSF) represents part of the identity and culture of the deaf community in France. One way to promote this language is to generate signed content through virtual characters called signing avatars. The system we propose is part of a more general project of gestural synthesis of LSF by concatenation, which allows new sentences to be generated from a corpus of annotated motion data captured via a marker-based motion capture (MoCap) device, by editing existing data. In LSF, facial expressivity is particularly important since it conveys many kinds of information (e.g., affective, clausal or adjectival). This thesis aims to integrate the facial aspect of LSF into the concatenative synthesis system described above. Thus, a processing pipeline is proposed, from data capture via a MoCap device to facial animation of the avatar from these data, and to automatic annotation of the corpus thus constituted. The first contribution of this thesis concerns the methodology employed and the representation by blendshapes, both for the synthesis of facial animations and for automatic annotation. It enables the analysis/synthesis scheme to be processed at an abstract level, with homogeneous and meaningful descriptors. The second contribution concerns the development of an automatic annotation method based on the recognition of expressive facial expressions using machine learning techniques. The last contribution lies in the synthesis method, which is expressed as a rather classic optimization problem but in which we have included
Casallas-Gutiérrez, Rubby. "Objets historiques et annotations pour les environnements logiciels". Université Joseph Fourier (Grenoble), 1996. http://tel.archives-ouvertes.fr/tel-00004982.
Bayle, Yann. "Apprentissage automatique de caractéristiques audio : application à la génération de listes de lecture thématiques". Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0087/document.
This doctoral dissertation presents, discusses and proposes tools for automatic information retrieval in large musical databases. The main application is the supervised classification of musical themes to generate thematic playlists. The first chapter introduces the different contexts and concepts around large musical databases and their consumption. The second chapter focuses on the description of existing music databases used in academic experiments in audio analysis. This chapter notably introduces issues concerning the variety and unequal proportions of the themes contained in a database, which remain complex to take into account in supervised classification. The third chapter explains the importance of extracting and developing relevant audio features in order to better describe the content of music tracks in these databases. It covers several psychoacoustic phenomena and uses sound signal processing techniques to compute audio features. New methods of aggregating local audio features are proposed to improve song classification. The fourth chapter describes the use of the extracted audio features to sort songs by theme and thus enable musical recommendation and the automatic generation of homogeneous thematic playlists. This part involves the use of machine learning algorithms to perform music classification tasks. The contributions of this dissertation are summarized in the fifth chapter, which also proposes research perspectives in machine learning and the extraction of multi-scale audio features.
Zaag, Rim. "Enrichissement de profils transcriptomiques par intégration de données hétérogènes : annotation fonctionnelle de gènes d'Arabidopsis thaliana impliqués dans la réponse aux stress". Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLE013/document.
In the era of computational biology, functional annotation remains a major challenge. Recent annotation methods are based on the guilt-by-association assumption and rely on data integration to identify functional partners. However, most of these methods suffer from data heterogeneity and a lack of biological context specificity, which probably explains the high rate of false positives among predictions. This thesis develops a molecular data integration approach that controls data heterogeneity in order to annotate Arabidopsis thaliana genes involved in the stress response. The major contributions of this thesis are: (1) the functional annotation of groups of co-expressed genes by omics data integration; (2) the construction of a co-regulatory gene network through a cross-analysis of the co-expressed groups, strengthening the functional links between genes; (3) the development of a supervised learning method for the inference of gene function centered on GO Slim terms, with control of the FDR. By identifying a decision rule per term, this method was used to predict the function of 47 orphan or partially annotated genes.
Alili, Hiba. "Intégration de données basée sur la qualité pour l'enrichissement des sources de données locales dans le Service Lake". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED019.
In the Big Data era, companies are moving away from traditional data-warehouse solutions, whereby expensive and time-consuming ETL (Extract, Transform, Load) processes are used, towards data lakes in order to manage their ever-growing data. Yet the knowledge stored in companies' databases, even in the constructed data lakes, can never be complete and up-to-date, because of the continuous production of data. Local data sources often need to be augmented and enriched with information coming from external data sources. Unfortunately, the data enrichment process is one of the manual labors undertaken by experts, who enrich data by adding information based on their expertise or select relevant data sources to complete missing information. Such work can be tedious, expensive and time-consuming, making it very promising for automation. We present in this work an active user-centric data integration approach to automatically enrich local data sources, in which the missing information is leveraged on the fly from web sources using data services. Accordingly, our approach enables users to query for information about concepts that are not defined in the data source schema. In doing so, we take into consideration a set of user preferences, such as the cost threshold and the response time necessary to compute the desired answers, while ensuring a good quality of the obtained results.
Ghufran, Mohammad. "Découverte et réconciliation de données numeriques relatives aux personnes pour la gestion des ressources humaines". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLC062/document.
Finding the appropriate individual to hire is a crucial task for any organization. With the number of applications increasing due to the introduction of online job portals, it is desirable to automatically match applicants with job offers. Existing approaches that match applicants with job offers take resumes as they are and do not attempt to complete the information on a resume by looking for more information on the Internet. The objective of this thesis is to fill this gap by discovering online resources pertinent to an applicant. To this end, a novel method for the extraction of key information from resumes is proposed. This is a challenging task, since resumes can have diverse structures and formats, and the entities present within them are ambiguous. Identifying Web results using the key information, and reconciling them, is another challenge. We propose an algorithm to generate queries and rank the results to obtain the most pertinent online resources. In addition, we specifically tackle the reconciliation of social network profiles through a method that is able to identify profiles of individuals across different networks. Moreover, a method to resolve ambiguity in locations, or to predict a location when it is absent, is also presented. Experiments on real data sets are conducted for all the algorithms proposed in this thesis and show good results.
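The query-generation and ranking step can be illustrated with a simple sketch (not the thesis algorithm): candidate web results are scored by how many key terms extracted from the resume they mention. The `key_info` terms and sample results are hypothetical.

```python
def rank_results(key_info, results):
    """Rank candidate web pages by how many resume key terms they mention;
    a hedged stand-in for a query-generation-and-ranking pipeline."""
    terms = {t.lower() for t in key_info}
    scored = []
    for url, text in results:
        score = sum(1 for t in terms if t in text.lower())
        scored.append((score, url))
    return [url for score, url in sorted(scored, reverse=True)]

# Hypothetical key information extracted from a resume.
key_info = ["John Doe", "Acme Corp", "Paris"]
results = [
    ("a.example", "John Doe worked at Acme Corp in Paris"),
    ("b.example", "an unrelated page"),
]
ranked = rank_results(key_info, results)
```

A real system would weight rarer terms more heavily and reconcile profiles across networks; simple term overlap is only the starting intuition.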
Jedidi, Anis. "MODÉLISATION GÉNÉRIQUE DE DOCUMENTS MULTIMÉDIA PAR DES MÉTADONNÉES : MÉCANISMES D'ANNOTATION ET D'INTERROGATION". Phd thesis, Université Paul Sabatier - Toulouse III, 2005. http://tel.archives-ouvertes.fr/tel-00424059.
Diallo, Gayo. "Une architecture à base d'ontologies pour la gestion unifiée des données structurées et non structurées". Université Joseph Fourier (Grenoble), 2006. http://www.theses.fr/2006GRE10241.
Organizations' information systems contain different kinds of data, dispersed across several sources. The purpose of managing heterogeneous data is to offer transparent access to this set of sources. We are interested in the management of structured (relational databases) and unstructured (multilingual textual sources) data. We especially describe an approach for taking textual sources into account in an integration system. The approach we propose is based on the use of Semantic Web technologies and different kinds of ontologies. Ontologies are used to define the global schema (global ontology) and the sources to be integrated (local ontologies). Local ontologies are obtained in a semi-automatic way using reverse engineering techniques. Ontologies are also used for the hybrid representation of textual sources, which combines cataloguing information, vectors of terms and concepts, and optionally named entities identified in documents. We have designed and implemented an ontology server to manage multiple ontologies and support queries. A first application domain of our work has been the brain domain, for which we have developed or enriched ontologies for brain knowledge management and semantic characterization.
Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image". Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM048.
This dissertation delves into the use of textual metadata for image understanding. We seek to exploit this additional textual information as weak supervision to improve the learning of recognition models. There is a recent and growing interest in methods that exploit such data, because they can potentially alleviate the need for manual annotation, which is a costly and time-consuming process. We focus on two types of visual data with associated textual information. First, we exploit news images that come with descriptive captions to address several face-related tasks, including face verification, which is the task of deciding whether two images depict the same individual, and face naming, the problem of associating faces in a data set to their correct names. Second, we consider data consisting of images with user tags. We explore models for automatically predicting tags for new images, i.e., image auto-annotation, which can also be used for keyword-based image search. We also study a multimodal semi-supervised learning scenario for image categorisation, in which tags are assumed to be present in both labelled and unlabelled training data, while they are absent from the test data. Our work builds on the observation that most of these tasks can be solved if perfectly adequate similarity measures are used. We therefore introduce novel approaches that involve metric learning, nearest neighbour models and graph-based methods to learn task-specific similarities from the visual and textual data. For faces, our similarities focus on the identities of the individuals, while for images they address more general semantic visual concepts. Experimentally, our approaches achieve state-of-the-art results on several standard and challenging data sets. On both types of data, we clearly show that learning with additional textual information improves the performance of visual recognition systems.
Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image". Phd thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00522278/en/.
Alborzi, Seyed Ziaeddin. "Automatic Discovery of Hidden Associations Using Vector Similarity : Application to Biological Annotation Prediction". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0035/document.
This thesis presents: 1) the development of a novel approach to find direct associations between pairs of elements linked indirectly through various common features; 2) the use of this approach to directly associate biological functions with protein domains (ECDomainMiner and GODomainMiner) and to discover domain-domain interactions; and finally 3) the extension of this approach to comprehensively annotate protein structures and sequences. ECDomainMiner and GODomainMiner are two applications for discovering new associations between EC numbers and GO terms, respectively, and protein domains. They find a total of 20,728 and 20,318 non-redundant EC-Pfam and GO-Pfam associations, respectively, with F-measures of more than 0.95 with respect to a "Gold Standard" test set extracted from InterPro. Compared to around 1,500 manually curated associations in InterPro, ECDomainMiner and GODomainMiner infer a 13-fold increase in the number of available EC-Pfam and GO-Pfam associations. These function-domain associations are then used to annotate thousands of protein structures and millions of protein sequences whose domain composition is known but which currently lack experimental functional annotations. Using the inferred function-domain associations and considering taxonomy information, thousands of annotation rules have been generated automatically. These rules have then been used to annotate millions of protein sequences in the TrEMBL database.
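The association-discovery idea (linking functions to domains through the proteins in which they co-occur) can be sketched with a simple set-similarity score. The actual ECDomainMiner/GODomainMiner scoring is more elaborate, so the Jaccard measure, the threshold and the sample identifiers below are illustrative assumptions.

```python
def infer_associations(domain_proteins, function_proteins, min_score=0.5):
    """Score each (function, domain) pair by the Jaccard similarity of the
    protein sets in which they occur; keep pairs above min_score."""
    assocs = []
    for func, fp in function_proteins.items():
        for dom, dp in domain_proteins.items():
            score = len(fp & dp) / len(fp | dp)
            if score >= min_score:
                assocs.append((func, dom, round(score, 2)))
    return sorted(assocs, key=lambda a: -a[2])

# Hypothetical occurrence data: which proteins carry each domain/function.
domains = {"PF00069": {"p1", "p2", "p3"}, "PF00001": {"p4"}}
functions = {"EC:2.7.11.1": {"p1", "p2", "p3"}, "GO:0004930": {"p4", "p5"}}
assocs = infer_associations(domains, functions)
print(assocs)
```

A domain and a function that always occur in the same proteins get a high score, suggesting a direct association even though no database states it explicitly.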
García-Flores, Jorge Juárez. "Annotation sémantique des spécifications informatiques de besoins par la méthode d'Exploration Contextuelle : une contribution des méthodes linguistiques aux conceptions de logiciels". Paris 4, 2007. http://www.theses.fr/2007PA040015.
This PhD research proposes a linguistically oriented annotation method to support requirements engineering activities (requirements elicitation, validation or modelling). Our approach presents a method for the automatic extraction of action sentences from software requirements specifications (SRS). Its aim is to annotate action sentences in industrial SRS documents and to recognize action parameters (the action's controller, goal and constraints). It presents a linguistic analysis of action markers and a technique to automatically annotate action sentences by means of Contextual Exploration rules. The discourse analysis of SRS is based on the Cognitive and Applicative Grammar linguistic theory. The main results of our work are: 1) a typology of action verbs for requirements specifications; 2) a base of linguistic markers and rules for the semantic annotation of actions in SRS documents; and 3) an implementation of these rules in the EXCOM semantic annotation system, which automatically attributes action annotations to a corpus of commercial (French and Spanish) SRS documents.
Naert, Lucie. "Capture, annotation and synthesis of motions for the data-driven animation of sign language avatars". Thesis, Lorient, 2020. http://www.theses.fr/2020LORIS561.
This thesis deals with the capture, annotation, synthesis and evaluation of arm and hand motions for the animation of avatars communicating in Sign Languages (SL). Currently, the production and dissemination of SL messages often depend on video recordings, which lack depth information and for which editing and analysis are complex issues. Signing avatars constitute a powerful alternative to video. They are generally animated using either procedural or data-driven techniques. Procedural animation often results in robotic and unrealistic motions, but any sign can be produced precisely. With data-driven animation, the avatar's motions are realistic, but the variety of signs that can be synthesized is limited and/or biased by the initial database. As we considered the acceptance of the avatar to be a prime issue, we selected the data-driven approach; to address its main limitation, we propose to use annotated motions present in an SL motion capture database to synthesize novel SL signs and utterances absent from this initial database. To achieve this goal, our first contribution is the design, recording and perceptual evaluation of a French Sign Language (LSF) motion capture database composed of signs and utterances performed by deaf LSF teachers. Our second contribution is the development of automatic annotation techniques for different tracks, based on the analysis of the kinematic properties of specific joints and on existing machine learning algorithms. Our last contribution is the implementation of different motion synthesis techniques, based on motion retrieval per phonological component and on the modular reconstruction of new SL content, with the additional use of motion generation techniques such as inverse kinematics, parameterized to comply with the properties of real motions.
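A minimal sketch of annotation from kinematic properties, in the spirit of the second contribution (not the thesis code): frames are labelled by thresholding a joint's speed, then merged into segments. The threshold value and the "hold"/"stroke" labels are assumptions.

```python
def segment_motion(speeds, threshold=0.05):
    """Label each frame 'hold' when the joint speed is under a threshold,
    'stroke' otherwise, then merge consecutive frames into segments of the
    form (label, first_frame, last_frame)."""
    segments = []
    for i, s in enumerate(speeds):
        label = "hold" if s < threshold else "stroke"
        if segments and segments[-1][0] == label:
            segments[-1][2] = i  # extend the current segment
        else:
            segments.append([label, i, i])
    return [(lab, start, end) for lab, start, end in segments]

# Hypothetical per-frame wrist speeds from a MoCap track.
speeds = [0.01, 0.02, 0.3, 0.4, 0.35, 0.02, 0.01]
segments = segment_motion(speeds)
print(segments)
```

Such coarse kinematic segments can then feed a machine learning classifier that refines them into the annotation tracks used for retrieval-based synthesis.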
Dufresne, Yoann. "Algorithmique pour l’annotation automatique de peptides non ribosomiques". Thesis, Lille 1, 2016. http://www.theses.fr/2016LIL10147/document.
The monomeric composition of polymers is valuable for structure comparison and synthetic biology, among other applications. However, most online molecular resources provide only atomic structures, not monomeric structures. We therefore designed a software tool called smiles2monomers (s2m) to infer monomeric structures from chemical ones. The underlying algorithm consists of two steps: a search for the monomers using a subgraph isomorphism algorithm fitted to our data, and a tiling algorithm to obtain the best coverage of the polymer by non-overlapping monomers. The search is based on a Markovian index, improving execution time by 30% compared to the state of the art. The tiling is performed using a greedy algorithm refined by a branch & cut algorithm. s2m has been tested on two already annotated datasets. The software reconstructed the manual annotations with excellent sensitivity in a very short time. The Norine database, the reference knowledge base about specific polymers called Non-Ribosomal Peptides (NRP), is developed by our research group. Executed on the Norine database, s2m alerted us to wrong manual annotations. So s2m not only creates new annotations, but also facilitates the process of annotation curation. The new annotations generated by the software are currently used for the discovery of new NRPs and new activities, and may be used to create completely new, artificial NRPs.
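The greedy half of the two-step tiling described in the abstract above can be sketched as follows. This is a minimal illustration, not the actual s2m code: the monomer names and atom-index sets are hypothetical stand-ins for the hits of the subgraph-isomorphism search, and the branch & cut refinement is omitted.

```python
def greedy_tiling(n_atoms, matches):
    """Greedily pick non-overlapping monomer matches to cover the polymer.

    `matches` is a list of (monomer_name, atom_index_set) candidates,
    e.g. hits of a subgraph-isomorphism search. Larger matches are placed
    first; a match is kept only if it overlaps no previously placed one.
    """
    covered = set()
    tiling = []
    for name, atoms in sorted(matches, key=lambda m: -len(m[1])):
        if covered.isdisjoint(atoms):
            tiling.append((name, atoms))
            covered |= atoms
    coverage = len(covered) / n_atoms   # fraction of atoms explained
    return tiling, coverage
```

For instance, with candidates `[("Ala", {0,1,2}), ("Gly", {2,3}), ("Val", {3,4,5,6})]` on an 8-atom polymer, "Val" and "Ala" are placed and the overlapping "Gly" match is discarded. A branch & cut step, as in the thesis, would then revisit such discarded candidates to escape greedy local optima.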
Gagnon, Mathieu. "Vers une méthode d’acquisition et d’analyse de données pour le dépistage précoce de la maladie d’Alzheimer dans un environnement intelligent". Mémoire, Université de Sherbrooke, 2018. http://hdl.handle.net/11143/11832.
Hayer, Juliette. "Développement d'une base de connaissances du virus de l'hépatite B, HBVdb, pour l'étude de la résistance aux traitements : intégration d'outils d'analyses de séquences et application à la modélisation moléculaire de la polymérase". Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10023/document.
We developed HBVdb (http://hbvdb.ibcp.fr) to allow researchers to investigate the genetic characteristics and variability of HBV sequences and viral resistance to treatment. HBVdb contains a collection of computer-annotated sequences based on manually annotated reference genomes. The automatic annotation procedure ensures a standardized nomenclature for all HBV entries across the database. HBVdb is accessible through a dedicated website integrating generic and specialized analysis tools (annotation, genotyping, resistance profile detection) and pre-computed datasets. The HBV polymerase is the main target of anti-HBV drugs, the nucleos(t)ide analogues (NA), which inhibit the activity of the reverse transcriptase (RT); however, NA resistance mutations have appeared. Another enzymatic domain could nevertheless be a potential drug target: the RNase H domain, linked to the RT and involved in the degradation of the RNA during reverse transcription. To overcome the lack of an experimentally solved structure, and building on sequence analyses from HBVdb, we built a homology model of the RNase H, which helped to define the features of this type 1 RNase H. Finally, to confirm assumptions from this model and to put it in a more global context, we built an extensive HBV polymerase model, which includes the RT and RNase H domains and helps to answer the question of the existence of a connection domain linking them. We performed analyses on this model regarding the interactions between the RT catalytic site and Tenofovir, mapping known resistance mutations and the most variable positions of the HBV polymerase.
Derathé, Arthur. "Modélisation de la qualité de gestes chirurgicaux laparoscopiques". Thesis, Université Grenoble Alpes, 2020. https://thares.univ-grenoble-alpes.fr/2020GRALS021.pdf.
Laparoscopic surgical treatment allows better patient care, and its practice is increasingly common in clinical routine. This practice nevertheless presents its own difficulties for the surgeon and requires extended training during and after residency. To facilitate this training, one option is to develop tools for evaluating and analyzing surgical practice. With this in mind, the objective of this thesis is to study the feasibility of a methodology that, starting from algorithmic processing, provides clinically relevant analyses for the surgeon. I therefore addressed the following problems: collecting and annotating a dataset, implementing a machine learning environment dedicated to predicting a specific aspect of surgical practice, and proposing an approach to translate the algorithmic results into a form relevant to the surgeon. Whenever possible, we sought to validate the various steps of this methodology.
Hacid, Kahina. "Handling domain knowledge in system design models. An ontology based approach". Phd thesis, Toulouse, INPT, 2018. http://oatao.univ-toulouse.fr/20157/7/HACID_kahina.pdf.
Krömer, Cora Felicitas. "Crise de lecture : la lecture, une idée neuve à l'ère du numérique ? : Le cas des ouvrages de fiction et de leurs commentaires en ligne". Thesis, Le Mans, 2020. http://www.theses.fr/2020LEMA3010.
Digital technology is transforming the production, circulation and reception of written culture. These changes provide an opportunity to examine reading practices, whose decline is regularly deplored, and the pivotal moment in the humanities and social sciences (new objects, terrains and methods) that this examination confronts us with. Starting from the still unanswered question of why and how people read, this thesis analyses ordinary reading experiences shared through reviews posted on the online reader community Babelio. It tests the potentialities and limits of an ad hoc methodology, a mixed-methods approach deploying quali-quantitative and computer-assisted methods (database and text mining). The use of preliminary work on reading and literary exchanges, coming from various disciplines in the humanities, enables a deeper understanding of the new modalities associated with the phenomenon of reading in the digital age. The confrontation of critical commentaries with theoretical notions on the act and effects of reading makes it possible to underline readers' importance of and taste for online sharing as well as its commercial exploitation by social networks dedicated to this cultural activity, and to verify the experimental value of the concepts of cooperation between text and reader, of immersion, and of the pleasures of reading on printed media. Thus, in the digital literary sphere, it is not necessarily reading itself that proves to be a new idea, but rather the possibility of sharing between peers within specific online communities.
Singh, Dory. "Extraction des relations de causalité dans les textes économiques par la méthode de l’exploration contextuelle". Thesis, Paris 4, 2017. http://www.theses.fr/2017PA040155.
The thesis describes a process for extracting causal information which, contrary to econometrics, is essentially based on linguistic knowledge. Econometrics exploits mathematical or statistical models that are now a subject of controversy, so our approach is intended to complement or support econometric models. It automatically annotates textual segments according to the Contextual Exploration (CE) method. CE is a linguistic and computational strategy aimed at extracting knowledge according to points of view. This contribution therefore adopts the discursive point of view on causality, where the categories are structured in a semantic map. These categories make it possible to elaborate abductive rules implemented in the EXCOM2 and SEMANTAS systems.
Thuilier, Juliette. "Contraintes préférentielles et ordre des mots en français". Phd thesis, Université Paris-Diderot - Paris VII, 2012. http://tel.archives-ouvertes.fr/tel-00781228.
El, Khelifi Aymen. "Approche générique d’extraction automatique des événements et leur exploitation". Thesis, Paris 4, 2012. http://www.theses.fr/2012PA040189.
In the framework of our thesis, we proposed a generic approach for the automatic extraction of events and their exploitation. This approach is divided into four independent and reusable components. The first is a preprocessing component in which texts are cleaned and segmented. During the second stage, events are extracted based on our algorithm AnnotEC, which has polynomial complexity and is associated with semantic maps and dedicated linguistic resources. We proposed two new similarity measures, SimCatégoreille and SimEvent, to group similar events using clustering algorithms. The annotations added throughout the first three steps are used by the last component to produce summary files configurable by users. The approach was evaluated on a Web 2.0 corpus; we compared the results obtained with machine learning and linguistic compilation methods and obtained good results.
Muzeau, Julien. "Système de vision pour la sécurité des personnes sur les remontées mécaniques". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT075.
With the increase in the number of visitors to mountain ranges and the multiplication of accidents on ski lifts attributed to human behavior, safety has become a major issue for resort managers. To fight this phenomenon, the Grenoble start-up Bluecime developed a computer vision system, named SIVAO, which is able to detect a hazardous situation at the boarding point of a ski lift. The operation of the system breaks down into three steps. First, the chair (or vehicle) is detected in the image. Then, the presence of passengers is confirmed or invalidated. Finally, the position of the safety railing is determined. If passengers are present on the vehicle and the safety railing is not down, the situation is considered hazardous. In that case, an alarm is triggered to inform the skiers or the operator, who can slow down the ski lift to secure the vehicle. Despite convincing results, SIVAO has to overcome numerous difficulties: various sources of variability (vehicle size, boarding orientation, meteorological conditions, number of passengers), camera vibration, complex configuration of new installations, etc. The MIVAO project, in partnership with the Hubert Curien laboratory, the Bluecime start-up and the Sofival company, was born to overcome these challenges. The goal is to build an artificial intelligence able to detect, or even anticipate, a hazardous situation on the vehicles of a ski lift, in order to guarantee the safety of passengers. Within this project, the general goal of Gipsa-lab is the automatic annotation, in the least supervised way possible, of chairlift videos. Firstly, we present a classification method whose aim is to confirm or invalidate the presence of passengers on each vehicle; this preliminary information is critical for the analysis of a potential danger. The proposed technique is based on handcrafted features that have a physical interpretation.
We show that, by including a priori knowledge, the results obtained are competitive with those of complex neural networks, while also allowing real-time operation. Then, we detail a process for counting passengers on each vehicle in the most unsupervised way possible. This pipeline consists of a dimensionality reduction step followed by a data clustering stage. In the context of our project, the latter aims at gathering tracks whose vehicles carry the same number of passengers. One can then deduce, from a small number of labels obtained by hand, the number of people present during each track. In particular, we detail two algorithms developed during this thesis. The first proposes a generalisation of the density-based clustering method DBSCAN via the introduction of the concept of ellipsoidal neighborhood. The second combines Gaussian mixture and spectral clustering so as to discover non-convex data groups. Finally, we address the problem of automatic extraction of vehicles from camera images, as well as the modeling of their trajectories. To do this, we propose a first method that consists in removing the noise from the optical flow by means of the optical strain. We also present a technique for automatically determining the duration of a vehicle track via frequency analysis. Moreover, we detail an annotation effort whose objective is to define pixel-by-pixel clipping paths over the passengers and vehicles in sequences of forty consecutive images.
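The "ellipsoidal neighborhood" generalisation of DBSCAN mentioned in the abstract above can be illustrated with a minimal sketch: the usual eps-ball is replaced by an ellipsoid defined by a shape matrix M, so that q is a neighbor of p when (q−p)ᵀM(q−p) ≤ 1. The thesis's exact formulation is not reproduced here; M is assumed fixed and shared by all points, and the DBSCAN skeleton is a textbook simplification.

```python
import numpy as np

def ellipsoid_neighbors(X, i, M):
    """Indices j such that (X[j]-X[i])^T M (X[j]-X[i]) <= 1."""
    d = X - X[i]
    return np.where(np.einsum('nd,de,ne->n', d, M, d) <= 1.0)[0]

def dbscan_ellipsoid(X, M, min_pts):
    """DBSCAN where the eps-ball is replaced by the ellipsoid defined by M."""
    n = len(X)
    labels = np.full(n, -1)          # -1 = noise / unvisited
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        seeds = list(ellipsoid_neighbors(X, i, M))
        if len(seeds) < min_pts:
            continue                  # i stays noise (may be claimed later)
        cluster += 1
        labels[i] = cluster
        k = 0
        while k < len(seeds):         # expand the cluster from core points
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster
                nbrs = ellipsoid_neighbors(X, j, M)
                if len(nbrs) >= min_pts:
                    seeds.extend(m for m in nbrs if labels[m] == -1)
    return labels
```

With `M = np.diag([1.0, 4.0])`, the neighborhood is an ellipse of semi-axes 1 along x and 0.5 along y, which lets elongated groups of tracked points be clustered together without merging nearby parallel ones.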
Silveira, Mastella Laura. "Exploitation sémantique des modèles d'ingénierie : application à la modélisation des réservoirs pétroliers". Phd thesis, École Nationale Supérieure des Mines de Paris, 2010. http://pastel.archives-ouvertes.fr/pastel-00005770.
Pełny tekst źródłaTang, My Thao. "Un système interactif et itératif extraction de connaissances exploitant l'analyse formelle de concepts". Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0060/document.
In this thesis, we present a methodology for interactively and iteratively extracting knowledge from texts: the KESAM system, a tool for Knowledge Extraction and Semantic Annotation Management. KESAM is based on Formal Concept Analysis (FCA) for extracting knowledge from textual resources while supporting expert interaction. In the KESAM system, knowledge extraction and semantic annotation are unified into a single process that benefits both. Semantic annotations are used for formalizing the source of knowledge in texts and keeping the traceability between the knowledge model and the source of knowledge; the knowledge model is, in return, used for improving semantic annotations. The KESAM process has been designed to permanently preserve the link between the resources (texts and semantic annotations) and the knowledge model. The core of the process is Formal Concept Analysis, which builds the knowledge model, i.e. the concept lattice, and ensures the link between the knowledge model and the annotations. In order to get the resulting lattice as close as possible to domain experts' requirements, we introduce an iterative process that enables expert interaction on the lattice. Experts are invited to evaluate and refine the lattice; they can make changes until they reach an agreement between the model and their own knowledge or the application's needs. Thanks to the link between the knowledge model and the semantic annotations, the two can co-evolve in order to improve their quality with respect to domain experts' requirements. Moreover, by using FCA to build concepts with definitions of sets of objects and sets of attributes, the KESAM system is able to take into account both atomic and defined concepts, i.e. concepts defined by a set of attributes.
In order to bridge the possible gap between the representation model based on a concept lattice and the representation model of a domain expert, we then introduce a formal method for integrating expert knowledge into concept lattices in such a way that the lattice structure is maintained. The expert knowledge is encoded as a set of attribute dependencies which is aligned with the set of implications provided by the concept lattice, leading to modifications of the original lattice. The method also allows the experts to keep a trace of the changes between the original lattice and the final constrained version, and to see how concepts used in practice relate to concepts automatically derived from the data. The method uses extensional projections to build the constrained lattices without changing the original data and to provide the trace of changes. From an original lattice, two different projections produce two different constrained lattices; thus, the gap between the representation model based on a concept lattice and the representation model of a domain expert is bridged by projections.
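The Formal Concept Analysis machinery at the core of KESAM, as described in the two abstracts above, can be illustrated on a toy formal context. A formal concept is a pair (A, B) where A is the set of objects sharing exactly the attributes B (A′ = B and B′ = A). The context below (documents and annotation attributes) is invented for illustration; KESAM's actual lattice-construction algorithms are far more efficient than this brute-force enumeration.

```python
from itertools import combinations

# Toy formal context: objects mapped to their attribute sets (hypothetical).
context = {
    "doc1": {"annotation", "lattice"},
    "doc2": {"annotation", "ontology"},
    "doc3": {"annotation", "lattice", "ontology"},
}
attributes = set().union(*context.values())

def extent(intent_set):
    """Objects having every attribute of the intent (the ' operator on attributes)."""
    return {o for o, attrs in context.items() if intent_set <= attrs}

def intent(extent_set):
    """Attributes shared by every object of the extent (the ' operator on objects)."""
    if not extent_set:
        return set(attributes)
    return set.intersection(*(context[o] for o in extent_set))

def formal_concepts():
    """Brute-force enumeration: close every attribute subset, keep (A, B) pairs."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), r):
            b = intent(extent(set(combo)))        # closure of the attribute set
            concepts.add((frozenset(extent(b)), frozenset(b)))
    return concepts
```

On this context the enumeration yields four concepts, e.g. ({doc1, doc3}, {annotation, lattice}); ordered by extent inclusion they form the concept lattice that KESAM presents to experts for refinement.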
Er, Ngurah Agus Sanjaya. "Techniques avancées pour l'extraction d'information par l'exemple". Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0060.
Searching for information on the Web is generally achieved by constructing a query from a set of keywords and submitting it to a search engine. This traditional method requires the user to have relatively good knowledge of the domain of the targeted information in order to come up with the right keywords. The search results, in the form of Web pages, are ranked by the relevancy of each page to the given keywords; for the same set of keywords, the pages returned by the search engine would be ranked differently depending on the user. Moreover, finding specific information, such as a country and its capital city, would require the user to browse through all the documents and read their content manually, which is not only time-consuming but also requires a great deal of effort. We address in this thesis an alternative method of searching for information: giving examples of the information in question. First, we try to improve the accuracy of search-by-example systems by expanding the given examples syntactically. Next, we use the truth discovery paradigm to rank the returned query results. Finally, we investigate the possibility of expanding the examples semantically by labelling each group of elements of the examples.
Khan, Imran. "Cloud-based cost-efficient application and service provisioning in virtualized wireless sensor networks". Thesis, Evry, Institut national des télécommunications, 2015. http://www.theses.fr/2015TELE0019/document.
Wireless Sensor Networks (WSNs) are becoming ubiquitous and are used in diverse application domains. Traditional deployments of WSNs are domain-specific, with applications usually embedded in the WSN, precluding the re-use of the infrastructure by other applications; this can lead to redundant deployments. With the advent of the IoT, this approach is less and less viable. A potential solution lies in the sharing of the same WSN by multiple applications and services, to allow resource- and cost-efficiency. In this dissertation, three architectural solutions are proposed for this purpose. The first solution consists of two parts: the first part is a novel multilayer WSN virtualization architecture that allows the provisioning of multiple applications and services over the same WSN deployment; the second part is an extended architecture that allows the virtualized WSN infrastructure to interact with a WSN Platform-as-a-Service (PaaS) at a higher level of abstraction. Both solutions are implemented and evaluated using two scenario-based proof-of-concept prototypes built with the Java SunSpot kit. The second architectural solution is a novel data annotation architecture for the provisioning of semantic applications in virtualized WSNs. It is capable of providing in-network, distributed, real-time annotation of raw sensor data and uses overlays as its cornerstone. This architecture is implemented and evaluated using Java SunSpot, AdvanticSys kits and Google App Engine. The third architectural solution enhances the data annotation architecture on two fronts: a heuristic-based genetic algorithm used for the selection of capable nodes for storing the base ontology, and an extension of the proposed architecture to support ontology creation, distribution and management.
Simulation results for the algorithm are presented, and the ontology management extension is implemented and evaluated using a proof-of-concept prototype built with the Java SunSpot kit. As another contribution, an extensive state-of-the-art review is presented that introduces the basics of WSN virtualization and motivates its pertinence with carefully selected scenarios. This contribution substantially improves current state-of-the-art reviews in terms of scope, motivation, details, and future research issues.
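The genetic algorithm for selecting capable nodes to store the base ontology, mentioned in the abstract above, can be sketched roughly as follows. The thesis's encoding, capability model and fitness function are not spelled out here, so everything below is an assumption: nodes carry hypothetical (storage, residual energy) attributes, an individual is a bitstring selecting K replica nodes, and fitness favors energy-rich nodes with enough storage.

```python
import random

random.seed(7)

# Hypothetical sensor nodes: (storage_kb, residual_energy in [0, 1]).
nodes = [(random.randint(8, 64), random.random()) for _ in range(20)]
ONTOLOGY_KB = 96          # storage the base ontology needs (assumed)
K = 4                     # number of replica nodes to select (assumed)

def fitness(bits):
    chosen = [nodes[i] for i, b in enumerate(bits) if b]
    if len(chosen) != K or any(s < ONTOLOGY_KB / K for s, _ in chosen):
        return 0.0        # infeasible: wrong count or too little storage
    return sum(e for _, e in chosen)   # prefer energy-rich nodes

def random_individual():
    bits = [0] * len(nodes)
    for i in random.sample(range(len(nodes)), K):
        bits[i] = 1
    return bits

def evolve(generations=60, pop_size=30, mut=0.05):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = pop[:2]                              # elitism: keep the best two
        while len(nxt) < pop_size:
            a, b = random.sample(pop[:10], 2)      # truncation selection
            cut = random.randrange(len(nodes))
            child = a[:cut] + b[cut:]              # one-point crossover
            if random.random() < mut:              # bit-flip mutation
                j = random.randrange(len(nodes))
                child[j] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

The heuristic part in the thesis presumably seeds or constrains this search with domain knowledge; here the elitist selection simply guarantees that the best feasible placement found so far is never lost.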
Ben, Salamah Janan. "Extraction de connaissances dans des textes arabes et français par une méthode linguistico-computationnelle". Thesis, Paris 4, 2017. http://www.theses.fr/2017PA040137.
In this thesis, we proposed a generic multilingual approach for automatic information extraction, in particular the extraction of price-variation events and of temporal information linked to a temporal referential. Our approach is based on the constitution of several semantic maps through textual analysis in order to formalize the linguistic traces expressed by categories. We created a database for an expert system that identifies and annotates information (categories and their characteristics) based on groups of contextual rules. The two algorithms AnnotEC and AnnotEV have been applied in the SemanTAS platform to validate our assumptions. We obtained satisfactory results: accuracy and recall are around 80%. Extracted knowledge is presented in a summary file. In order to validate the multilingual aspect of our approach, we carried out experiments on French and Arabic, and we confirmed scalability by annotating large corpora.
Amardeilh, Florence. "Web Sémantique et Informatique Linguistique : propositions méthodologiques et réalisation d'une plateforme logicielle". Phd thesis, Université de Nanterre - Paris X, 2007. http://tel.archives-ouvertes.fr/tel-00146213.
Yu, Mengyao. "Exploitation des données issues d'études d'association pangénomiques pour caractériser les voies biologiques associées au risque génétique du prolapsus de la valve mitrale GWAS-driven gene-set analyses, genetic and functional follow-up suggest GLIS1 as a susceptibility gene for mitral valve prolapse Up-dated genome-wide association study and functional annotation reveal new risk loci for mitral valve prolapse". Thesis, Sorbonne Paris Cité, 2019. https://wo.app.u-paris.fr/cgi-bin/WebObjects/TheseWeb.woa/wa/show?t=2203&f=17890.
Mitral valve prolapse (MVP) is a common heart valve disease affecting nearly 1 in 40 individuals in the general population. It is the first indication for valve repair and/or replacement and, moreover, a risk factor for mitral regurgitation, an established cause of endocarditis and sudden death. MVP is characterized by excess extracellular matrix secretion and cellular disorganization, which leads to bulky valves that are unable to coapt correctly during ventricular systole. Even though several genes, including FLNA, DCHS1, TNS1, and LMCD1, have been reported to be associated with MVP, they only partially explain its heritability. Understanding the biological mechanisms underlying the genetic susceptibility to MVP is therefore necessary to characterize its triggering mechanisms. In this thesis, I aimed 1) to characterize globally the biological mechanisms involved in the genetic risk for MVP in the context of genome-wide association studies (GWAS), and 2) to improve the genotyping resolution using genetic imputation, which allowed the discovery of additional risk genes for MVP. In the first part of my study, I applied pathway enrichment tools (i-GSEA4GWAS, DEPICT) to the GWAS data. I was able to show that genes at risk loci are involved in biological functions relevant to actin filament organization, cytoskeleton biology, and cardiac development. The enrichment for positive regulation of transcription, cell proliferation, and migration motivated the follow-up of GLIS1, a transcription factor that regulates Hedgehog signalling. I followed up the association with MVP in a dataset of cases and controls from the UK Biobank and, in combination with previously available data, found a genome-wide significant association with MVP (OR = 1.22, P = 4.36 × 10^-10).
Through collaborative efforts, immunohistochemistry experiments in mouse indicated that Glis1 is expressed during embryonic development predominantly in the nuclei of endothelial and interstitial cells of mitral valves, while Glis1 knockdown using morpholinos caused atrioventricular regurgitation in zebrafish. In the second part of my work, I generated larger genotyping datasets using imputation based on the Haplotype Reference Consortium (HRC) and TOPMed, two large and highly dense imputation panels that were recently made available. I first compared the imputation accuracy between the HRC- and TOPMed-based data and found that both panels have low imputation accuracy for rare alleles (MAF < 0.01). However, imputation accuracy increased with the input sample size for common variants (MAF > 0.05), especially when genotyping platforms were harmonised. I was able to fine-map established loci (e.g. on chromosome 2) and to identify six novel and promising associated loci. All new loci are driven by common variants that I confirmed as high-profile regulatory variants through extensive computational functional annotation at promising loci, which pointed at several candidate genes for valve biology and development (e.g. PDGFD and ACTN4). In summary, my PhD work applied up-to-date high-throughput genetic association methods, functional enrichment and annotation to GWAS data. My results provide novel insights into the genetic, molecular and cellular basis of valve disease. Further confirmation through replication, but also through biological experiments, is expected to consolidate these statistically and computationally supported results.
Nouvel, Damien. "Reconnaissance des entités nommées par exploration de règles d'annotation - Interpréter les marqueurs d'annotation comme instructions de structuration locale". Phd thesis, Université François Rabelais - Tours, 2012. http://tel.archives-ouvertes.fr/tel-00788630.
Casallas, Rubby. "Objets historiques et annotations pour les environnements logiciels". Phd thesis, 1996. http://tel.archives-ouvertes.fr/tel-00004982.
Guyet, Thomas. "Interprétation collaborative de séries temporelles. Application à des données de réanimation médicale". Phd thesis, 2007. http://tel.archives-ouvertes.fr/tel-00264145.
Diallo, Gayo. "Une Architecture à base d'Ontologies pour la Gestion Unifiées des Données Structurées et non Structurées". Phd thesis, 2006. http://tel.archives-ouvertes.fr/tel-00221392.
Kannan, Sivakumar. "Molecular protein function prediction using sequence similarity-based and similarity-free approaches". Thèse, 2007. http://hdl.handle.net/1866/15681.
M'Begnan, Nagnan Arthur. "Développement d'outils web de détection d'annotations manuscrites dans les imprimés anciens". Thèse, 2021. http://depot-e.uqtr.ca/id/eprint/9663/1/eprint9663.pdf.