Doctoral dissertations on the topic "Données sémantiques"
Consult the 50 best doctoral dissertations for your research on the topic "Données sémantiques".
Ait, Oubelli Lynda. "Transformations sémantiques pour l'évolution des modèles de données". Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0040.
When developing a complex system, data models are the key to a successful engineering process because they contain and organize all the information manipulated by the different functions involved in system design. The fact that data models evolve throughout the design raises problems of maintenance of the data already produced. Our work addresses the issue of evolving data models in a model-driven engineering (MDE) environment. We focus on minimizing the impact of the evolution of the data model on the system development process in the specific area of space engineering. In the space industry, model-driven engineering is a key approach for modeling data exchanged with satellites. When preparing a space mission, the associated data models are often updated and must be compared from one version to another. As changes accumulate, it becomes difficult to track them. New methods and techniques to understand and represent the differences and commonalities between different versions of a model are therefore essential. Recent research deals with the evolution process between the two architectural layers (M2/M1) of MDE. In this thesis, we have explored the use of the (M1/M0) layers of the same architecture to define a set of complex operators, and their composition, that encapsulate both the evolution of the data model and the data migration. The use of these operators improves the quality of results when migrating data, ensuring the complete preservation of the information contained in the data. In the first part of this thesis, we focused on how to deal with structural differences during the evolution process. The proposed approach is based on the detection of differences and the construction of evolution operators. We then studied the performance of the model-based approach on two space missions, named PHARAO and MICROSCOPE.
Then, we presented a semantic observational approach to deal with the evolution of data models at the M1 level. The main interest of the proposed approach is the transposition of the problem of accessibility of the information in a data model into a problem of paths in a labeled directed graph. The approach proved able to capture all the evolutions of a data model in a logical operator list instead of a non-exhaustive list of evolution operators. It is generic because, regardless of the type of input data model, if the data model is correctly interpreted as a labeled directed graph (LDG) and then projected onto a set of labeled transition systems (LTS), we can check the conservation of the information.
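The conservation check described in this abstract can be sketched as a reachability comparison on labeled directed graphs. The encoding below (an adjacency dict with edge labels, and "conservation = old reachable set is a subset of the new one") is an illustrative reconstruction, not the thesis's actual algorithm.

```python
# Illustrative sketch: information conservation as reachability in a
# labeled directed graph (LDG). Graph encoding and conservation
# criterion are assumptions made for illustration.
from collections import deque

def reachable(graph, start):
    """Nodes reachable from `start`; graph is {node: [(label, target), ...]}."""
    seen, queue = {start}, deque([start])
    while queue:
        for _label, target in graph.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def information_conserved(old_graph, new_graph, root):
    """Every node reachable in the old model must remain reachable
    in the evolved model."""
    return reachable(old_graph, root) <= reachable(new_graph, root)
```

With this formulation, an evolution that only adds elements trivially conserves information, while one that drops a reachable node is flagged.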
Folch, Helka. "Articuler les classifications sémantiques induites d'un domaine". Paris 13, 2002. http://www.theses.fr/2002PA132015.
Aseervatham, Sujeevan. "Apprentissage à base de Noyaux Sémantiques pour le Traitement de Données Textuelles". Phd thesis, Université Paris-Nord - Paris XIII, 2007. http://tel.archives-ouvertes.fr/tel-00274627.
In this thesis, we are mainly interested in two lines of research.
The first concerns the study of problems related to the processing of structured textual data with kernel-based approaches. In this context, we present a semantic kernel for documents structured in sections, notably in the XML format. The kernel draws its semantic information from an external knowledge source, namely a thesaurus. Our kernel was tested on a corpus of medical documents with the UMLS medical thesaurus. At an international medical document categorization challenge, it was ranked among the 10 best-performing methods out of 44.
The second line concerns the study of latent concepts extracted by statistical methods such as latent semantic analysis (LSA). In a first part, we present kernels exploiting linguistic concepts from an external source and statistical concepts derived from LSA. We show that a kernel integrating both types of concepts improves performance. Then, in a second part, we present a kernel using local LSAs to extract latent concepts, providing a finer-grained representation of documents.
Aseervatham, Sujeevan. "Apprentissage à base de noyaux sémantiques pour le traitement de données textuelles". Paris 13, 2007. https://theses.hal.science/tel-00274627.
Semantic Kernel-based Machine Learning for Textual Data Processing. Since the early eighties, statistical methods and, more specifically, machine learning for textual data processing have seen a considerable growth of interest, mainly because the number of documents to process is growing exponentially. Thus, expert-based methods have become too costly, losing the research focus to the profit of machine learning-based methods. In this thesis, we focus on two main issues. The first one is the processing of semi-structured textual data with kernel-based methods. We present, in this context, a semantic kernel for documents structured by sections under the XML format. This kernel captures the semantic information with the use of an external source of knowledge, e.g., a thesaurus. Our kernel was evaluated on a medical document corpus with the UMLS thesaurus. It was ranked in the top ten of the best methods, according to the F1-score, among 44 algorithms at the 2007 CMC Medical NLP International Challenge. The second issue is the study of the use of latent concepts extracted by statistical methods such as Latent Semantic Analysis (LSA). We present, in a first part, kernels based on linguistic concepts from external sources and on latent concepts of the LSA. We show that a kernel integrating both kinds of concepts improves text categorization performance. Then, in a second part, we present a kernel that uses local LSAs to extract latent concepts. Local latent concepts are used to obtain a finer representation of the documents.
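The latent-concept idea behind LSA, on which the abstract above relies, can be sketched with a truncated SVD of a toy term-document matrix. The matrix, the number of kept concepts and the similarity measure below are illustrative assumptions, not the thesis's actual kernel.

```python
# Minimal LSA sketch: project documents into a latent concept space via
# truncated SVD, then compare them by cosine similarity in that space.
import numpy as np

# Term-document matrix: rows = terms, columns = 4 toy documents.
# Docs 0-1 share one vocabulary block; docs 2-3 share another.
X = np.array([
    [2, 1, 0, 0],   # "kernel"
    [1, 2, 0, 0],   # "svm"
    [0, 0, 2, 1],   # "gene"
    [0, 0, 1, 2],   # "protein"
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                  # keep k latent concepts
docs = (np.diag(s[:k]) @ Vt[:k]).T     # one latent vector per document

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Documents that share vocabulary end up close in the latent space even though their raw term vectors differ.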
Castagliola, Carole. "Héritage et valuation dans les réseaux sémantiques pour les bases de données objets". Compiègne, 1991. http://www.theses.fr/1991COMPD363.
Pedraza, Linares Esperanza. "SGBD sémantiques pour un environnement bureautique : intégrité et gestion de transactions". Grenoble 1, 1988. http://tel.archives-ouvertes.fr/tel-00009437.
Coquil, David. "Conception et Mise en Oeuvre de Proxies Sémantiques et Coopératifs". Lyon, INSA, 2006. http://theses.insa-lyon.fr/publication/2006ISAL0020/these.pdf.
One major issue related to the large-scale deployment of distributed information systems such as the Web is that of efficient access to data, for which caches are a possible solution. Web caches exist at the client level, at the server level, and on intermediate servers, the proxies. The conception and implementation of efficient Web caches, and especially proxies, is the main focus of this thesis. Three performance improvement techniques are studied: replacement, prefetching and cooperation policies. Contrary to traditional approaches that mainly use low-level parameters, we apply semantic caching techniques based on the indexing of documents and on the analysis of user access patterns. Algorithms for measuring the usefulness of a document for a cache are detailed. This value, called temperature, is used to define a replacement policy and a prefetching heuristic. These techniques are used in a video server cache management application. A cooperative architecture based on the exchange of documents and of temperature monitoring results is defined. Another application of proxies and semantic caching is also presented in the context of content-based multimedia queries. Building on previous research on integrating content-based queries with classical databases, we define a cooperative architecture dedicated to distributed content-based multimedia queries whose basic components are cooperative proxies and semantic caches. Finally, an application of temperature to the management of cache indexes for the members of theme-based virtual communities is presented.
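A temperature-driven replacement policy of the kind this abstract describes can be sketched in a few lines. The temperature formula below (access count discounted by time since last use) is an assumption for illustration; the thesis defines its own measure.

```python
# Illustrative sketch of a temperature-based cache replacement policy:
# on insertion into a full cache, evict the "coldest" document.
class TemperatureCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}   # doc_id -> content
        self.hits = {}    # doc_id -> access count
        self.last = {}    # doc_id -> logical time of last access
        self.clock = 0    # logical clock

    def temperature(self, doc_id):
        # Assumed formula: accesses discounted by time since last use.
        age = self.clock - self.last[doc_id] + 1
        return self.hits[doc_id] / age

    def access(self, doc_id, content):
        self.clock += 1
        if doc_id not in self.store and len(self.store) >= self.capacity:
            coldest = min(self.store, key=self.temperature)
            for table in (self.store, self.hits, self.last):
                del table[coldest]
        self.store[doc_id] = content
        self.hits[doc_id] = self.hits.get(doc_id, 0) + 1
        self.last[doc_id] = self.clock
```

Unlike plain LRU, a frequently accessed document survives eviction even when a newer but rarely used one is present.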
Mokhtari, Noureddine. "Extraction et exploitation d'annotations sémantiques contextuelles à partir de texte". Nice, 2010. http://www.theses.fr/2010NICE4045.
This thesis falls within the framework of the European project SevenPro (Semantic Virtual Engineering Environment for Product Design), whose aim is to improve the production engineering process in manufacturing companies through the acquisition, formalization and exploitation of knowledge. We propose a methodological approach and software for generating contextual semantic annotations from text. Our approach is based on ontologies and Semantic Web technologies. In the first part, we propose a model of the concept of "context" for text. This modeling can be seen as a projection of the various aspects of "context" covered by the definitions in the literature. We also propose a model of contextual semantic annotations, with the definition of the different types of contextual relationships that may exist in text. Then, we propose a generic methodology for the generation of contextual semantic annotations based on a domain ontology, which makes the best use of the knowledge contained in texts. The novelty of the methodology is that it uses natural language processing techniques and automatically generated extraction grammars for domain relations, concepts and property values in order to produce semantic annotations associated with contextual relations. In addition, we take into account the context of occurrence of semantic annotations during their generation. A system supporting this methodology has been implemented and evaluated.
Lechervy, Alexis. "Apprentissage interactif et multi-classes pour la détection de concepts sémantiques dans les données multimédia". Phd thesis, Université de Cergy Pontoise, 2012. http://tel.archives-ouvertes.fr/tel-00781763.
Francis, Danny. "Représentations sémantiques d'images et de vidéos". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: thanks to new big datasets of annotated images and videos, Deep Neural Networks (DNN) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and image/video automatic captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space or a multimodal space. Based on recent works on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. In image and video captioning, we have to tackle a challenging task where a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, regarding video captioning, analyzing videos requires not only parsing still images, but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets assess the interest of our models and methods with respect to existing works.
Cailhol, Simon. "Planification interactive de trajectoire en Réalité Virtuelle sur la base de données géométriques, topologiques et sémantiques". Thesis, Toulouse, INPT, 2015. http://www.theses.fr/2015INPT0058/document.
To save time and money while designing new products, industry needs tools to design, test and validate the product using virtual prototypes. These virtual prototypes must make it possible to test the product at all Product Lifecycle Management (PLM) stages. Many operations in a product's lifecycle involve human manipulation of product components (product assembly, disassembly or maintenance). Due to the increasing integration of industrial products, these manipulations are performed in cluttered environments. Virtual Reality (VR) enables real operators to perform these operations with virtual prototypes. This research work introduces a novel path planning architecture allowing collaboration between a VR user and an automatic path planning system. This architecture is based on an original environment model including semantic, topological and geometric information. The automatic path planning process splits into two phases. First, coarse planning uses semantic and topological information to define a topological path. Then, fine planning uses semantic and geometric information to define a geometric trajectory within the topological path produced by the coarse planning. The collaboration between the VR user and the automatic path planner takes two modes: on the one hand, the user is guided along a pre-computed path through a haptic device; on the other hand, the user can move away from the proposed solution and, in doing so, start a re-planning process. The efficiency and ergonomics of both interaction modes are improved thanks to control sharing methods. First, the authority of the automatic system is modulated to provide the user with a sensitive guidance while he follows it and to free the user (weakened guidance) when he explores possibly better ways. Second, when the user explores possibly better ways, his intents are predicted (thanks to geometric data associated with topological elements) and integrated in the re-planning process to guide the coarse planning.
This thesis is divided into five chapters. The first one exposes the industrial context that motivated this work. Following a description of environment modeling tools, the second chapter introduces the proposed multi-layer environment model. The third chapter presents path planning techniques from robotics research and details the two-phase path planning process developed. The fourth introduces previous work on interactive path planning and control sharing techniques before describing the interaction modes and control sharing techniques involved in our interactive path planner. Finally, the last chapter introduces the experiments performed with our path planner and analyses their results.
Belghaouti, Fethi. "Interopérabilité des systèmes distribués produisant des flux de données sémantiques au profit de l'aide à la prise de décision". Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLL003.
Internet is an infinite source of data coming from sources such as social networks or sensors (home automation, smart city, autonomous vehicle, etc.). These heterogeneous and increasingly large data can be managed through semantic web technologies, which propose to homogenize and link these data and reason above them, and data flow management systems, which mainly address the problems related to volume, volatility and continuous querying. The alliance of these two disciplines has seen the growth of semantic data stream management systems, also called RSP (RDF Stream Processing) systems. The objective of this thesis is to allow these systems, via new approaches and "low cost" algorithms, to remain operational, and even more efficient, for large input data volumes and/or with limited system resources. To reach this goal, our thesis is mainly focused on the issue of processing semantic data streams in a context of computer systems with limited resources. It directly contributes to answering the following research questions: (i) How to represent semantic data streams? And (ii) How to deal with input semantic data when their rates and/or volumes exceed the capabilities of the target system? As a first contribution, we propose an analysis of the data in semantic data streams in order to consider a succession of star graphs instead of just a succession of independent triples, thus preserving the links between the triples. By using this approach, we significantly improved the quality of responses of some well-known sampling algorithms for load-shedding. The analysis of the continuous query allows the optimisation of this solution by selecting the irrelevant data to be load-shedded first. In the second contribution, we propose an algorithm for detecting frequent RDF graph patterns in semantic data streams. We called it FreGraPaD, for Frequent RDF Graph Patterns Detection. It is a one-pass, memory-oriented and "low-cost" algorithm.
It uses two main data structures: a bit-vector to build and identify the RDF graph pattern, providing memory space optimization, and a hash table for storing the patterns. The third contribution of our thesis consists of a deterministic load-shedding solution for RSP systems, called POL (Pattern Oriented Load-shedding for RDF Stream Processing systems). It uses very low-cost boolean operators, applied to the binary patterns built from the data and the continuous query, in order to determine which data is irrelevant and should be ejected upstream of the system. It guarantees a recall of 100%, reduces the system load and improves response time. Finally, in the fourth contribution, we propose Patorc (Pattern Oriented Compression for RSP systems). Patorc is an online compression tool for RDF streams. It is based on the frequent patterns present in RDF data streams, which it factorizes. It is a lossless data compression solution that keeps querying possible without any need for decompression. This thesis provides solutions that allow the extension of existing RSP systems and make them able to scale in a big data context. Thus, these solutions allow RSP systems to deal with one or more semantic data streams arriving at different speeds, without losing response quality while ensuring their availability, even beyond their physical limitations. The conducted experiments, supported by the obtained results, show that extending existing systems with the new solutions improves their performance. They illustrate the considerable decrease in engine response time, increasing the processing rate threshold while optimizing the use of system resources.
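The bit-vector-plus-hash-table idea described above can be sketched as follows. The fixed predicate universe, the star-graph encoding and the frequency threshold are illustrative assumptions, not FreGraPaD's actual design.

```python
# Illustrative one-pass sketch in the spirit of FreGraPaD: encode each
# star graph's predicate set as a bit-vector, and count pattern
# occurrences in a hash table.
PREDICATES = ["rdf:type", "foaf:name", "foaf:knows", "geo:lat"]
BIT = {p: 1 << i for i, p in enumerate(PREDICATES)}  # fixed universe

def pattern_of(star_graph):
    """Bit-vector of the predicates used by one star graph,
    given as a list of (subject, predicate, object) triples."""
    bits = 0
    for _s, p, _o in star_graph:
        bits |= BIT[p]
    return bits

def frequent_patterns(stream, min_count):
    counts = {}                          # hash table: pattern -> count
    for star in stream:                  # single pass over the stream
        pat = pattern_of(star)
        counts[pat] = counts.get(pat, 0) + 1
    return {pat for pat, c in counts.items() if c >= min_count}
```

Two star graphs with different subjects but the same predicate set map to the same bit-vector, which is what makes pattern counting cheap.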
El, Haddadi Anass. "Fouille multidimensionnelle sur les données textuelles visant à extraire les réseaux sociaux et sémantiques pour leur exploitation via la téléphonie mobile". Toulouse 3, 2011. http://thesesups.ups-tlse.fr/1378/.
Competition is a fundamental concept of the liberal economy tradition that requires companies to resort to Competitive Intelligence (CI) in order to be advantageously positioned on the market, or simply to survive. Nevertheless, it is well known that it is not the strongest of organizations that survives, nor the most intelligent, but rather the one most adaptable to change, the dominant factor in society today. Therefore, companies are required to remain constantly in a wakeful state, watching for any change in order to devise appropriate solutions in real time. However, for a successful vigil, we should not be satisfied merely with monitoring opportunities; above all, we must anticipate risks. External risk factors have never been so numerous: extremely dynamic and unpredictable markets, new entrants, mergers and acquisitions, sharp price reductions, rapid changes in consumption patterns and values, fragility of brands and their reputation. To face all these challenges, our research consists in proposing a Competitive Intelligence System (CIS) designed to provide online services. Through descriptive and exploratory statistical methods, Xplor EveryWhere displays, in a very short time, new strategic knowledge such as: the profile of the actors, their reputation, their relationships, their sites of action, their mobility, emerging issues and concepts, terminology, promising fields, etc. The need for security in Xplor EveryWhere arises from the strategic nature of the information conveyed, which has quite a substantial value. Such security should not be considered an additional option that a CIS can provide merely to distinguish itself from others, especially as the leak of this information is not the result of inherent weaknesses in corporate computer systems but, above all, an organizational issue. With Xplor EveryWhere we completed the reporting service, especially the aspect of mobility.
Lastly, with this system it is possible to view up-to-date information, since we have real-time access to our strategic database server, itself fed daily by watchmen, who can enter information at trade shows, during customer visits or after meetings.
Bernard, Luc. "Développement d'un jeu de structures de données et de contraintes sémantiques pour la compilation (séparée) du langage ADA". Doctoral thesis, Universite Libre de Bruxelles, 1985. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/213624.
Puget, Dominique. "Aspects sémantiques dans les Systèmes de Recherche d'Informations". Toulouse 3, 1993. http://www.theses.fr/1993TOU30139.
Zaidi, Houda. "Amélioration de la qualité des données : correction sémantique des anomalies inter-colonnes". Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1094/document.
Data quality represents a major challenge because the cost of anomalies can be very high, especially for large databases in enterprises that need to exchange information between systems and integrate large amounts of data. Decision making based on erroneous data has a bad influence on the activities of organizations. The quantity of data continues to increase, as do the risks of anomalies. The automatic correction of these anomalies is a topic that is becoming more important both in business and in the academic world. In this work, we propose an approach to better understand the semantics and the structure of the data. Our approach helps to automatically correct intra-column anomalies as well as inter-column ones. We aim to improve the quality of data by processing null values and the semantic dependencies between columns.
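One common way to exploit an inter-column semantic dependency for correction, in the spirit of the abstract above, is to repair rows that contradict the majority mapping implied by the dependency. The dependency (City determines Country), the repair rule and the data below are illustrative assumptions, not the thesis's actual method.

```python
# Illustrative sketch: correct inter-column anomalies by enforcing an
# assumed semantic dependency lhs -> rhs via the majority mapping.
from collections import Counter

def repair_dependency(rows, lhs, rhs):
    """For each lhs value, find the majority rhs value in `rows`
    (a list of dicts) and rewrite dissenting rows."""
    by_lhs = {}
    for row in rows:
        by_lhs.setdefault(row[lhs], Counter())[row[rhs]] += 1
    majority = {key: counts.most_common(1)[0][0]
                for key, counts in by_lhs.items()}
    for row in rows:
        row[rhs] = majority[row[lhs]]
    return rows
```

A misspelled country in a minority of rows is thus overwritten by the value the other rows agree on.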
Mecharnia, Thamer. "Approches sémantiques pour la prédiction de présence d'amiante dans les bâtiments : une approche probabiliste et une approche à base de règles". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG036.
Nowadays, knowledge graphs are used to represent all kinds of data, and they constitute scalable and interoperable resources that can be used by decision support tools. The Scientific and Technical Center for Building (CSTB) was asked to develop a tool to help identify materials containing asbestos in buildings. In this context, we created and populated the ASBESTOS ontology, which allows the representation of building data and the results of diagnostics carried out to detect the presence of asbestos in the products used. We then relied on this knowledge graph to develop two approaches that make it possible to predict the presence of asbestos in products in the absence of the reference of the marketed product actually used. The first approach, called the hybrid approach, is based on external resources describing the periods when marketed products contained asbestos, in order to calculate the probability of the existence of asbestos in a building component. This approach addresses conflicts between external resources and the incompleteness of the listed data by applying a pessimistic fusion approach that adjusts the calculated probabilities using a subset of diagnostics. The second approach, called CRA-Miner, is inspired by inductive logic programming (ILP) methods to discover rules from the knowledge graph describing buildings and asbestos diagnoses. Since the reference of the specific products used during construction is never specified, CRA-Miner considers temporal data, the ASBESTOS ontology semantics, product types and contextual information such as part-of relations to discover a set of rules that can be used to predict the presence of asbestos in construction elements. The evaluation of the two approaches, carried out on the ASBESTOS ontology populated with the data provided by the CSTB, shows that the results obtained, in particular when the two approaches are combined, are quite promising.
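A "pessimistic" fusion of conflicting probability estimates, adjusted by a subset of diagnostics, could look like the sketch below. Both formulae (worst-case maximum, then a linear blend toward the observed diagnostic rate) are assumptions made for illustration; the thesis defines its own adjustment.

```python
# Illustrative sketch: fuse conflicting source estimates of asbestos
# presence pessimistically, then calibrate against known diagnostics.
def pessimistic_fusion(source_probs):
    """Worst-case fusion: keep the highest estimate among sources."""
    return max(source_probs)

def calibrate(prob, diagnostics):
    """Blend the estimate with the positive rate observed in available
    diagnostics (1 = asbestos found, 0 = not found). The blending
    weight, growing with the number of diagnostics, is an assumption."""
    if not diagnostics:
        return prob
    observed = sum(diagnostics) / len(diagnostics)
    weight = len(diagnostics) / (len(diagnostics) + 10.0)
    return (1 - weight) * prob + weight * observed
```

With no diagnostics the fused estimate is returned unchanged; as diagnostics accumulate, the observed rate dominates.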
Belaid, Nabil. "Modélisation de services et de workflows sémantiques à base d'ontologies de services et d'indexations". Phd thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2011. https://tel.archives-ouvertes.fr/tel-00605153.
Services and workflows allow computer processing and information exchange. However, only the information relevant to their computer management (storage, delivery, etc.) is specified in syntactic description languages such as WSDL, BPEL or XPDL. Indeed, these descriptions do not explicitly link services and workflows to the functions they implement. To overcome these limitations, we propose an approach based on the definition of service ontologies (shared conceptualizations) and semantic indexations. Our proposal relies on ontology-based databases to store and index the different services and workflows. The implementation of our approach is a prototype that makes it possible to store, search, replace and reuse existing IT services and workflows, and to build new ones incrementally. This work is validated by its application to the geological modeling field.
Cayèré, Cécile. "Modélisation de trajectoires sémantiques et calcul de similarité intégrés à un ETL". Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS042.
Over the last decade, we have seen a rise in popularity of mobile applications based on phone location. These applications collect mobility tracks which describe the movement of users over time. In the DA3T regional project, we hypothesise that the analysis of tourists' mobility tracks can help planners in the management and enhancement of tourist areas. The objective is to design methods and tools to help analyse these tracks. This thesis focuses on the processing of mobility tracks and proposes a modular platform for creating and executing processing chains on these data. Throughout the modules of a processing chain, the raw mobility track evolves into semantic trajectories. The contributions of this thesis are: (i) a multi-level and multi-aspect semantic trajectory model and (ii) two measures that compute the similarity between two semantic trajectories along spatial, temporal and thematic dimensions. Our model (i) is used as a transition model between the modules of a processing chain. We tested it by instantiating semantic trajectories from different datasets of various domains. Our two measures (ii) are integrated in our platform as processing modules. These measures present two originalities: one is a combination of sub-measures, each of which evaluates the similarity of trajectories along the three dimensions and according to three different levels of granularity; the other is a combination of two bidimensional sub-measures, each centred around a particular dimension. We evaluated our two measures by comparing them to other measures and to the opinion of geographers.
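Combining per-dimension sub-measures into one similarity score, as the abstract describes, can be sketched as follows for a single trajectory episode. The sub-measures (exponential distance decay, Jaccard on tags) and the equal weights are illustrative assumptions, not the thesis's actual measures.

```python
# Illustrative sketch: similarity between two trajectory episodes as a
# weighted mean of spatial, temporal and thematic sub-measures.
import math

def spatial_sim(p1, p2, scale=1000.0):
    """Exponential decay with Euclidean distance (metres, assumed)."""
    return math.exp(-math.hypot(p1[0] - p2[0], p1[1] - p2[1]) / scale)

def temporal_sim(t1, t2, scale=3600.0):
    """Exponential decay with time difference (seconds, assumed)."""
    return math.exp(-abs(t1 - t2) / scale)

def thematic_sim(tags1, tags2):
    """Jaccard similarity between the sets of semantic tags."""
    a, b = set(tags1), set(tags2)
    return len(a & b) / len(a | b) if a | b else 1.0

def episode_similarity(e1, e2, weights=(1 / 3, 1 / 3, 1 / 3)):
    scores = (spatial_sim(e1["pos"], e2["pos"]),
              temporal_sim(e1["time"], e2["time"]),
              thematic_sim(e1["tags"], e2["tags"]))
    return sum(w * s for w, s in zip(weights, scores))
```

Changing the weights shifts the measure toward one dimension, which is one way to obtain the dimension-centred variants the abstract mentions.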
Savonnet, Marinette. "Systèmes d'Information Scientifique : des modèles conceptuels aux annotations sémantiques Application au domaine de l'archéologie et des sciences du vivant". Habilitation à diriger des recherches, Université de Bourgogne, 2013. http://tel.archives-ouvertes.fr/tel-00917782.
Ghederim, Alexandra. "Une extension des modèles sémantiques par un ordre sur les attributs : application à la migration de schémas relationnels vers des schémas orientés objet". Lyon 1, 1996. http://www.theses.fr/1996LYO10303.
Périnet, Amandine. "Analyse distributionnelle appliquée aux textes de spécialité : réduction de la dispersion des données par abstraction des contextes". Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD056/document.
In specialised domains, applications such as information retrieval or machine translation rely on terminological resources to take into account terms and the semantic relations between terms or groupings of terms. To cope with the cost of building these resources, automatic methods have been proposed. Among those methods, distributional analysis uses the information repeated in the contexts of terms to detect a relation between these terms. While this hypothesis is usually implemented with vector space models, those models suffer from a high number of dimensions and from data sparsity in the matrix of contexts. In specialised corpora, this contextual information is even sparser and less frequent because of the smaller size of the corpora. Likewise, complex terms are usually ignored because of their very low number of occurrences. In this thesis, we tackle the problem of data sparsity in specialised texts. We propose a method that makes the context matrix denser by performing an abstraction of distributional contexts: semantic relations acquired from corpora are used to generalise and normalise those contexts. We evaluated the robustness of the method on four corpora of different sizes, languages and domains. The analysis of the results shows that, while taking complex terms into account in distributional analysis, the abstraction of distributional contexts leads to semantic clusters of better quality, which are also more consistent and more homogeneous.
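The densification effect of context abstraction can be illustrated with a minimal sketch (not the thesis implementation): mapping context words to a semantic class merges columns of the word-by-context matrix, so the surviving cells carry higher counts. The toy corpus and the hypernym table are invented for the example:

```python
from collections import defaultdict

# Toy (target, context) co-occurrence pairs from a specialised corpus.
corpus = [
    ("therapy", "antibiotic"), ("therapy", "penicillin"),
    ("treatment", "penicillin"), ("treatment", "amoxicillin"),
]

# Hypothetical semantic relations acquired from the corpus.
hypernym = {"antibiotic": "drug", "penicillin": "drug", "amoxicillin": "drug"}

def build_matrix(pairs, generalise=False):
    """Word-by-context count matrix; optionally abstract the contexts."""
    m = defaultdict(lambda: defaultdict(int))
    for target, context in pairs:
        if generalise:
            context = hypernym.get(context, context)
        m[target][context] += 1
    return m

raw = build_matrix(corpus)
abstracted = build_matrix(corpus, generalise=True)

print(dict(raw["therapy"]))        # two distinct contexts, count 1 each
print(dict(abstracted["therapy"])) # a single, denser 'drug' context
```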
Fauconnier, Jean-Philippe. "Acquisition de liens sémantiques à partir d'éléments de mise en forme des textes : exploitation des structures énumératives". Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30023.
The past decade witnessed significant advances in the field of relation extraction from text, facilitating the building of lexical or semantic resources. However, the methods proposed so far (supervised learning, kernel methods, distant supervision, etc.) do not fully exploit the texts: they are usually applied at the sentential level and do not take into account the layout and formatting of texts. In such a context, this thesis aims at expanding those methods and making them layout-aware in order to extract relations expressed beyond sentence boundaries. For this purpose, we rely on the semantics conveyed by typographical (bullets, emphasis, etc.) and dispositional (visual indentations, carriage returns, etc.) features, which often substitute for purely discursive formulations. In particular, the study reported here deals with the relations carried by vertical enumerative structures. Although they display discontinuities between their various components, enumerative structures can be dealt with as a whole at the semantic level. They form textual structures prone to hierarchical relations. This study was divided into two parts. (i) The first part describes a model representing the hierarchical structure of documents. This model falls within the theoretical framework representing textual architecture: it achieves an abstraction of the layout and formatting, as well as a strong connection with the rhetorical structure. However, our model focuses primarily on the efficiency of the analysis process rather than on the expressiveness of the representation. A bottom-up method intended for automatically building this model is presented and evaluated on a corpus of PDF documents. (ii) The second part aims at integrating this model into the process of relation extraction. In particular, we focused on vertical enumerative structures.
A multidimensional typology intended for characterizing those structures was established and used in an annotation task. Thanks to corpus-based observations, we proposed a two-step method, by supervised learning, for qualifying the nature of the relation and identifying its arguments. The evaluation of our method showed that exploiting the formatting and layout of documents, in combination with standard lexico-syntactic features, improves both tasks.
Belabbess, Badre. "Automatisation de détections d'anomalies en temps réel par combinaison de traitements numériques et sémantiques". Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC2180/document.
Computer systems involving anomaly detection are emerging in both research and industry. Fields as varied as medicine (identification of malignant tumors), finance (detection of fraudulent transactions), information technologies (network intrusion detection) and environment (pollution detection) are widely impacted. Machine learning offers a powerful set of approaches that can help solve these use cases effectively. However, it is a cumbersome process with strict rules that involves a long list of tasks such as data analysis and cleaning, dimension reduction, sampling, algorithm selection, optimization of hyper-parameters, etc. It also involves several experts who must work together to find the right approaches. In addition, the possibilities opened today by the world of semantics show that it is possible to take advantage of web technologies to reason intelligently on raw data in order to extract information with high added value. The lack of systems combining numerical approaches from machine learning with semantic techniques from the web of data is the main motivation behind the various works proposed in this thesis. Finally, the anomalies detected do not necessarily correspond to abnormal situations in reality. Indeed, the presence of external information could help decision-making by contextualizing the environment as a whole. Exploiting the spatial domain and social networks makes it possible to build contexts enriched with sensor data. These spatio-temporal contexts thus become an integral part of anomaly detection and must be processed using a Big Data approach. In this thesis, we present three systems with different architectures, each focused on an essential element of the big data, real-time, semantic web and machine learning ecosystems: WAVES, a Big Data platform for real-time analysis of RDF data streams captured from dense networks of IoT sensors.
Its originality lies in its ability to reason intelligently on raw data in order to infer implicit information from explicit information and to assist decision-making. This platform was developed as part of a FUI project whose main use case is the detection of anomalies in a drinking water network. RAMSSES is a hybrid machine learning system whose originality is to combine advanced numerical approaches with proven semantic techniques. It has been specifically designed to remove the heavy burden of machine learning, which is time-consuming, complex, error-prone, and often requires a multi-disciplinary team. SCOUTER is an intelligent web scraping system allowing the contextualization of singularities related to the Internet of Things by exploiting both spatial information and the web of data.
Lefrançois, Maxime. "Représentation des connaissances sémantiques lexicales de la Théorie Sens-Texte : conceptualisation, représentation, et opérationnalisation des définitions lexicographiques". Phd thesis, Université Nice Sophia Antipolis, 2014. http://tel.archives-ouvertes.fr/tel-01071945.
Ania, Briseño Ignacio de Jesús. "Bases d'objets : une infrastructure de représentation de connaissances pour la gestion de données en CAO". Grenoble INPG, 1988. http://tel.archives-ouvertes.fr/tel-00326591.
Yahaya, Alassan Mahaman Sanoussi. "Amélioration du système de recueils d'information de l'entreprise Semantic Group Company grâce à la constitution de ressources sémantiques". Thesis, Paris 10, 2017. http://www.theses.fr/2017PA100086/document.
Taking into account the semantic aspect of textual data during the classification task has become a real challenge over the last ten years. This difficulty is compounded by the fact that most of the data available on social networks are short texts, which in particular makes methods based on the "bag of words" representation inefficient. The approach proposed in this research project differs from those proposed in previous work on the enrichment of short messages for three reasons. First, we do not use external knowledge like Wikipedia, because the short messages processed by the company typically come from specific domains. Secondly, the data to be processed are not used for the creation of resources, because of the way the tool operates. Thirdly, to our knowledge there is no work that both uses structured data, such as the company's data, to build semantic resources, and measures the impact of enrichment on an interactive system for grouping text streams. In this thesis, we propose the creation of resources enabling the enrichment of short messages in order to improve the performance of the semantic grouping tool of the company Succeed Together. The tool implements supervised and unsupervised classification methods. To build these resources, we use sequential data mining techniques.
Pierens, Matthieu. "Les sentiments négatifs à travers les siècles : l'évolution des champs sémantiques de la colère, de la peur et de la douleur en français dans la base textuelle FRANTEXT (1500-2000)". Paris 7, 2014. http://www.theses.fr/2014PA070015.
This thesis deals with the evolution of the semantic fields of anger, fear and pain throughout the FRANTEXT textual database, from the 16th to the end of the 20th century. To do so, we have conducted a diachronic study of the lexemes in these fields, and of the three fields considered in their entirety, by adopting a periodization of half a century. For each of the 39 lexemes, we have presented the evolution of its frequency, the perception of the affect by language users, the nature of the experiencer, of the causes and the symptoms, and the most salient metaphors, relying on the study of collocations and the most significant co-occurrences. We have shown that the range of lexemes varies greatly according to the era and the genre, whether it concerns emotional symptoms or the metaphors and metonymies expressing intensity, appearance or control. This variability can be explained by socio-cultural changes, which seem most likely to account for the ongoing reconfiguration of the system of affects. In addition, our study has also emphasized the heuristic value of semantic fields and highlighted the large variability in their frequency and their mutual relations. Finally, regarding meaning change, we have proposed a descriptive model reflecting the changes in the combinatorial profile of the word (prototypical vs. peripheral uses), depending on whether its overall frequency in the corpus increases or decreases, in the context of the major historical phases characterizing the evolution of the field in question.
Abergel, Violette. "Relevé numérique d’art pariétal : définition d’une approche innovante combinant propriétés géométriques, visuelles et sémantiques au sein d’un environnement de réalité mixte". Thesis, Paris, HESAM, 2020. http://www.theses.fr/2020HESAE021.
The advances of the last decades in the fields of computer science and metrology have led to the development of efficient measurement tools allowing the digitization of the environment. Although digital technology has not fundamentally overhauled the principles of metric measurement, the improvement in accuracy, automation and storage capacity has been a decisive development in many fields. In the case of rock art surveying, the introduction of these tools has allowed a massive gathering of 2D and 3D data, meeting various needs for study, monitoring, documentation, archiving, or dissemination. These data provide new and valuable supports for the understanding of the objects of study, in particular concerning their morphological characterization. However, in spite of their great potential, they often remain under-exploited for lack of tools facilitating their manipulation, analysis and semantic enrichment in multidisciplinary study contexts. Moreover, these methods tend to relegate the cognitive and analytical engagement of the observer behind the measurement tool, causing a deep break between on-site study moments and all off-site processing, in other words between real and virtual work environments. This thesis proposes to address these problems by defining an integrated approach allowing the fusion of the geometric, visual and semantic aspects of surveying within a single multimodal mixed reality environment. At the crossroads of the fields of heritage information systems and mixed reality, our goal is to ensure informational continuity between in situ and ex situ analysis activities. This study led to the development of a functional proof of concept allowing the visualization of 2D and 3D survey data and their semantic annotation in augmented reality through a web interface.
Mazoyer, Béatrice. "Social Media Stories. Event detection in heterogeneous streams of documents applied to the study of information spreading across social and news media". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASC009.
Social media, and Twitter in particular, has become a privileged source of information for journalists in recent years. Most of them monitor Twitter in search of newsworthy stories. This thesis aims to investigate and quantify the effect of this technological change on editorial decisions. Does the popularity of a story affect the way it is covered by traditional news media, regardless of its intrinsic interest? To highlight this relationship, we take a multidisciplinary approach at the crossroads of computer science and economics. First, we design a novel approach to collect a representative sample of 70% of all French tweets emitted during an entire year. Second, we study different types of algorithms to automatically discover tweets that relate to the same stories, testing several vector representations of tweets based on both text and text-image representations. Third, we design a new method to group together Twitter events and media events. Finally, we design an econometric instrument to identify a causal effect of the popularity of an event on Twitter on its coverage by traditional media. We show that the popularity of a story on Twitter does have an effect on the number of articles devoted to it by traditional media, with an increase of about one article per 1,000 additional tweets.
Triperina, Evangelia. "Visual interactive knowledge management for multicriteria decision making and ranking in linked open data environments". Thesis, Limoges, 2020. http://www.theses.fr/2020LIMO0010.
The dissertation herein involves research in the field of visual representations aided by semantic technologies and ontologies, in order to support decision and policy making procedures in the framework of research and academic information systems. The visualizations will also be supported by data mining and knowledge extraction processes in the linked data environment. To elaborate, visual analytics techniques will be employed for the organization of the visualizations, in order to present the information in such a way as to exploit human perceptual abilities and eventually assist decision support and policy making procedures. Furthermore, the visual representation, and consequently the decision and policy making processes, will be improved by means of semantic technologies based on conceptual models in the form of ontologies. Thus, the main objective of the proposed doctoral thesis consists in the combination of key semantic technologies with interactive visualization techniques based mainly on graph perception, in order to make decision support systems more effective. The application field will be research and academic information systems.
Sy, Mohameth François. "Utilisation d'ontologies comme support à la recherche et à la navigation dans une collection de documents". Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20211/document.
Domain ontologies provide a knowledge model in which the main concepts of a domain are organized through hierarchical relationships. In conceptual Information Retrieval Systems (IRS), where they are used to index documents as well as to formulate queries, their use makes it possible to overcome some ambiguities of classical IRSs based on natural language processing. One of the contributions of this study consists in the use of ontologies within IRSs, in particular to assess the relevance of documents with respect to a given query. For this matching process, a simple and intuitive aggregation approach is proposed that incorporates a user-dependent preference model on the one hand, and semantic similarity measures attached to a domain ontology on the other hand. This matching strategy makes it possible to justify the relevance of the results to the user. To complete this explanation, semantic maps are built to help the user grasp the results at a glance. Documents are displayed as icons that detail their elementary scores, and are organized so that their graphical distance on the map reflects their relevance to a query represented as a probe. As information retrieval is an iterative process, it is necessary to involve users in the control loop of result relevance in order to better specify their information needs. Inspired by strategies proven in vector models, we propose, in the context of conceptual IRSs, to formalize ontology-based relevance feedback. This strategy consists in searching for a conceptual query that optimizes a tradeoff, modeled through an objective function, between closeness to relevant documents and remoteness from irrelevant documents. From a set of concepts of interest, a heuristic is proposed that efficiently builds a near-optimal query. This heuristic relies on two simple properties of semantic similarities that are proved to ensure semantic neighborhood connectivity.
Hence, only an excerpt of the ontology DAG structure is explored during query reformulation. These approaches have been implemented in OBIRS, our ontology-based IRS, and validated in two ways: automatic assessment based on standard test collections, and case studies involving experts from the biomedical domain.
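As a rough illustration of the kind of ontology-based semantic similarity such conceptual IRSs rely on, the sketch below computes a Wu-Palmer-style similarity over a toy is-a hierarchy. The hierarchy and concept names are invented for the example; this is not OBIRS code, and the thesis's own measures may differ:

```python
# Toy is-a hierarchy: child -> parent; 'entity' is the root.
parent = {
    "disease": "entity", "infection": "disease",
    "pneumonia": "infection", "bronchitis": "infection",
    "fracture": "disease",
}

def depth(c):
    """Depth of a concept, with the root at depth 1."""
    d = 1
    while c in parent:
        c = parent[c]
        d += 1
    return d

def ancestors(c):
    """The concept itself followed by its ancestors up to the root."""
    out = [c]
    while c in parent:
        c = parent[c]
        out.append(c)
    return out

def lcs(a, b):
    """Least common subsumer: deepest shared ancestor of a and b."""
    anc = set(ancestors(a))
    for c in ancestors(b):
        if c in anc:
            return c

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(wu_palmer("pneumonia", "bronchitis"))  # siblings: 0.75
print(wu_palmer("pneumonia", "fracture"))    # more distant: 4/7
```

Scores like these can then be aggregated over the concepts indexing a document and those of the query to produce a document relevance score.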
Gaignard, Alban. "Partage et production de connaissances distribuées dans des plateformes scientifiques collaboratives". Phd thesis, Université de Nice Sophia-Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00827926.
Midouni, Sid Ahmed Djallal. "Une approche orientée service pour la recherche sémantique de contenus multimédias". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEI056/document.
Multimedia data sources from various fields (medicine, tourism, trade, art and culture, etc.) have become essential on the web. Accessing multimedia data in distributed systems poses new challenges due to many system parameters: volume, diversity of interfaces, representation format, location, etc. In addition, the growing needs of users and applications to incorporate semantics in information retrieval pose new issues. To take into account this new complexity, our research focuses on data integration solutions based on web services. In this thesis, we propose a service-oriented approach for the semantic search of multimedia content, called SeSaM (Semantic Search of Multimedia content). SeSaM is based on the definition of a new pattern of services for accessing multimedia content, the MaaS services (Multimedia as a Service), and relies on a two-phase process: description and discovery of MaaS services. For the description of MaaS services, we have defined the SA4MaaS language (Semantic Annotation for MaaS services), an extension of SAWSDL (a W3C recommendation). The main idea of this language is to integrate, in addition to business domain semantics, the semantics of multimedia information in the MaaS service description. For MaaS service discovery, we have proposed a new matchmaker, MaaS-MX (MaaS services Matchmaker), adapted to the MaaS service description model. MaaS-MX is composed of two essential steps: domain matching, which compares the business domain description of MaaS services and the query, and multimedia matching, which compares their multimedia descriptions. The approach has been implemented and evaluated in two different domains, medicine and tourism. The results indicate that using both domain and multimedia matching considerably improves the performance of multimedia data retrieval systems.
Lebboss, Georges. "Contribution à l’analyse sémantique des textes arabes". Thesis, Paris 8, 2016. http://www.theses.fr/2016PA080046/document.
The Arabic language is poor in electronic semantic resources. Among those resources there is Arabic WordNet, which is also poor in words and relationships. This thesis focuses on enriching Arabic WordNet with synsets (a synset is a set of synonymous words) taken from a large general corpus. This type of corpus does not exist in Arabic, so we had to build it before subjecting it to a number of preprocessing steps. Gilles Bernard and I developed a method of word vectorization called GraPaVec which can be used here. I built a system which includes an Add2Corpus module, preprocessing, and word vectorization using automatically generated frequency patterns; this yields a data matrix whose rows are the words and whose columns are the patterns, each component representing the frequency of a word in a pattern. The word vectors are fed to the Self-Organizing Map (SOM) neural model; the classification produced constructs synsets. In order to validate the method, we had to create a gold standard corpus (there are none in Arabic for this area) from Arabic WordNet, and then compare the GraPaVec method with Word2Vec and GloVe. The results show that GraPaVec gives the best results for this problem, with an F-measure 25% higher than the others. The generated classes will be used to create new synsets to be included in Arabic WordNet.
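The idea of a word-by-pattern frequency matrix can be sketched in a few lines. This is a hedged illustration only: the tiny English corpus, the one-word context window and the slot notation are assumptions for the example, not the patterns actually generated by GraPaVec:

```python
from collections import Counter

sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat eats fish",
]

counts = Counter()
patterns = set()
for s in sentences:
    toks = s.split()
    for i, w in enumerate(toks):
        left = toks[i - 1] if i > 0 else "<s>"
        right = toks[i + 1] if i < len(toks) - 1 else "</s>"
        pat = f"{left} _ {right}"   # '_' marks the target word's slot
        counts[(w, pat)] += 1
        patterns.add(pat)

def vector(word):
    """Non-zero row of the word-by-pattern matrix for one word."""
    return {p: counts[(word, p)] for p in sorted(patterns) if counts[(word, p)]}

# 'cat' and 'dog' share the pattern 'the _ drinks', hinting at similarity.
print(vector("cat"))
print(vector("dog"))
```

Rows like these would then be fed to a clustering model (a SOM in the thesis) so that words with similar pattern profiles end up in the same class.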
Valceschini-Deza, Nathalie. "Accès sémantique aux bases de données textuelles". Nancy 2, 1999. http://www.theses.fr/1999NAN21021.
Assele, Kama Ariane. "Interopérabilité sémantique et entreposage de données cliniques". Paris 6, 2013. http://www.theses.fr/2013PA066359.
In medicine, data warehouses make it possible to integrate various data sources for decisional analysis. The integrated data often come from distributed and heterogeneous sources, in order to provide an overview of the information to analysts and decision makers. Clinical data warehousing raises the issue of representing constantly evolving medical knowledge, requiring the use of new methodologies to integrate the semantic dimension of the study domain. The storage problem is related to the complexity of the field to describe and model but, more importantly, to the need to combine domain knowledge with data. Therefore, one of the research topics in the field of data warehouses is the cohabitation of knowledge and data, and the role of ontologies in data warehouse modeling, data integration and data mining. This work, carried out in an INSERM research laboratory specialized in health knowledge engineering (UMRS 872 EQ20), is part of the issue of modeling, sharing and using clinical data within a semantic interoperability platform. To address this issue, we support the thesis that: (i) the integration of a standardized information model with a knowledge model makes it possible to implement semantic data warehouses that optimize data use; (ii) the use of terminological and ontological resources aids the interconnection of distributed and heterogeneous resources; (iii) data representation impacts data exploitation and helps optimize decision support systems (e.g. monitoring tools). Using innovative methods and Semantic Web tools, we have optimized the integration and exploitation of clinical data for the implementation of a monitoring system to assess the evolution of bacterial resistance to antibiotics in Europe. As a first step, we defined the multidimensional model of a semantic data warehouse based on existing standards such as HL7. We subsequently articulated these data with domain knowledge of infectious diseases.
For this, we have represented the data, through their structure, vocabulary and semantics, in an ontology called a "data definition ontology", in order to map the data to the domain ontology via mapping rules. We proposed a method for the semi-automatic generation of a data definition ontology from a database schema, using existing tools and project results. Finally, the data warehouse and semantic resources are accessed and used via a semantic interoperability system developed in the framework of the DebugIT European project (Detecting and Eliminating Bacteria UsinG Information Technology), which we experimented with at the Georges Pompidou university hospital (HEGP, France).
Assouroko, Ibrahim. "Gestion de données et dynamiques des connaissances en ingénierie numérique : contribution à l'intégration de l'ingénierie des exigences, de la conception mécanique et de la simulation numérique". Compiègne, 2012. http://www.theses.fr/2012COMP2030.
Over the last twenty years, the deep changes in the field of product development have led to methodological changes in the field of design. These changes have benefited from the significant development of Information and Communication Technologies (ICT), such as PLM systems dedicated to product lifecycle management, and from collaborative engineering approaches, which play a key role in the improvement of the product development process (PDP). In the current PLM market, PLM solutions from different vendors still present strong heterogeneities and rely on proprietary technologies and formats for competitiveness and profitability reasons, which does not ease communication and sharing between the various ICTs contributing to the PDP. Our research work focuses on the PDP and aims to contribute to the improvement of the integrated management of mechanical design and numerical simulation data in a PLM context. The research contribution proposes an engineering knowledge capitalization solution based on a product semantic relationship management approach, organized as follows: (1) a data structuring approach driven by so-called semi-structured entities, whose structure is able to evolve along the PDP; (2) a conceptual model describing the fundamental concepts of the proposed approach; (3) a methodology that facilitates and improves the management and reuse of engineering knowledge within design projects; and (4) a knowledge capitalization approach based on the management of semantic relationships that exist or may exist between engineering entities within the product development process.
Bennara, Mahdi. "Linked service integration on the semantic web". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI055.
Service Oriented Computing allows interoperability between distributed systems. In recent years, the emergence of the semantic Web has opened new challenges for the research community regarding semantic interoperability at the data and processing levels. The convergence of service orientation and the semantic Web is a promising effort to solve the problems that have hampered both research fields. On the one hand, service orientation allows interoperability at the data and processing levels; on the other hand, the semantic Web allows the automation of high-level service manipulation tasks. In our research, we detail the challenges encountered by the research community in integrating service orientation practices with the semantic Web, more precisely, integrating REST-based services with the semantic Web implementation based on Linked Data principles to obtain RESTful Linked Services. The challenges in question are description, discovery, selection and composition, and we propose a solution for each of them: a descriptor structure, a semantically-enabled discovery algorithm, a Skyline-based selection algorithm, and composition directories. We think that these contributions can be adopted by service providers on the Web in order to allow a seamless integration of semantic Web practices with service technologies, and REST in particular. This allows the automation of high-level service manipulation tasks, such as semantically-enabled discovery, QoS-based selection and the composition of heterogeneous services, be it at the data or processing level, in order to create value-added composite services.
Khammaci, Tahar. "Contribution à l'étude du processus de développement de logiciels : assistance à base de connaissance et modélisation des objets logiciels". Nancy 1, 1991. http://www.theses.fr/1991NAN10287.
Choquet, Rémy. "Partage de données biomédicales : modèles, sémantique et qualité". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2011. http://tel.archives-ouvertes.fr/tel-00824931.
Saïs, Fatiha. "Intégration sémantique de données guidée par une ontologie". Paris 11, 2007. http://www.theses.fr/2007PA112300.
This thesis deals with semantic data integration guided by an ontology. Data integration aims at combining autonomous and heterogeneous data sources. To this end, all the data should be represented according to the same schema and with a unified semantics. This thesis is divided into two parts. In the first one, we present an automatic and flexible method for reconciling data with an ontology, considering the case where data are represented in tables. The reconciliation result is represented in the SML format, which we have defined. Its originality stems from the fact that it allows representing all the established mappings, but also information that is imperfectly identified. In the second part, we present two methods of reference reconciliation. This problem consists in deciding whether different data descriptions refer to the same real-world entity. We have considered this problem when data are described according to the same schema. The first method, called L2R, is logical: it translates the schema and data semantics into a set of logical rules which allow inferring correct decisions of both reconciliation and non-reconciliation. The second method, called N2R, is numerical: it translates the schema semantics into an informed similarity measure used in a numerical computation of the similarity of reference pairs. This computation is expressed as a non-linear equation system solved using an iterative method. Our experiments on real datasets demonstrated the robustness and feasibility of our approaches. The solutions that we bring to the two reconciliation problems are completely automatic and guided only by an ontology.
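The iterative flavour of such a numerical reconciliation can be conveyed with a toy fixed-point sketch. Everything below (the attribute similarities, the propagation rule with its `alpha` weight, the link structure) is an invented illustration of mutually dependent pair similarities, not the thesis's actual equation system:

```python
# Attribute-level similarity of each candidate reference pair.
attr_sim = {("museum1", "museumA"): 0.9, ("city1", "cityA"): 0.2}

# Pairs influence each other (e.g. a museum is located in a city).
links = {("museum1", "museumA"): [("city1", "cityA")],
         ("city1", "cityA"): [("museum1", "museumA")]}

def solve(pairs, alpha=0.5, iters=50):
    """Fixed-point iteration: a pair's score depends on its own
    attribute similarity and on the scores of the pairs it is linked to."""
    sim = {p: 0.0 for p in pairs}
    for _ in range(iters):
        sim = {p: max(attr_sim[p], alpha * max(sim[q] for q in links[p]))
               for p in pairs}
    return sim

sim = solve(list(attr_sim))
print(sim)  # the weak city pair is pulled up by the strong museum pair
```

The point of the sketch is the mutual dependency: the low attribute score of the city pair is raised by the high score of the museum pair it is linked to, which is what makes the system non-linear and calls for an iterative solution.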
Pantin, Jérémie. "Détection et caractérisation sémantique de données textuelles aberrantes". Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS347.pdf.
Machine learning addresses the problem of handling dedicated tasks over a wide variety of data. Such algorithms can be either simple or difficult to handle depending on the data. Low-dimensional data (2 or 3 dimensions) with an intuitive representation (say, the average baguette price per year) are easier for a human to interpret and explain than data with thousands of dimensions. For low-dimensional data, an error produces a significant shift away from normal data, but the high-dimensional case is different. Outlier detection (also called anomaly detection or novelty detection) is the study of outlying observations in order to decide what is normal and what is abnormal. The methods that perform this task are algorithms and models based on data distributions. Different families of approaches can be found in the outlier detection literature, and they are mostly independent of any ground truth: they perform outlier analysis by detecting the principal behaviors of the majority of observations, so data that deviate from the normal distribution are considered noise or outliers. We detail the application of outlier detection to text. Despite recent progress in natural language processing, computers still lack a profound understanding of human language in the absence of context. For instance, the sentence "A smile is a curve that sets everything straight" has several levels of understanding, and a machine may struggle to choose the right reading. This thesis presents the analysis of high-dimensional outliers applied to text. Recent advances in anomaly and outlier detection are not well represented for text data, and we propose to highlight the main differences with high-dimensional outliers. We also approach ensemble methods, which are nearly nonexistent in the literature for our context. Finally, outlier detection is applied to improve results on abstractive summarization.
We propose GenTO, a method that prepares and generates splits of data in which anomalies and outliers are inserted. Based on this method, an evaluation benchmark of outlier detection approaches on documents is proposed. The proposed taxonomy makes it possible to identify difficult, hierarchically organized outliers that the literature tackles without naming them. Furthermore, learning without supervision often makes models depend on some hyperparameter. For instance, the Local Outlier Factor relies on the k nearest neighbors to compute a local density, so choosing the right value of k is crucial, and we explore the influence of this parameter on text data. While choosing a single model can lead to obvious bias on real-world data, ensemble methods mitigate this problem and are particularly effective for outlier analysis: in particular, selecting several values for one hyperparameter can help detect strong outliers. Feature importance is then tackled, since it can help a human understand the output of a black-box model; the interpretability of outlier detection models is thus questioned. We find that for numerous datasets a small number of features can be selected as an oracle, and that combining complete models with restrained models helps to mitigate the black-box effect of some approaches. In some cases, outlier detection amounts to noise removal or anomaly detection, and some applications benefit from this characteristic; mail spam detection and fake news detection are two examples, but we propose to use outlier detection approaches for weak-signal exploration in a marketing project. We find that models from the literature help to improve unsupervised abstractive summarization and to find weak signals in text.
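The ensemble idea above, varying k to stabilize neighbour-based scores, can be illustrated with a simple k-NN distance score, a cruder relative of the Local Outlier Factor. This is a sketch under stated assumptions, not the thesis's method: the scoring function, normalization and choice of k values are all illustrative.

```python
import math

def knn_score(points, k):
    """Outlier score of each point = mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def ensemble_score(points, ks=(2, 3, 5)):
    """Average max-normalised scores over several k, reducing the
    sensitivity to any single hyperparameter choice."""
    acc = [0.0] * len(points)
    for k in ks:
        s = knn_score(points, k)
        top = max(s)
        acc = [a + v / top for a, v in zip(acc, s)]
    return [a / len(ks) for a in acc]
```

Points that score high for every k are the "strong outliers" mentioned above; points flagged only at one k are more likely artefacts of that parameter.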
Ben Salem, Aïcha. "Qualité contextuelle des données : détection et nettoyage guidés par la sémantique des données". Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015USPCD054/document.
Nowadays, complex applications such as knowledge extraction, data mining, e-learning or web applications use heterogeneous and distributed data. The quality of any decision depends on the quality of the data used; the absence of rich, accurate and reliable data can lead an organization to make bad decisions. The subject covered in this thesis aims at assisting the user in a data quality approach. The goal is to better extract, mix, interpret and reuse data. For this, the data must be related to its semantic meaning, data types, constraints and comments. The first part deals with the semantic schema recognition of a data source. This enables the extraction of data semantics from all the available information, including the data and the metadata. It consists, firstly, of categorizing the data by assigning each column to a category and possibly a sub-category, and secondly, of establishing relations between columns and thereby discovering the semantics of the data source at hand. The links detected between columns offer a better understanding of the source and of the alternatives for correcting data. This approach allows automatic detection of a large number of syntactic and semantic anomalies. The second part is data cleansing using the anomaly reports returned by the first part. It allows corrections within a single column (data homogenization), between columns (semantic dependencies), and between rows (eliminating duplicates and similar data). Throughout this process, recommendations and analyses are provided to the user.
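The column categorization step described above can be sketched as pattern-based voting: a column is assigned a semantic category when enough of its values match that category's pattern, and the non-matching values become candidate anomalies for the cleansing phase. The patterns and threshold below are illustrative; the actual approach also exploits data types, dictionaries and inter-column dependencies.

```python
import re

# Illustrative value patterns for a few semantic categories.
PATTERNS = {
    "email":  re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "date":   re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
}

def categorize_column(values, threshold=0.8):
    """Assign a category if at least `threshold` of the values match its
    pattern; report the non-matching values as candidate anomalies."""
    for category, rx in PATTERNS.items():
        hits = [v for v in values if rx.match(v)]
        if len(hits) >= threshold * len(values):
            anomalies = [v for v in values if not rx.match(v)]
            return category, anomalies
    return "unknown", []
```

The anomaly list produced here plays the role of the "reports on anomalies" handed to the second part of the approach.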
Tran, Ba-Huy. "Une approche sémantique pour l’exploitation de données environnementales : application aux données d’un observatoire". Thesis, La Rochelle, 2017. http://www.theses.fr/2017LAROS025.
The need to collect long-term observations for research on environmental issues led the CNRS to establish the "Zones Ateliers". Thus, for several years, many databases of a spatio-temporal nature have been collected by different teams of researchers. To facilitate transversal analysis of different observations, it is desirable to cross-reference information from these data sources; nevertheless, these sources are built independently of each other, which raises problems of data heterogeneity at analysis time. This thesis therefore studies the potential of ontologies as objects of modeling, inference and interoperability, with the aim of providing domain experts with a suitable method for exploiting heterogeneous data. Being applied in the environmental domain, the ontologies must take into account the spatio-temporal characteristics of these data. To meet the need for modeling spatial and temporal concepts and operators, we reuse existing ontologies of time and space. A spatio-temporal data integration approach with a mechanism for reasoning over the relations between these data is then introduced. Finally, data mining methods are adapted to spatio-temporal RDF data to discover new knowledge from the knowledge base. The approach is applied within the Geminat prototype, which aims to help understand farming practices and their relationships with biodiversity in the "zone atelier Plaine et Val de Sèvre". From data integration to knowledge analysis, it provides the elements necessary to exploit heterogeneous spatio-temporal data as well as to discover new knowledge.
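The reasoning mechanism over relations between spatio-temporal data can be illustrated with a single rule: deriving the transitive closure of a temporal "before" relation over triples. This is a minimal sketch; the thesis relies on reused ontologies of time and space rather than this ad-hoc triple representation.

```python
def saturate(triples):
    """Apply the rule  before(a, b) ∧ before(b, c) → before(a, c)
    until no new fact can be derived (forward chaining)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        derived = {(a, "before", d)
                   for (a, p, b) in facts if p == "before"
                   for (c, q, d) in facts if q == "before" and b == c}
        new = derived - facts
        if new:
            facts |= new
            changed = True
    return facts
```

The same forward-chaining loop generalizes to other spatial or temporal rules by adding clauses to the derivation step.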
Nachabe, Ismail Lina. "Automatic sensor discovery and management to implement effective mechanism for data fusion and data aggregation". Thesis, Evry, Institut national des télécommunications, 2015. http://www.theses.fr/2015TELE0021/document.
The constant evolution of technology, in terms of inexpensive embedded wireless interfaces and powerful chipsets, has led to the massive usage and development of wireless sensor networks (WSNs). This potentially affects all aspects of our lives, ranging from home automation (e.g. smart buildings), through e-Health applications, environmental observation and broadcasting, food sustainability, energy management and smart grids, to military services and many other applications. WSNs are formed of an increasing number of sensor/actuator/relay/sink devices, generally self-organized in clusters and dedicated to a domain, provided by an increasing number of manufacturers, which leads to interoperability problems (e.g., heterogeneous interfaces and/or grounding, heterogeneous descriptions, profiles, models, etc.). Moreover, these networks are generally implemented as vertical solutions unable to interoperate with each other. The data provided by these WSNs are also very heterogeneous, coming from sensing nodes with various capabilities (e.g., different sensing ranges, formats, coding schemes, etc.). To tackle these heterogeneity and interoperability problems, WSN nodes, as well as the data sensed and/or transmitted, need to be consistently and formally represented and managed through suitable abstraction techniques and generic information models. Therefore, an explicit semantics should be assigned to every term, and an open data model dedicated to WSNs should be introduced. SensorML, proposed by the OGC in 2010, has been considered an essential step toward data modeling specification in WSNs. Nevertheless, it is based on an XML schema that only permits a basic hierarchical description of the data, hence neglecting any semantic representation. Furthermore, most research efforts that have used semantic techniques for their data models focus only on modeling sensors and actuators (this is, e.g.,
the case of SSN-XG). Other works dealt with the data provided by WSNs, but without modeling data types, quality and states (as does, e.g., OntoSensor). That is why the main aim of this thesis is to specify and formalize an open data model for WSNs in order to mask the aforementioned heterogeneity and interoperability issues between different systems and applications. This model will also facilitate data fusion and aggregation through an open management architecture, for example a service-oriented one. The thesis thus has two main objectives: 1) to formalize a semantic open data model that generically describes a WSN, its sensors/actuators and their corresponding data; this model should be lightweight enough to respect the low-power and thus low-energy limitations of such networks, generic enough to describe the wide variety of WSNs, and extensible so that it can be modified and adapted to the application; 2) to propose an upper service model and standardized enablers for enhancing sensor/actuator discovery, data fusion, data aggregation, and WSN control and management. These service-layer enablers will be used to improve data collection in large-scale networks and will facilitate the implementation of more efficient routing protocols, as well as decision-making mechanisms, in WSNs.
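As a hedged illustration of objective 1), a sensor and one of its observations can be described with SSN/SOSA-style terms as plain triples. The namespaces are abbreviated and the identifiers are invented for the example; a real model would use full IRIs and an RDF library rather than tuples.

```python
SOSA = "sosa:"  # abbreviated W3C SOSA namespace prefix

def describe_observation(obs_id, sensor, observed_property, result, result_time):
    """Return SSN/SOSA-style triples describing a sensor and one observation,
    covering the sensor, the observed property, the result and its timestamp."""
    return [
        (sensor, "rdf:type", SOSA + "Sensor"),
        (sensor, SOSA + "observes", observed_property),
        (obs_id, "rdf:type", SOSA + "Observation"),
        (obs_id, SOSA + "madeBySensor", sensor),
        (obs_id, SOSA + "observedProperty", observed_property),
        (obs_id, SOSA + "hasSimpleResult", result),
        (obs_id, SOSA + "resultTime", result_time),
    ]
```

Modeling results and timestamps explicitly, rather than only the devices, is precisely the gap this thesis identifies in sensor-only ontologies.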
Nguyen, Thanh Binh. "L'interrogation du web de données garantissant des réponses valides par rapport à des critères donnés". Thesis, Orléans, 2018. http://www.theses.fr/2018ORLE2053/document.
The term Linked Open Data (LOD) was first proposed by Tim Berners-Lee in 2006. Since then, LOD has evolved impressively, with thousands of datasets on the Web of Data, which has raised a number of challenges for the research community in retrieving and processing LOD. In this thesis, we focus on the quality of data retrieved from various LOD sources, and we propose a context-driven querying system that guarantees the quality of answers with respect to a quality context defined by users. We define a fragment of constraints and propose two approaches, the naive one and the rewriting one, which allow us to filter valid answers dynamically at query time instead of validating them at the data source level. The naive approach performs the validation process by generating and evaluating sub-queries for each candidate answer with respect to each constraint. The rewriting approach uses constraints as rewriting rules to reformulate the query into a set of auxiliary queries, such that the answers to the rewritten queries are not only answers to the original query but also valid with respect to all the integrated constraints. A proof of the correctness and completeness of our rewriting system is presented after formalizing the notion of a valid answer with respect to a context. Both approaches have been evaluated and have shown the feasibility of our system. This is our main contribution: we extend the set of well-known query-rewriting systems (Chase, Chase & Backchase, PerfectRef, Xrewrite, etc.) with a new, effective solution for the new purpose of filtering query results based on constraints in the user context. Moreover, we also enlarge the triggering condition of constraints compared with other works by using the notion of a one-way MGU.
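The naive approach described above can be sketched as post-hoc filtering: each candidate answer is kept only if every constraint check succeeds for it. Here a constraint is modelled as a simple predicate over an answer binding; in the thesis it is evaluated as a sub-query against the data sources.

```python
def filter_valid(answers, constraints):
    """Keep an answer (a dict of variable bindings) only if all
    constraint checks succeed for it."""
    return [a for a in answers if all(check(a) for check in constraints)]

# Illustrative usage: a quality context requiring a trusted source.
answers = [
    {"city": "Paris", "source": "dbpedia"},
    {"city": "Paris", "source": "blog"},
]
trusted = lambda a: a["source"] == "dbpedia"
valid = filter_valid(answers, [trusted])
```

The rewriting approach avoids this per-answer loop by pushing the same conditions into the query itself, so the sources never return invalid candidates in the first place.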
Lelong, Romain. "Accès sémantique aux données massives et hétérogènes en santé". Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR030/document.
Clinical data are produced during the practice of medicine by different health professionals, in several places and in various formats. They are therefore heterogeneous both in nature and in structure, and they are furthermore of a particularly large volume, which makes them qualify as big data. The work carried out in this thesis aims at proposing an effective information retrieval method for this type of complex and massive data. First, access to clinical data is constrained by the need to model clinical information. This can be done within electronic health records and, to a larger extent, within data warehouses. In this thesis, I propose a proof of concept of a search engine giving access to the information contained in the Semantic Health Data Warehouse of the Rouen University Hospital. A generic data model allows this data warehouse to view information as a graph of data, thus modeling the information while preserving its conceptual complexity. In order to provide search functionalities adapted to this generic representation, a query language giving access to clinical information through the various entities of which it is composed has been developed and implemented as part of this thesis. Second, the massiveness of clinical data is a major technical challenge that hinders efficient information retrieval. The initial implementation of the proof of concept highlighted the limits of a relational database management system when used on clinical data; a migration to a NoSQL key-value store was then completed. Although offering good atomic data access performance, this migration nevertheless required additional developments, and the design of a suitable hardware and software architecture, to provide advanced search functionalities.
Finally, the contribution of this work was evaluated within the general context of the Semantic Health Data Warehouse of the Rouen University Hospital. The proof of concept was used to access semantic descriptions of information in order to check patient inclusion and exclusion criteria of clinical studies. In this evaluation, a total or partial answer is given for 72.97% of the criteria. In addition, the genericity of the tool has also made it possible to use it in other contexts, such as documentary and bibliographic information retrieval in health.
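The graph-of-data view layered over a key-value store, as described in the abstract above, can be sketched as follows: each entity is stored under its id, and edges are lists of target ids that a path-style query traverses. The schema and names are illustrative, not the actual warehouse model.

```python
store = {}  # stand-in for a key-value store: id -> entity record

def put(entity_id, attrs, edges=None):
    """Store an entity with its attributes and outgoing edges
    (edge name -> list of target entity ids)."""
    store[entity_id] = {"attrs": attrs, "edges": edges or {}}

def follow(entity_id, edge):
    """Traverse one edge type from an entity, fetching each target
    with an atomic key-value lookup."""
    return [store[t] for t in store[entity_id]["edges"].get(edge, [])]

# Illustrative usage: a patient linked to one clinical document.
put("doc:1", {"type": "report", "text": "discharge summary"})
put("pat:1", {"name": "anonymised"}, {"hasDocument": ["doc:1"]})
documents = follow("pat:1", "hasDocument")
```

Each traversal step costs one key lookup per target, which is why atomic access performance of the key-value store matters even though complex search needs the additional machinery mentioned above.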