Theses on the topic "Données RDF"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the 44 best theses for your research on the topic "Données RDF".
You can also download the full text of each publication in PDF format and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Lesnikova, Tatiana. "Liage de données RDF : évaluation d'approches interlingues". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM011/document.
The Semantic Web extends the Web by publishing structured and interlinked data using RDF. An RDF data set is a graph whose resources are nodes labelled in natural languages. One of the key challenges of linked data is to discover links across RDF data sets: given two data sets, equivalent resources should be identified and linked by owl:sameAs links. This problem is particularly difficult when resources are described in different natural languages. This thesis investigates the effectiveness of linguistic resources for interlinking RDF data sets. For this purpose, we introduce a general framework in which each RDF resource is represented as a virtual document containing the text information of neighboring nodes; the context of a resource is the set of labels of its neighbors. Once virtual documents are created, they are projected into the same space in order to be compared, which can be achieved by using machine translation or multilingual lexical resources. Once documents are in the same space, similarity measures are applied to find identical resources: similarity between elements of this space is taken as similarity between RDF resources. We performed an evaluation of cross-lingual techniques within the proposed framework, experimentally comparing different methods for linking RDF data. In particular, two strategies are explored: applying machine translation, or using references to multilingual resources. Overall, the evaluation shows the effectiveness of cross-lingual string-based approaches for linking RDF resources expressed in different languages. The methods have been evaluated on resources in English, Chinese, French and German. The best performance (over 0.90 F-measure) was obtained by the machine translation approach. This shows that the similarity-based method can be successfully applied to RDF resources independently of their type (named entities or thesauri concepts). The best experimental results, involving just a pair of languages, demonstrate the usefulness of such techniques for interlinking RDF resources cross-lingually.
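To make the virtual-document technique concrete, here is a minimal Python sketch of the pipeline the abstract describes; the data, the 0.9 threshold and the stubbed translate step are illustrative assumptions, not the thesis's actual implementation:

```python
from collections import Counter
from math import sqrt

def virtual_document(resource, triples, labels):
    """Bag of words built from the resource's own label plus neighbor labels."""
    words = Counter(labels.get(resource, "").lower().split())
    for s, p, o in triples:
        if s == resource and o in labels:
            words.update(labels[o].lower().split())
    return words

def translate(doc):
    # Placeholder for the projection step: in the thesis this is machine
    # translation or a multilingual lexical resource; here it is the identity.
    return doc

def cosine(d1, d2):
    dot = sum(v * d2.get(w, 0) for w, v in d1.items())
    norm = sqrt(sum(v * v for v in d1.values())) * sqrt(sum(v * v for v in d2.values()))
    return dot / norm if norm else 0.0

def interlink(docs_a, docs_b, threshold=0.9):
    """Emit owl:sameAs candidates between two sets of virtual documents."""
    return [(r1, "owl:sameAs", r2)
            for r1, d1 in docs_a.items()
            for r2, d2 in docs_b.items()
            if cosine(translate(d1), translate(d2)) >= threshold]

labels_en = {"ex:Paris": "Paris", "ex:France": "France"}
triples_en = [("ex:Paris", "ex:capitalOf", "ex:France")]
labels_fr = {"fr:Paris": "Paris", "fr:France": "France"}
triples_fr = [("fr:Paris", "fr:capitaleDe", "fr:France")]

docs_en = {r: virtual_document(r, triples_en, labels_en) for r in labels_en}
docs_fr = {r: virtual_document(r, triples_fr, labels_fr) for r in labels_fr}
print(interlink(docs_en, docs_fr))
```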
Tanasescu, Adrian. "Vers un accès sémantique aux données : approche basée sur RDF". Lyon 1, 2007. http://www.theses.fr/2007LYO10069.
The thesis mainly focuses on information retrieval through RDF document querying. We propose an approach able to provide complete and pertinent answers to a user-formulated SPARQL query. The approach mainly consists of (1) determining, through a similarity measure, whether two RDF graphs are contradictory, by using the associated ontological knowledge, and (2) building pertinent answers through the combination of statements belonging to non-contradicting RDF graphs that partially answer a given query. We also present an RDF storage and querying platform, named SyRQuS, whose query answering plan is entirely based on the proposed querying approach. SyRQuS is a Web-based platform that mainly provides users with a querying interface where queries can be formulated using SPARQL.
Ben Ellefi, Mohamed. "La recommandation des jeux de données basée sur le profilage pour le liage des données RDF". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT276/document.
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas and their data dynamicity over time. To this extent, identifying suitable datasets which meet specific criteria has become an increasingly important, yet challenging task to support issues such as entity retrieval, semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high number of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the Semantic Web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identifying target datasets for interlinking. While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative-filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles and traditional dataset connectivity measures in order to link LOD datasets into a global dataset-topic graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from being complete enough to be considered as a ground truth and consequently as learning data. Facing the limits of the current LOD topology (as learning data), our research led us to break away from the topic-profile representation of the "learning to rank" approach and to adopt a new approach for candidate dataset identification, where the recommendation is based on the overlap between the intensional profiles of different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset, which can potentially be enriched by retrieving the corresponding textual descriptions. This representation provides richer contextual and semantic information and allows similarities between profiles to be computed efficiently and inexpensively. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the LOD cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. Furthermore, our method returns the mappings between schema concepts across datasets, a particularly useful input for the data linking step. In order to ensure high-quality, representative dataset schema profiles, we introduce Datavore, a tool oriented towards metadata designers that provides ranked lists of vocabulary terms to reuse in the data modeling process, together with additional metadata and cross-term relations. The tool relies on the Linked Open Vocabularies (LOV) ecosystem for acquiring vocabularies and metadata, and is made available to the community.
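A rough Python sketch of the intensional-profile idea, ranking candidate datasets by tf*idf cosine overlap of their concept labels (dataset names and labels are invented for the example):

```python
from collections import Counter
from math import log, sqrt

profiles = {
    "datasetA": ["Person", "Organization", "Place"],
    "datasetB": ["Person", "MusicalWork", "Band"],
    "datasetC": ["Protein", "Gene", "Disease"],
}

def tfidf_vectors(profiles):
    """One tf*idf vector per dataset over its schema concept labels."""
    n = len(profiles)
    df = Counter(term for labels in profiles.values() for term in set(labels))
    return {name: {t: tf * log(n / df[t]) for t, tf in Counter(labels).items()}
            for name, labels in profiles.items()}

def cosine(u, v):
    dot = sum(x * v.get(t, 0.0) for t, x in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(source, profiles):
    """Candidate target datasets, best profile overlap first."""
    vecs = tfidf_vectors(profiles)
    return sorted(((cosine(vecs[source], vecs[d]), d)
                   for d in profiles if d != source), reverse=True)

print(recommend("datasetA", profiles))  # datasetB overlaps on "Person"
```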
Ouksili, Hanane. "Exploration et interrogation de données RDF intégrant de la connaissance métier". Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLV069.
An increasing number of datasets are published on the Web, expressed in the languages proposed by the W3C to describe Web data, such as RDF, RDF(S) and OWL. The Web has become an unprecedented source of information available to users and applications, but the meaningful usage of this information source is still a challenge. Querying these data sources requires knowledge of a formal query language such as SPARQL, but it mainly suffers from the lack of knowledge about the source itself, which is required in order to target the resources and properties relevant to the specific needs of the application. The work described in this thesis addresses the exploration of RDF data sources in two complementary ways: discovering the themes or topics representing the content of the data source, and providing support for an alternative way of querying the data sources by using keywords instead of a query formulated in SPARQL. The proposed exploration approach combines two complementary strategies: thematic-based exploration and keyword search. Theme discovery from an RDF dataset consists in identifying a set of sub-graphs, not necessarily disjoint, such that each one represents a set of semantically related resources forming a theme from the point of view of the user. These themes can be used to enable a thematic exploration of the data source, where users can target the relevant theme and limit their exploration to the resources composing it. Keyword search is a simple and intuitive way of querying data sources. In the case of RDF datasets, this search raises several problems, such as indexing graph elements, identifying the relevant graph fragments for a specific query, aggregating these relevant fragments to build the query results, and ranking these results. In our work, we address these different problems and propose an approach which takes a keyword query as input and provides a list of sub-graphs, each one representing a candidate result for the query. These sub-graphs are ordered according to their relevance to the query. For both keyword search and theme identification in RDF data sources, we have taken external knowledge into account in order to capture the users' needs, or to bridge the gap between the concepts invoked in a query and those of the data source. This external knowledge could be domain knowledge allowing the user's need expressed by a query, or the definition of themes, to be refined. We have proposed a formalization of this external knowledge and introduced the notion of pattern to this end. These patterns represent equivalences between properties and paths in the dataset. They are evaluated and integrated in the exploration process to improve the quality of the results.
Michel, Franck. "Intégrer des sources de données hétérogènes dans le Web de données". Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4002/document.
To a great extent, the success of the Web of Data depends on the ability to reach legacy data locked in silos inaccessible from the web. In the last 15 years, various works have tackled the problem of exposing various types of structured data in the Resource Description Framework (RDF). Meanwhile, the overwhelming success of NoSQL databases has made the database landscape more diverse than ever. NoSQL databases are strong potential contributors of valuable linked open data. Hence, the object of this thesis is to enable RDF-based data integration over heterogeneous data sources and, in particular, to harness NoSQL databases to populate the Web of Data. We propose a generic mapping language, xR2RML, to describe the mapping of heterogeneous data sources into an arbitrary RDF representation. xR2RML relies on and extends previous works on the translation of RDBs, CSV/TSV and XML into RDF. With such an xR2RML mapping, we propose either to materialize RDF data or to dynamically evaluate SPARQL queries on the native database. In the latter case, we follow a two-step approach. The first step translates a SPARQL query into a pivot abstract query, based on the xR2RML mapping of the target database to RDF. In the second step, the abstract query is translated into a concrete query, taking into account the specificities of the database query language. Great care is taken of the query optimization opportunities, both at the abstract and the concrete levels. To demonstrate the effectiveness of our approach, we have developed a prototype implementation for MongoDB, the popular NoSQL document store. We have validated the method using a real-life use case in the Digital Humanities.
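The following Python sketch illustrates the general mapping-driven materialization idea with a much simplified, invented mapping vocabulary; it is not the actual xR2RML syntax:

```python
import json

# A document as it might be stored in MongoDB.
doc = json.loads('{"id": "p1", "name": "Ada", "knows": ["p2", "p3"]}')

# Invented, simplified mapping: a subject IRI template plus (predicate, field) pairs.
mapping = {
    "subject": "http://example.org/person/{id}",
    "predicate_object": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://xmlns.com/foaf/0.1/knows", "knows"),  # one triple per array item
    ],
}

def materialize(doc, mapping):
    """Yield triples for one document; real xR2RML also templates object IRIs."""
    subject = mapping["subject"].format(**doc)
    for predicate, field in mapping["predicate_object"]:
        values = doc[field] if isinstance(doc[field], list) else [doc[field]]
        for value in values:
            yield (subject, predicate, value)

for triple in materialize(doc, mapping):
    print(triple)
```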
Bouhamoum, Redouane. "Découverte automatique de schéma pour les données irrégulières et massives". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG081.
The web of data is a huge global data space, relying on Semantic Web technologies, where a high number of sources are published and interlinked. This data space provides an unprecedented amount of knowledge available for novel applications, but the meaningful usage of its sources is often difficult due to the lack of schemas describing the content of these data sources. Several automatic schema discovery approaches have been proposed; while they provide good quality schemas, their use for massive data sources is a challenge as they rely on costly algorithms. In our work, we are interested in both the scalability and the incrementality of schema discovery approaches for RDF data sources where the schema is incomplete or missing. Furthermore, we extend schema discovery to take into account not only the explicit information provided by a data source, but also the implicit information which can be inferred. Our first contribution consists of a scalable schema discovery approach which extracts the classes describing the content of a massive RDF data source. We propose to extract a condensed representation of the source, used as input to the schema discovery process in order to improve its performance. This representation is a set of patterns, each one representing a combination of properties describing some entities in the dataset. We have also proposed a scalable schema discovery approach relying on a distributed clustering algorithm that forms groups of structurally similar entities representing the classes of the schema. Our second contribution aims at keeping the generated schema consistent with the data source it describes, as the latter may evolve over time. We propose an incremental schema discovery approach that modifies the set of extracted classes by propagating the changes occurring at the source, in order to keep the schema consistent with its evolutions. Finally, the goal of our third contribution is to extend schema discovery to consider the whole semantics expressed by a data source, represented not only by the explicitly declared triples but also by those which can be inferred through reasoning. We propose an extension taking into account all the properties of an entity during schema discovery, whether represented by explicit or by implicit triples, which improves the quality of the generated schema.
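A toy Python sketch of the two steps described above, with Jaccard similarity standing in for the thesis's distributed clustering algorithm:

```python
from itertools import combinations

triples = [
    ("e1", "name", "Ada"), ("e1", "birthDate", "1815"),
    ("e2", "name", "Alan"), ("e2", "birthDate", "1912"),
    ("e3", "title", "Sur la machine analytique"), ("e3", "author", "e1"),
]

def patterns(triples):
    """Condensed representation: the distinct property combinations of entities."""
    props = {}
    for s, p, _ in triples:
        props.setdefault(s, set()).add(p)
    return {frozenset(ps) for ps in props.values()}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def cluster(patterns, threshold=0.5):
    """Group structurally similar patterns into candidate classes."""
    classes = [{p} for p in patterns]
    for p1, p2 in combinations(patterns, 2):
        if jaccard(p1, p2) >= threshold:
            c1 = next(c for c in classes if p1 in c)
            c2 = next(c for c in classes if p2 in c)
            if c1 is not c2:
                c1 |= c2
                classes.remove(c2)
    return classes

print(cluster(patterns(triples)))  # persons vs. publications
```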
Rihany, Mohamad. "Keyword Search and Summarization Approaches for RDF Dataset Exploration". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG030.
An increasing number of datasets are published on the Web, expressed in the standard languages proposed by the W3C such as RDF, RDF(S) and OWL. These datasets represent an unprecedented amount of data available to users and applications. In order to identify and use the relevant datasets, users and applications need to explore them using queries written in SPARQL, a query language proposed by the W3C. But in order to write a SPARQL query, a user should not only be familiar with the query language but also have knowledge about the content of the RDF dataset in terms of the resources, classes or properties it contains. The goal of this thesis is to provide approaches to support the exploration of these RDF datasets. We have studied two alternative and complementary exploration techniques: keyword search and summarization of an RDF dataset. Keyword search returns RDF graphs in response to a query expressed as a set of keywords, where each resulting graph is the aggregation of elements extracted from the source dataset. These graphs represent possible answers to the keyword query, and they can be ranked according to their relevance. Keyword search in RDF datasets raises the following issues: (i) identifying, for each keyword in the query, the matching elements in the considered dataset, taking into account the differences of terminology between the keywords and the terms used in the RDF dataset; (ii) combining the matching elements to build the result, by defining aggregation algorithms that find the best way of linking matching elements; and finally (iii) finding appropriate metrics to rank the results, as several matching elements may exist for each keyword and consequently several graphs may be returned. In our work, we propose a keyword search approach that addresses these issues. Providing a summarized view of an RDF dataset can help a user in identifying whether this dataset is relevant to his needs, and in highlighting its most relevant elements. This can be useful for the exploration of a given dataset. In our work, we propose a novel summarization approach based on the underlying themes of a dataset. Our theme-based summarization approach consists of extracting the existing themes in a data source and building the summarized view so as to ensure that all the discovered themes are represented. This raises the following questions: (i) how to identify the underlying themes in an RDF dataset? (ii) what are the suitable criteria to identify the relevant elements in the themes extracted from the RDF graph? (iii) how to aggregate and connect the relevant elements to create a theme summary? and finally, (iv) how to create the summary for the whole RDF graph from the generated theme summaries? In our work, we propose a theme-based summarization approach for RDF datasets which answers these questions and provides a summarized representation ensuring that each theme is represented proportionally to its importance in the initial dataset.
Lozano, Aparicio Jose Martin. "Data exchange from relational databases to RDF with target shape schemas". Thesis, Lille 1, 2020. http://www.theses.fr/2020LIL1I063.
The Resource Description Framework (RDF) is a graph data model which has recently found use for publishing relational-database data on the web. We investigate data exchange from relational databases to RDF graphs with target shapes schemas. Essentially, data exchange models a process of transforming an instance of a relational schema, called the source schema, into an RDF graph constrained by a target schema, according to a set of rules called source-to-target tuple-generating dependencies. The output RDF graph is called a solution. Because the tuple-generating dependencies define this process in a declarative fashion, there might be many possible solutions or no solution at all. We study a constructive relational-to-RDF data exchange setting with target shapes schemas, composed of a relational source schema, a shapes schema for the target, and a set of mappings that use IRI constructors; furthermore, we assume that any two IRI constructors are non-overlapping. We propose a visual mapping language (VML) that helps non-expert users to specify mappings in this setting. Moreover, we develop a tool called ShERML that performs data exchange with the use of VML, and for users who want to understand the model behind VML mappings, we define R2VML, a text-based mapping language that captures VML and presents a succinct syntax for defining mappings. We investigate the problem of checking consistency: a data exchange setting is consistent if, for every input source instance, there is at least one solution. We show that the consistency problem is coNP-complete and provide a static analysis algorithm that decides whether a setting is consistent. We also study the problem of computing certain answers. An answer is certain if it holds in every solution. Typically, certain answers are computed using a universal solution; however, in our setting a universal solution might not exist. Thus, we introduce the notion of universal simulation solution, which always exists and allows certain answers to be computed for any class of queries that is robust under simulation. One such class is forward nested regular expressions (NREs), i.e., NREs that do not use the inverse operator. Using a universal simulation solution renders tractable, in data complexity, the computation of certain answers to forward NREs. Finally, we investigate the shapes schema elicitation problem, which consists of constructing a target shapes schema from a constructive relational-to-RDF data exchange setting without one. We identify two desirable properties of a good target schema: soundness, i.e., every produced RDF graph is accepted by the target schema; and completeness, i.e., every RDF graph accepted by the target schema can be produced. We propose an elicitation algorithm that is sound for any schema-less data exchange setting, and complete for a large practical class of schema-less settings.
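The following Python toy shows what applying one source-to-target tuple-generating dependency with an injective IRI constructor might look like; the table, the rule and the IRIs are invented for illustration:

```python
rows = [
    {"pid": 1, "name": "Ada"},
    {"pid": 2, "name": "Alan"},
]

def person_iri(pid):
    """IRI constructor: injective and, by assumption, non-overlapping with others."""
    return f"http://example.org/person/{pid}"

def apply_dependency(rows):
    # Person(pid, name) -> (person_iri(pid), rdf:type, ex:Person),
    #                      (person_iri(pid), ex:name, name)
    for row in rows:
        s = person_iri(row["pid"])
        yield (s, "rdf:type", "http://example.org/Person")
        yield (s, "http://example.org/name", row["name"])

solution = list(apply_dependency(rows))  # one possible solution graph
print(solution)
```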
Kellou-Menouer, Kenza. "Découverte de schéma pour les données du Web sémantique". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV047/document.
An increasing number of linked data sources are published on the Web. However, their schema may be incomplete or missing. In addition, the data do not necessarily follow their schema. This flexibility in describing the data eases their evolution, but makes their exploitation more complex. In our work, we have proposed an automatic and incremental approach enabling schema discovery from the implicit structure of the data. To complement the description of the types in a schema, we have also proposed an approach for finding the possible versions (patterns) of each of them. It proceeds online, without having to download or browse the source, which can be expensive or even impossible because sources may have access limitations, either on query execution time or on the number of queries. We have also addressed the problem of annotating the types in a schema, which consists in finding a set of labels capturing their meaning. We have proposed annotation algorithms which provide meaningful labels using external knowledge bases. Our approach can be used to find meaningful type labels during schema discovery, and also to enrich the description of existing types. Finally, we have proposed an approach to evaluate the gap between a data source and its schema. To this end, we have proposed a set of quality factors and the associated metrics, as well as a schema extension allowing the heterogeneity among instances of the same type to be reflected. Both the factors and the schema extension are used to analyze and improve the conformity between a schema and the instances it describes.
Taki, Sara. "Anonymisation de données liées en utilisant la confidentialité différentielle". Electronic Thesis or Diss., Bourges, INSA Centre Val de Loire, 2023. http://www.theses.fr/2023ISAB0009.
This thesis studies the problem of privacy in Linked Open Data (LOD). This work is at the intersection of long lines of work on data privacy and linked open data. Our goal is to study how the presence of semantics impacts the publication of data and possible data leaks. We consider RDF as the format to represent LOD, and Differential Privacy (DP) as the main privacy concept. DP was initially conceived to define privacy in the relational database (RDB) domain and is based on a quantification of the difficulty, for an attacker observing an output, of identifying which database among a neighborhood was used to produce it. The objective of this thesis is four-fold: O1) to improve the privacy of LOD, in particular to propose an approach for constructing usable DP mechanisms on RDF; O2) to study how neighborhood definitions over RDBs in the presence of foreign key (FK) constraints translate to RDF; O3) to propose new neighborhood definitions over relational databases translating into existing graph concepts, to ease the design of DP mechanisms; and O4) to support the implementation of sanitization mechanisms for RDF graphs with a rigorous formal foundation. For O1, we propose a novel approach based on graph projection to adapt DP to RDF. For O2, we determine the privacy model resulting from the translation to RDF of popular privacy models over RDBs with FK constraints. For O3, we propose the restrict-deletion neighborhood over RDBs with FK constraints, whose translation to the RDF graph world is equivalent to typed-node neighborhood; moreover, we propose a looser definition translating to typed-outedge neighborhood. For O4, we propose a graph transformation language based on graph rewriting, serving as a basis for constructing various sanitization mechanisms on attributed graphs. We support all our theoretical contributions with proof-of-concept prototypes that implement our proposals and are evaluated on real datasets to show the applicability of our work.
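A minimal Python sketch of the graph-projection idea behind O1: bound each subject's out-degree, then answer a counting query with Laplace noise. The bound, the query and the sensitivity analysis are deliberately simplified assumptions, not the thesis's actual mechanisms:

```python
import math
import random

def project(triples, max_out_degree=3):
    """Keep at most max_out_degree triples per subject (a simple graph projection)."""
    kept, count = [], {}
    for s, p, o in triples:
        if count.get(s, 0) < max_out_degree:
            kept.append((s, p, o))
            count[s] = count.get(s, 0) + 1
    return kept

def laplace_noise(scale):
    u = random.random() - 0.5                      # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(triples, predicate, epsilon=0.5, max_out_degree=3):
    # After projection, deleting one subject changes the count of matching
    # triples by at most max_out_degree (object occurrences are ignored here
    # for simplicity), hence the Laplace scale max_out_degree / epsilon.
    projected = project(triples, max_out_degree)
    true_count = sum(1 for _, p, _ in projected if p == predicate)
    return true_count + laplace_noise(max_out_degree / epsilon)

g = [("ex:alice", "ex:visited", f"ex:place{i}") for i in range(10)]
print(dp_count(g, "ex:visited"))                   # noisy count near 3
```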
Yang, Jitao. "Un modèle de données pour bibliothèques numériques". Thesis, Paris 11, 2012. http://www.theses.fr/2012PA112085.
Digital libraries are complex information systems, storing digital resources (e.g., text, images, sound, audio), as well as knowledge about digital or non-digital resources; this knowledge is referred to as metadata. We propose a data model for digital libraries supporting resource identification, use of metadata and re-use of stored resources, as well as a query language supporting the discovery of resources. The model that we propose is inspired by the architecture of the Web, which forms a solid, universally accepted basis for the notions and services expected from a digital library. We formalize our model as a first-order theory, in order to be able to express the basic concepts of digital libraries without being constrained by any technical considerations. The axioms of the theory give the formal semantics of the notions of the model and, at the same time, provide a definition of the knowledge that is implicit in a digital library. The theory is then translated into a Datalog program that, given a digital library, allows the library to be efficiently completed with the knowledge implicit in it. The goal of our research is to contribute to the information management technology of digital libraries. We demonstrate the theoretical feasibility of our digital library model by showing that it can be efficiently implemented, and its practical feasibility by providing a full translation of the model into RDF and of the query language into SPARQL. We provide a sound and complete calculus for reasoning on the RDF graphs resulting from the translation; based on this calculus, we prove the correctness of both translations, showing that the translation functions preserve the semantics of the digital library and of the query language.
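The completion step can be pictured as Datalog-style forward chaining; in this Python sketch the two rules are invented examples, not the thesis's actual axioms:

```python
def complete(facts):
    """Apply Datalog-like rules to a fact base until a fixpoint is reached."""
    rules = [
        # partOf(X, Y), partOf(Y, Z) -> partOf(X, Z)
        lambda fs: {("partOf", x, z)
                    for (r1, x, y1) in fs if r1 == "partOf"
                    for (r2, y2, z) in fs if r2 == "partOf" and y1 == y2},
        # partOf(X, Y), creator(Y, C) -> creator(X, C)
        lambda fs: {("creator", x, c)
                    for (r1, x, y1) in fs if r1 == "partOf"
                    for (r2, y2, c) in fs if r2 == "creator" and y1 == y2},
    ]
    facts = set(facts)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            return facts
        facts |= derived

library = {("partOf", "chapter1", "book1"), ("creator", "book1", "Yang")}
print(complete(library))  # adds ("creator", "chapter1", "Yang")
```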
Picalausa, Francois. "Guarded structural indexes: theory and application to relational RDF databases". Doctoral thesis, Universite Libre de Bruxelles, 2013. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209432.
This growth in the volume of semi-structured data has sparked a growing interest in the development of suitable databases. Among the various approaches proposed, one can distinguish relational approaches and graph-based approaches, as detailed in Chapter 3. The former aim to exploit existing relational database engines by integrating specialized techniques into them. The latter view semi-structured data as graphs, that is, as sets of nodes linked by labeled edges, and exploit this structure. One technique in this field, known as structural indexing, aims to summarize data graphs so that the data relevant to the processing of a query can be identified quickly.
Classical structural indexes are built on the notions of simulation and bisimulation over graphs. These notions, which are used in many fields such as verification, security and data storage, are relations over the nodes of graphs. Fundamentally, they capture the fact that two nodes share certain characteristics, such as a common neighborhood.
Although graph-based approaches are efficient in practice, they have limitations in the context of RDF and its query language SPARQL. From their perspective, labels are distinct from the nodes of the graph, whereas in the model described by RDF and supported by SPARQL, labels and nodes belong to the same set. This is why graph-based approaches support only a subset of SPARQL queries. Relational approaches, on the contrary, are faithful to the RDF model and can answer all SPARQL queries.
The question we address in this thesis is whether the relational and graph-based approaches are incompatible, or whether they can be combined advantageously. In particular, it would be desirable to retain the performance of graph-based approaches together with the generality of relational approaches. In this setting, we build a structural index suited to relational data.
We rely on a methodology described by Fletcher and his co-authors for the design of structural indexes. This methodology rests on three main components. The first is a so-called structural characterization of the query language to be supported, whose goal is to identify, as precisely as possible, the data that are returned together by any query of the language. The second is an algorithm that must efficiently group the data that are returned together, according to the structural characterization. The third component is the index itself: a data structure that must make it possible to identify the groups of data generated by the previous algorithm in order to answer queries.
First, it should be noted that the SPARQL language taken as a whole does not lend itself to efficient structural indexes: SPARQL queries are founded on the expression of conjunctive queries, and the structural characterization of conjunctive queries, although known, does not lend itself to efficient grouping algorithms. Nevertheless, the empirical study of real-world SPARQL queries that we carry out in Chapter 5 shows that these are mainly acyclic conjunctive queries, which are known in the literature to admit efficient evaluation algorithms.
The first component of our structural index, introduced in Chapter 6, is a characterization of acyclic conjunctive queries in terms of guarded simulation. For graphs, the notion of simulation is a restricted version of the notion of bisimulation. Similarly, we introduce the notion of guarded simulation as a restriction of the notion of guarded bisimulation, a known extension of bisimulation to relational data.
Chapter 7 provides the second component of our structural index. This component is a data structure called a guarded structural index, which supports the processing of arbitrary conjunctive queries. We show that, coupled with the preceding structural characterization, this index makes it possible to optimally identify the data relevant to the processing of acyclic conjunctive queries.
Chapter 8 constitutes the third component of our structural index and proposes efficient methods for computing the guarded simulation. Our algorithm essentially consists in transforming a database into a particular graph on which the notions of simulation and guarded simulation coincide. It then becomes possible to reuse existing algorithms for computing simulation relations.
While the preceding chapters define the necessary foundations for a structural index targeting relational data, they do not yet integrate this index into the context of a relational database engine. Chapter 9 does so, by developing methods that allow the index to be taken into account during the processing of a SPARQL query. Convincing experimental results complete this study.
This work thus provides a first positive answer to the question of whether the relational and graph-based approaches to storing RDF data can be combined advantageously.
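For intuition, here is a naive Python fixpoint computation of ordinary (not guarded) simulation on a small labeled graph; the guarded variant used in the thesis refines this notion for relational data:

```python
def simulation(nodes, edges, label):
    """Node u is simulated by v if labels match and every successor of u
    is simulated by some successor of v (computed as a shrinking fixpoint)."""
    succ = {n: {m for (a, m) in edges if a == n} for n in nodes}
    sim = {(u, v) for u in nodes for v in nodes if label[u] == label[v]}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            if any(all((su, sv) not in sim for sv in succ[v]) for su in succ[u]):
                sim.discard((u, v))
                changed = True
    return sim

nodes = {"a", "b", "c"}
edges = {("a", "b"), ("c", "b")}
label = {"a": "P", "b": "Q", "c": "P"}
print(simulation(nodes, edges, label))  # a and c simulate each other
```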
Galicia Auyón, Jorge Armando. "Revisiting Data Partitioning for Scalable RDF Graph Processing. Combining Graph Exploration and Fragmentation for RDF Processing. Query Optimization for Large Scale Clustered RDF Data. RDFPart-Suite: Bridging Physical and Logical RDF Partitioning. Reverse Partitioning for SPARQL Queries: Principles and Performance Analysis. Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? EXGRAF: Exploration et Fragmentation de Graphes au Service du Traitement Scalable de Requêtes RDF". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2021. http://www.theses.fr/2021ESMA0001.
The Resource Description Framework (RDF) and SPARQL are very popular graph-based standards initially designed to represent and query information on the Web. The flexibility offered by RDF motivated its use in other domains, and today RDF datasets are great information sources: they gather billions of triples in knowledge graphs that must be stored and efficiently exploited. The first generation of RDF systems was built on top of traditional relational databases. Unfortunately, the performance of these systems degrades rapidly, as the relational model is not suitable for handling RDF data inherently represented as a graph. Native and distributed RDF systems seek to overcome this limitation. The former mainly use indexing as an optimization strategy to speed up queries; distributed and parallel RDF systems resort to data partitioning. The logical representation of the database is crucial to designing data partitions in the relational model. The logical layer defining the explicit schema of the database provides a degree of comfort to database designers: it lets them choose, manually or automatically (through advisors), the tables and attributes to be partitioned, and it allows the core partitioning concepts to remain constant regardless of the database management system. This design scheme is no longer valid for RDF databases, essentially because the RDF model does not explicitly enforce a schema, RDF data being mostly implicitly structured. Thus, the logical layer is nonexistent and data partitioning depends strongly on the physical implementation of the triples on disk. This situation leads to different partitioning logics depending on the target system, which is quite different from the relational model's perspective. In this thesis, we promote the novel idea of performing data partitioning at the logical level in RDF databases. Thereby, we first process the RDF data graph to support logical entity-based partitioning. After this preparation, we present a partitioning framework built upon these logical structures, accompanied by data fragmentation, allocation and distribution procedures. This framework was incorporated into a centralized (RDF_QDAG) and a distributed (gStoreD) triple store. We conducted several experiments confirming the feasibility of integrating our framework into existing systems and improving their performance for certain queries. Finally, we designed a set of RDF data partitioning management tools, including a data definition language (DDL) and an automatic partitioning wizard.
Huang, Xin. "Querying big RDF data : semantic heterogeneity and rule-based inconsistency". Electronic Thesis or Diss., Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB124.
The Semantic Web is the vision of the next generation of the Web, proposed by Tim Berners-Lee in 2001. Indeed, with the rapid development of Semantic Web technologies, large-scale RDF data already exist as linked open data, and their number is growing rapidly. Traditional Semantic Web querying and reasoning tools are designed to run in a stand-alone environment; therefore, processing large-scale bulk computations with traditional solutions inevitably results in memory and computational performance bottlenecks. Large volumes of heterogeneous data are collected from different data sources by different organizations; in this context, inconsistencies and uncertainties that are difficult to identify and evaluate always exist across sources. To address these challenges of the Semantic Web, the following main research contributions and innovative approaches are proposed. We first developed an inference-based semantic entity resolution and linking mechanism for the case where the same entity is provided in multiple RDF resources described using different semantics and URI identifiers. We also developed a MapReduce-based rewriting engine for SPARQL queries over big RDF data, handling the implicit data described intentionally by inference rules during query evaluation. The rewriting approach also deals with transitive closure and cyclic rules, to provide an inference language as rich as RDFS and OWL. The second contribution concerns distributed inconsistency processing. We extend the approach presented in the first contribution by taking inconsistency in the data into account. This includes: (1) rule-based inconsistency detection with the help of our query rewriting engine; (2) consistent query evaluation under three different semantics. The third contribution concerns reasoning and querying over large-scale uncertain RDF data. We propose a MapReduce-based approach to deal with large-scale reasoning with uncertainty. Unlike the possible-worlds semantics, we propose an algorithm for generating an intensional SPARQL query plan over a probabilistic RDF graph for computing the probabilities of results within the query.
Alam, Mehwish. "Découverte interactive de connaissances dans le web des données". Thesis, Université de Lorraine, 2015. http://www.theses.fr/2015LORR0158/document.
Recently, the "Web of Documents" has become the "Web of Data": documents are annotated in the form of RDF, making this human-readable data directly processable by machines. This data can further be explored by the user using SPARQL queries. Just as web clustering engines provide a classification of the results obtained by querying the web of documents, a framework for classifying SPARQL query answers is needed to make sense of what is contained in the data. Exploratory data mining focuses on providing an insight into the data; it also allows non-interesting parts of the data to be filtered out by directly involving the domain expert in the process. This thesis contributes to aiding the user in exploring Linked Data with the help of exploratory data mining. We study three research directions: 1) creating views over RDF graphs and allowing user interaction over these views; 2) assessing the quality of RDF data and completing it; and 3) simultaneous navigation/exploration over the heterogeneous and multiple resources present in Linked Data. First, we introduce a solution modifier, View By, to create views over RDF graphs by classifying SPARQL query answers with the help of Formal Concept Analysis. In order to navigate the obtained concept lattice and extract knowledge units, we develop a new tool called RV-Explorer (Rdf View eXplorer) which implements several navigational modes. However, this navigation/exploration reveals several gaps in the data sets. In order to complete the data, we use association rule mining. Furthermore, to provide navigation and exploration directly over RDF graphs along with background knowledge, RDF triples are clustered with respect to this background knowledge, and the resulting clusters can then be navigated and interactively explored. Finally, it can be concluded that instead of providing direct exploration, we use FCA as an aid for clustering RDF data, allowing the user to explore these clusters and to reduce his exploration space through interaction.
Dia, Amadou Fall. "Filtrage sémantique et gestion distribuée de flux de données massives". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS495.
Our daily use of the Internet and related technologies generates, at rapid and variable speeds, large volumes of heterogeneous data from sensor networks, search engine logs, multimedia content sites, weather forecasting, geolocation, Internet of Things (IoT) applications, etc. Processing such data in conventional databases (relational database management systems) may be very expensive in terms of time and memory resources. To effectively respond to the needs of rapid decision-making, these streams require real-time processing. Data stream management systems (DSMSs) evaluate queries on the recent data of a stream within structures called windows. The input data arrive in different formats, such as CSV, XML, RSS, or JSON. This heterogeneity, which stems from the nature of data streams, must be resolved. For this, several research groups have taken advantage of Semantic Web technologies (RDF and SPARQL) by proposing RDF stream processing systems, called RSPs. However, large volumes of RDF data, high input rates, concurrent queries, the combination of RDF streams with large volumes of stored RDF data, and expensive processing drastically reduce the performance of these systems. A new approach is required to considerably reduce the processing load of RDF data streams. In this thesis, we propose several complementary solutions to reduce the processing load in a centralized environment. An on-the-fly RDF graph stream sampling approach is proposed to reduce the data and processing load while preserving semantic links. This approach is deepened by adopting a graph-oriented summarization approach to extract the most relevant information from RDF graphs, using centrality measures from social network analysis. We also adopt a compressed format for RDF data and propose an approach for querying compressed RDF data without a decompression phase. To ensure parallel and distributed data stream management, this work also proposes two solutions for reducing the processing load in a distributed environment: an engine and approaches for the parallel and distributed processing of RDF graph streams. Finally, an optimized processing approach for combining static and dynamic data is also integrated into a new distributed RDF graph stream management system.
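A toy Python version of the centrality-based summarization step: rank the nodes of a stream window and keep only the triples incident to the top-k nodes (degree centrality stands in for the richer social-network measures used in the thesis):

```python
from collections import Counter

def summarize_window(triples, k=2):
    """Keep triples touching the k most central (here: highest-degree) nodes."""
    degree = Counter()
    for s, _, o in triples:
        degree[s] += 1
        degree[o] += 1
    top = {n for n, _ in degree.most_common(k)}
    return [t for t in triples if t[0] in top or t[2] in top]

window = [
    ("sensor1", "observes", "room1"),
    ("sensor2", "observes", "room1"),
    ("room1", "partOf", "building7"),
]
print(summarize_window(window))
```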
Roatis, Alexandra. "Efficient Querying and Analytics of Semantic Web Data". Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112218/document.
The utility and relevance of data lie in the information that can be extracted from it. The high rate of data publication and its increased complexity, for instance the heterogeneous, self-describing Semantic Web data, motivate the interest in efficient techniques for data manipulation. In this thesis we leverage mature relational data management technology for querying Semantic Web data. The first part focuses on query answering over data subject to RDFS constraints, stored in relational data management systems. The implicit information resulting from RDF reasoning is required to correctly answer such queries. We introduce the database fragment of RDF, going beyond the expressive power of previously studied fragments. We devise novel techniques for answering Basic Graph Pattern queries within this fragment, exploring the two established approaches for handling RDF semantics, namely graph saturation and query reformulation. In particular, we consider graph updates within each approach and propose a method for incrementally maintaining the saturation. We experimentally study the performance trade-offs of our techniques, which can be deployed on top of any relational data management engine. The second part of this thesis considers the new requirements for data analytics tools and methods emerging from the development of the Semantic Web. We fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data. We propose the first complete formal framework for warehouse-style RDF analytics. Notably, we define analytical schemas tailored to heterogeneous, semantic-rich RDF graphs, and analytical queries which (beyond relational cubes) allow flexible querying of the data and the schema as well as powerful aggregation and OLAP-style operations. Experiments on a fully-implemented platform demonstrate the practical interest of our approach.
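The saturation approach can be illustrated with two standard RDFS entailment rules applied to a fixpoint; a minimal Python sketch:

```python
def saturate(triples):
    """Apply two RDFS rules until a fixpoint: queries on the saturated
    graph then see the implicit information as explicit triples."""
    triples = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in triples if p == "rdfs:subClassOf"}
        new = set()
        # rdfs11: subClassOf is transitive
        new |= {(a, "rdfs:subClassOf", c)
                for (a, b) in sub for (b2, c) in sub if b == b2}
        # rdfs9: instances of a subclass are instances of the superclass
        new |= {(x, "rdf:type", c)
                for (x, p, a) in triples if p == "rdf:type"
                for (a2, c) in sub if a == a2}
        if new <= triples:
            return triples
        triples |= new

g = {("Student", "rdfs:subClassOf", "Person"),
     ("Person", "rdfs:subClassOf", "Agent"),
     ("alice", "rdf:type", "Student")}
# Adds: alice rdf:type Person/Agent, Student rdfs:subClassOf Agent.
print(saturate(g))
```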
Abidi, Amna. "Imperfect RDF Databases : From Modelling to Querying". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2019. http://www.theses.fr/2019ESMA0008/document.
The ever-increasing interest in RDF data on the Web has led to several important research efforts to enrich the traditional RDF data formalism for exploitation and analysis purposes. The work of this thesis continues those efforts by addressing the issue of RDF data management in the presence of imperfection (untruthfulness, uncertainty, etc.). The main contributions of this dissertation are as follows. (1) We tackle the trusted RDF data model: we propose to extend skyline queries to trusted RDF data, which consists in extracting the most interesting trusted resources according to user-defined criteria. (2) We study, via statistical methods, the impact of the trust measure on the trust-skyline set. (3) We integrate into the structure of RDF data (i.e., the subject-property-object triple) a fourth element expressing a possibility measure, reflecting the user's opinion about the truth of a statement. To deal with possibility requirements, an appropriate language framework is introduced, namely Pi-SPARQL, which extends SPARQL into a possibility-aware query language. Finally, we study a new skyline operator variant to extract possibilistic RDF resources that are possibly dominated by no other resources in the sense of Pareto optimality.
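A small Python illustration of a trust-aware skyline in the spirit of contribution (1): resources not Pareto-dominated on user criteria plus trust are retained (all values are invented):

```python
def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(resources):
    # resources: {iri: (criterion1, criterion2, trust)}
    return {r for r, v in resources.items()
            if not any(dominates(w, v) for s, w in resources.items() if s != r)}

resources = {
    "ex:hotelA": (8.0, 7.5, 0.9),   # (comfort, location, trust)
    "ex:hotelB": (9.0, 8.0, 0.4),
    "ex:hotelC": (7.0, 6.0, 0.3),   # dominated by both A and B
}
print(skyline(resources))           # {'ex:hotelA', 'ex:hotelB'}
```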
Ren, Xiangnan. "Traitement et raisonnement distribués des flux RDF". Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1139/document.
Real-time processing of data streams emanating from sensors is becoming a common task in industrial scenarios. In an Internet of Things (IoT) context, data are emitted from heterogeneous stream sources, i.e., coming from different domains and data models. This requires that IoT applications efficiently handle data integration mechanisms. The processing of RDF data streams has hence become an important research field. This trend enables a wide range of innovative applications where the real-time and reasoning aspects are pervasive. The key implementation goal of such applications consists in efficiently handling massive incoming data streams and supporting advanced data analytics services like anomaly detection. However, a modern RSP engine has to address the volume and velocity characteristics encountered in the Big Data era. In an ongoing industrial project, we found that a 24/7-available stream processing engine usually faces massive data volumes, dynamically changing data structures, and varying workload characteristics. These facts impact the engine's performance and reliability. To address these issues, we propose Strider, a hybrid adaptive distributed RDF stream processing engine that optimizes the logical query plan according to the state of the data streams. Strider has been designed to guarantee important industrial properties such as scalability, high availability, fault tolerance, high throughput and acceptable latency. These guarantees are obtained by designing the engine's architecture with state-of-the-art Apache components such as Spark and Kafka. Moreover, an increasing number of processing jobs executed over RSP engines require reasoning mechanisms, which usually comes at the cost of finding a trade-off between data throughput, latency and the computational cost of expressive inferences. Therefore, we extend Strider to support real-time RDFS+ (i.e., RDFS + owl:sameAs) reasoning capabilities. We combine Strider with a query rewriting approach for SPARQL that benefits from an intelligent encoding of the knowledge base. The system is evaluated along different dimensions and over multiple datasets to demonstrate its performance. Finally, we step further into exploratory RDF stream reasoning with a fragment of Answer Set Programming. This part of our research work is mainly motivated by the fact that more and more streaming applications require more expressive and complex reasoning tasks. The main challenge is to cope with the large-volume and high-velocity dimensions in a scalable and inference-enabled manner. Recent efforts in this area are still missing the aspect of system scalability for stream reasoning. Thus, we aim to explore the ability of modern distributed computing frameworks to process highly expressive knowledge inference queries over Big Data streams. To do so, we consider queries expressed as a positive fragment of LARS (a temporal logic framework based on Answer Set Programming) and propose solutions to process such queries based on the two main execution models adopted by major parallel and distributed execution frameworks: Bulk Synchronous Parallel (BSP) and Record-at-a-Time (RAT). We implement our solution, named BigSR, and conduct a series of evaluations. Our experiments show that BigSR achieves a throughput beyond a million triples per second using a rather small cluster of machines.
Dehainsala, Hondjack. "Explicitation de la sémantique dans les bases de données : Base de données à base ontologique et le modèle OntoDB". Phd thesis, Université de Poitiers, 2007. http://tel.archives-ouvertes.fr/tel-00157595.
Texto completoen termes de classes et de propriétés, ainsi que des relations qui les lient. Avec le développement de
modèles d'ontologies stables dans différents domaines, OWL dans le domaine duWeb sémantique,
PLIB dans le domaine technique, de plus en plus de données (ou de métadonnées) sont décrites par référence à ces ontologies. La taille croissante de telles données rend nécessaire de les gérer au sein de bases de données originales, que nous appelons bases de données à base ontologique (BDBO), et qui possèdent la particularité de représenter, outre les données, les ontologies qui en définissent le sens. Plusieurs architectures de BDBO ont ainsi été proposées au cours des dernières années. Les chémas qu'elles utilisent pour la représentation des données sont soit constitués d'une unique table de triplets de type (sujet, prédicat, objet), soit éclatés en des tables unaires et binaires respectivement pour chaque classe et pour chaque propriété. Si de telles représentations permettent une grande flexibilité dans la structure des données représentées, elles ne sont ni susceptibles de passer à grande échelle lorsque chaque instance est décrite par un nombre significatif de propriétés, ni adaptée à la structure des bases de données usuelles, fondée sur les relations n-aires. C'est ce double inconvénient que vise à résoudre le modèle OntoDB. En introduisant des hypothèses de typages qui semblent acceptables dans beaucoup de domaine d'application, nous proposons une architecture de BDBO constituée de quatre parties : les deux premières parties correspondent à la structure usuelle des bases de données : données reposant sur un schéma logique de données, et méta-base décrivant l'ensemble de la structure de tables.
Les deux autres parties, originales, représentent respectivement les ontologies, et le méta-modèle
d'ontologie au sein d'un méta-schéma réflexif. Des mécanismes d'abstraction et de nomination permettent respectivement d'associer à chaque donnée le concept ontologique qui en définit le sens, et d'accéder aux données à partir des concepts, sans se préoccuper de la représentation des données. Cette architecture permet à la fois de gérer de façon efficace des données de grande taille définies par référence à des ontologies (données à base ontologique), mais aussi d'indexer des bases de données usuelles au niveau connaissance en leur adjoignant les deux parties : ontologie et méta-schéma. Le modèle d'architecture que nous proposons a été validé par le développement d'un prototype opérationnel implanté sur le système PostgreSQL avec le modèle d'ontologie PLIB. Nous présentons également une évaluation comparative de nos propositions aux modèles présentés antérieurement.
Khelil, Abdallah. "Gestion et optimisation des données massives issues du Web. Combining graph exploration and fragmentation for scalable RDF query processing. Should We Be Afraid of Querying Billions of Triples in a Graph-Based Centralized System? EXGRAF : Exploration et Fragmentation de Graphes au Service du Traitement Scalable de Requêtes RDF". Thesis, Chasseneuil-du-Poitou, Ecole nationale supérieure de mécanique et d'aérotechnique, 2020. http://www.theses.fr/2020ESMA0009.
Big Data represents a challenge not only for the socio-economic world but also for scientific research. Indeed, as has been pointed out in several scientific articles and strategic reports, modern computer applications are facing new problems and issues that are mainly related to the storage and the exploitation of data generated by modern observation and simulation instruments. The management of such data represents a real bottleneck, which slows down the exploitation of the various data collected not only in the framework of international scientific programs but also by companies, the latter relying increasingly on the analysis of large-scale data. Much of this data is published today on the Web. Indeed, we are witnessing an evolution of the traditional web, designed basically to manage documents, towards a web of data that offers mechanisms for querying semantic information. Several data models have been proposed to represent this information on the Web. The most important is the Resource Description Framework (RDF), which provides a simple and abstract representation of knowledge for resources on the Web. Each Semantic Web fact can be encoded as an RDF triple. In order to explore and query structured information expressed in RDF, several query languages have been proposed over the years. In 2008, SPARQL became the official W3C Recommendation language for querying RDF data. The need to efficiently manage and query RDF data has led to the development of new systems specifically designed to process this data format. These approaches can be categorized as centralized, relying on a single machine to manage RDF data, or distributed, combining multiple machines connected by a computer network. Some of these approaches are based on an existing data management system, such as Virtuoso and Jena; others rely on an approach specifically designed for the management of RDF triples, such as GRIN, RDF3X and gStore. With the evolution of RDF datasets (e.g. DBPedia) and SPARQL, most systems have become obsolete and/or inefficient. For example, none of the existing centralized systems is able to manage the 1 billion triples provided under the WatDiv benchmark. Distributed systems would, under certain conditions, allow this point to be improved, but consequently lead to performance degradation. In this PhD thesis, we propose the centralized system "RDF_QDAG", which finds a good compromise between scalability and performance. We propose to combine physical data fragmentation and data graph exploration. "RDF_QDAG" supports multiple types of queries, based not only on basic graph patterns but also incorporating filters based on regular expressions as well as aggregation and sorting functions. "RDF_QDAG" relies on the Volcano execution model, which allows controlling main memory and avoiding any overflow even if the hardware configuration is limited. To the best of our knowledge, "RDF_QDAG" is the only centralized system that maintains good performance when managing several billion triples. We compared this system with other systems that represent the state of the art in RDF data management: a relational approach (Virtuoso), a graph-based approach (g-Store), an intensive indexing approach (RDF-3X) and two parallel approaches (CliqueSquare and g-Store-D). "RDF_QDAG" surpasses existing systems when it comes to ensuring both scalability and performance.
Costabello, Luca. "Contrôle d'accès et présentation contextuelle pour le Web des données". Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00934617.
Texto completoLeblay, Julien. "Techniques d'optimisation pour des données semi-structurées du web sémantique". Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00872883.
Texto completoDelanaux, Rémy. "Intégration de données liées respectueuse de la confidentialité". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1303.
Texto completoIndividual privacy is a major and largely unexplored concern when publishing new datasets in the context of Linked Open Data (LOD). The LOD cloud forms a network of interconnected and publicly accessible datasets in the form of graph databases modeled using the RDF format and queried using the SPARQL language. This heavily standardized context is nowadays extensively used by academics, public institutions and some private organizations to make their data available. Yet, some industrial and private actors may be discouraged by potential privacy issues. To address this, we introduce and develop a declarative framework for privacy-preserving Linked Data publishing, in which privacy and utility constraints are specified as policies, that is, sets of SPARQL queries. Our approach is data-independent and only inspects the privacy and utility policies in order to determine the sequence of anonymization operations applicable to any graph instance for satisfying the policies. We prove the soundness of our algorithms and gauge their performance through experimental analysis. Another aspect to take into account is that a new dataset published to the LOD cloud is exposed to privacy breaches due to possible linkage with objects already existing in other LOD datasets. In the second part of this thesis, we thus focus on the problem of building safe anonymizations of an RDF graph, guaranteeing that linking the anonymized graph with any external RDF graph will not cause privacy breaches. Given a set of privacy queries as input, we study the data-independent safety problem and the sequence of anonymization operations necessary to enforce it. We provide sufficient conditions under which an anonymization instance is safe given a set of privacy queries. Additionally, we show that our algorithms are robust in the presence of sameAs links, whether explicit or inferred from additional knowledge. To conclude, we evaluate the impact of this safety-preserving solution on given input graphs through experiments. We focus on the performance and the utility loss of this anonymization framework on both real-world and artificial data. We first discuss and select utility measures to compare the original graph to its anonymized counterpart, then define a method to generate new privacy policies from a reference one by inserting incremental modifications. We study the behavior of the framework on four carefully selected RDF graphs. We show that our anonymization technique is effective, with reasonable runtime on quite large graphs (several million triples), and gradual: the more specific the privacy policy is, the smaller its impact. Finally, using structural graph-based metrics, we show that our algorithms are not very destructive even when privacy policies cover a large part of the graph. By designing a simple and efficient way to ensure privacy and utility in plausible usages of RDF graphs, this new approach suggests many extensions and, in the long run, more work on privacy-preserving data publishing in the context of Linked Open Data.
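The following is a hedged sketch of the policy idea described above: a privacy policy is a set of SPARQL queries whose answers must not be disclosed. The ex: vocabulary, the toy data and the single suppression operation are simplifications chosen for illustration, shown here with Python's rdflib; the thesis framework computes the operation sequence from the policies alone.

    # A privacy policy as a set of SPARQL queries, plus one naive
    # anonymization operation applied to the matched triples.
    from rdflib import BNode, Graph, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.alice, EX.hasDisease, EX.flu))
    g.add((EX.alice, EX.worksAt, EX.acme))

    # One privacy query: (person, disease) pairs must not be recoverable.
    privacy_policy = ["""
        PREFIX ex: <http://example.org/>
        SELECT ?p ?d WHERE { ?p ex:hasDisease ?d . }
    """]

    for q in privacy_policy:
        for person, disease in g.query(q):
            # Anonymization operation: detach the sensitive fact from the
            # named individual by replacing the subject with a blank node.
            g.remove((person, EX.hasDisease, disease))
            g.add((BNode(), EX.hasDisease, disease))

    print(g.serialize(format="turtle"))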
Slama, Olfa. "Flexible querying of RDF databases : a contribution based on fuzzy logic". Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1S089/document.
Texto completoThis thesis concerns the definition of a flexible approach for querying both crisp and fuzzy RDF graphs. This approach, based on the theory of fuzzy sets, makes it possible to extend SPARQL, the W3C-standardised query language for RDF, so as to express i) fuzzy user preferences on data (e.g., the release year of an album is recent) and on the structure of the data graph (e.g., the path between two friends is required to be short) and ii) more complex user preferences, namely fuzzy quantified statements (e.g., most of the albums that are recommended by an artist are highly rated and have been created by a young friend of this artist). We performed some experiments in order to study the performance of this approach. The main objective of these experiments was to show that the extra cost due to the introduction of fuzziness remains limited/acceptable. We also investigated, in a more general framework, namely graph databases, the issue of integrating the same type of fuzzy quantified statements into a fuzzy extension of Cypher, a declarative language for querying (crisp) graph databases. Some experimental results are reported and show that the extra cost induced by the fuzzy quantified nature of the queries also remains very limited.
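As a minimal sketch of the fuzzy preference idea above (e.g., "the release year of an album is recent"): a membership function maps each value to a satisfaction degree in [0, 1], and answers are ranked rather than crisply filtered. The function shape and its thresholds are illustrative assumptions, not the thesis's definitions.

    # A fuzzy predicate "recent": 0.0 up to 2000, rising linearly to 1.0
    # at 2015. Thresholds are hypothetical.
    def recent(year, low=2000, high=2015):
        if year >= high:
            return 1.0
        if year <= low:
            return 0.0
        return (year - low) / (high - low)

    # Rank answers by satisfaction degree instead of discarding them,
    # which is the essence of the fuzzy SPARQL extension.
    albums = [("Album A", 2018), ("Album B", 2008), ("Album C", 1995)]
    for name, year in sorted(albums, key=lambda a: recent(a[1]), reverse=True):
        print(name, year, round(recent(year), 2))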
Huang, Xin. "Querying big RDF data : semantic heterogeneity and rule-based inconsistency". Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCB124/document.
Texto completoThe Semantic Web is the vision of the next generation of the Web proposed by Tim Berners-Lee in 2001. With the rapid development of Semantic Web technologies, large-scale RDF data already exist as linked open data, and their number is growing rapidly. Traditional Semantic Web querying and reasoning tools are designed to run in a stand-alone environment; processing large-scale bulk data with such solutions therefore inevitably hits memory and computational bottlenecks. Moreover, large volumes of heterogeneous data are collected from different data sources by different organizations; in this context, inconsistencies and uncertainties across sources are difficult to identify and evaluate. To address these challenges of the Semantic Web, the following contributions are proposed. We first developed an inference-based semantic entity resolution approach and linking mechanism for cases where the same entity appears in multiple RDF resources described using different semantics and URI identifiers. We also developed a MapReduce-based rewriting engine for SPARQL queries over big RDF data, which handles the implicit data described intensionally by inference rules during query evaluation. The rewriting approach also deals with transitive closure and cyclic rules, so as to support an inference language as rich as RDFS and OWL. The second contribution concerns distributed inconsistency processing. We extend the approach presented in the first contribution by taking inconsistency in the data into account. This includes: (1) rule-based inconsistency detection with the help of our query rewriting engine; (2) consistent query evaluation under three different semantics. The third contribution concerns reasoning and querying over large-scale uncertain RDF data. We propose a MapReduce-based approach to deal with large-scale reasoning with uncertainty. Instead of the possible-worlds semantics, we propose an algorithm that generates an intensional SPARQL query plan over a probabilistic RDF graph to compute the probabilities of the query results.
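Below is a small in-memory sketch of the kind of implicit data the rewriting engine above must account for: saturating a triple set under a transitive rule by fixpoint iteration. The distributed MapReduce evaluation itself is outside the scope of this illustration, and the data are hypothetical.

    # Derive implicit triples entailed by a transitive predicate
    # (rdfs:subClassOf-style), iterating to a fixpoint.
    def transitive_closure(triples, pred):
        closure = set(triples)
        changed = True
        while changed:  # iterate until no new triple is derived
            changed = False
            edges = [(s, o) for (s, p, o) in closure if p == pred]
            for s1, o1 in edges:
                for s2, o2 in edges:
                    if o1 == s2 and (s1, pred, o2) not in closure:
                        closure.add((s1, pred, o2))
                        changed = True
        return closure

    data = {("Cat", "subClassOf", "Mammal"),
            ("Mammal", "subClassOf", "Animal")}
    print(transitive_closure(data, "subClassOf"))
    # derives the implicit triple ("Cat", "subClassOf", "Animal")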
Gillani, Syed. "Semantically-enabled stream processing and complex event processing over RDF graph streams". Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES055/document.
Texto completoThere is a paradigm shift in the nature and processing means of today's data: data used to be mostly static and stored in large databases to be queried. Today, with the advent of new applications and means of collecting data, most applications on the Web and in enterprises produce data in a continuous manner, under the form of streams. Thus, the users of these applications expect to process a large volume of data with fresh, low-latency results. This has resulted in the introduction of Data Stream Management Systems (DSMSs) and of the Complex Event Processing (CEP) paradigm, both with distinctive aims: DSMSs are mostly employed to process traditional query operators (mostly stateless), while CEP systems focus on temporal pattern matching (stateful operators) to detect changes in the data that can be thought of as events. In the past decade or so, a number of scalable and performance-intensive DSMSs and CEP systems have been proposed. Most of them, however, are based on relational data models, which raises the question of support for heterogeneous data sources, i.e., the variety of the data. Work on RDF stream processing (RSP) systems partly addresses the challenge of variety by promoting the RDF data model. Nonetheless, challenges like volume and velocity are overlooked by existing approaches. These challenges require customised optimisations that consider RDF as a first-class citizen and scale the process of continuous graph pattern matching. To gain insights into these problems, this thesis focuses on developing scalable RDF graph stream processing and semantically-enabled CEP systems (i.e., Semantic Complex Event Processing, SCEP). In addition to our optimised algorithmic and data structure methodologies, we also contribute to the design of a new query language for SCEP. Our contributions in these two fields are as follows.
• RDF Graph Stream Processing. We first propose an RDF graph stream model, where each data item/event within a stream is comprised of an RDF graph (a set of RDF triples). Second, we implement customised indexing techniques and data structures to continuously process RDF graph streams in an incremental manner.
• Semantic Complex Event Processing. We extend the idea of RDF graph stream processing to enable SCEP over such RDF graph streams, i.e., temporal pattern matching. Our first contribution in this context is a new query language that encompasses the RDF graph stream model and employs a set of expressive temporal operators such as sequencing, kleene-+, negation, optional, conjunction, disjunction and event selection strategies. Based on this, we implement a scalable system that employs a non-deterministic finite automata model to evaluate these operators in an optimised manner.
We leverage techniques from diverse fields, such as relational query optimisation, incremental query processing, and sensor and social networks, in order to solve real-world problems. We have applied our proposed techniques to a wide range of real-world and synthetic datasets to extract knowledge from RDF structured data in motion. Our experimental evaluations confirm our theoretical insights and demonstrate the viability of our proposed methods.
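The following is a hedged sketch of the RDF graph stream model described above: each event is a timestamped RDF graph (a set of triples), and a sliding window of events is maintained for continuous evaluation. Real SCEP operators (sequencing, kleene-+, negation, ...) are far richer; the predicate names and the 10-second window are illustrative assumptions.

    # Timestamped RDF graphs in a sliding window, with one trivial
    # continuous query over each arriving graph.
    from collections import deque

    WINDOW_SECONDS = 10.0
    window = deque()  # pairs (timestamp, set of triples)

    def on_event(ts, graph):
        window.append((ts, graph))
        while window and window[0][0] < ts - WINDOW_SECONDS:
            window.popleft()  # expire events that left the window
        # Continuous query over the newly arrived graph: high readings.
        for s, p, o in graph:
            if p == "reading" and float(o) > 100.0:
                print(f"alert at t={ts}: {s} reads {o}")

    on_event(1.0, {("sensor1", "reading", "42")})
    on_event(2.0, {("sensor2", "reading", "120")})  # -> alert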
Galarraga, Del Prado Luis. "Extraction des règles d'association dans des bases de connaissances". Thesis, Paris, ENST, 2016. http://www.theses.fr/2016ENST0050/document.
Texto completoThe continuous progress of information extraction (IE) techniques has led to the construction of large general-purpose knowledge bases (KBs). These KBs contain millions of computer-readable facts about real-world entities such as people, organizations and places. KBs are important nowadays because they allow computers to “understand” the real world. They are used in multiple applications in Information Retrieval, Query Answering and Automatic Reasoning, among other fields. Furthermore, the plethora of information available in today’s KBs allows for the discovery of frequent patterns in the data, a task known as rule mining. Such patterns or rules convey useful insights about the data. These rules can be used in several applications ranging from data analytics and prediction to data maintenance tasks. The contribution of this thesis is twofold: first, it proposes a method to mine rules on KBs. The method relies on a mining model tailored for potentially incomplete web-extracted KBs. Second, the thesis shows the applicability of rule mining in several data-oriented tasks in KBs, namely fact prediction, schema alignment, canonicalization of (open) KBs and prediction of completeness.
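As a toy illustration of the rule mining task described above, the sketch below computes support and confidence for one Horn rule over a tiny KB. The thesis's mining model for incomplete web-extracted KBs is more refined (it avoids the naive closed-world counting used here); the data are invented.

    # Rule: marriedTo(x, y) ^ livesIn(x, z) => livesIn(y, z)
    kb = {("alice", "marriedTo", "bob"), ("bob", "marriedTo", "alice"),
          ("alice", "livesIn", "paris"), ("bob", "livesIn", "paris"),
          ("carol", "marriedTo", "dan"), ("carol", "livesIn", "rome"),
          ("dan", "livesIn", "lyon")}

    married = {(s, o) for (s, p, o) in kb if p == "marriedTo"}
    lives = {(s, o) for (s, p, o) in kb if p == "livesIn"}

    # Body instantiations, then count how often the head also holds.
    body = {(x, y, z) for (x, y) in married for (x2, z) in lives if x == x2}
    support = sum(1 for (x, y, z) in body if (y, z) in lives)
    confidence = support / len(body)
    print(support, confidence)  # 2 of 3 body instantiations confirmed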
Symeonidou, Danai. "Automatic key discovery for Data Linking". Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112265/document.
Texto completoIn recent years, the Web of Data has grown significantly, now containing a huge number of RDF triples. Integrating data described in different RDF datasets and creating semantic links among them has become one of the most important goals of RDF applications. These links express semantic correspondences between ontology entities or data. Among the different kinds of semantic links that can be established, identity links express that different resources refer to the same real-world entity. Comparing the number of resources published on the Web with the number of identity links shows that the goal of building a Web of Data is still not accomplished. Several data linking approaches infer identity links using keys. Nevertheless, in most datasets published on the Web, the keys are not available, and it can be difficult, even for an expert, to declare them. The aim of this thesis is to study the problem of automatic key discovery in RDF data and to propose new efficient approaches to tackle this problem. Data published on the Web are usually created automatically, and thus may contain erroneous information or duplicates, or may be incomplete. Therefore, we focus on developing key discovery approaches that can handle datasets with numerous, incomplete or erroneous information. Our objective is to discover as many keys as possible, even ones that are valid only in subparts of the data. We first introduce KD2R, an approach that allows the automatic discovery of composite keys in RDF datasets that may conform to different schemas. KD2R is able to treat datasets that may be incomplete and for which the Unique Name Assumption holds. To deal with the incompleteness of data, KD2R proposes two heuristics that offer different interpretations for the absence of data. KD2R uses pruning techniques to reduce the search space. However, this approach is overwhelmed by the huge amount of data found on the Web. Thus, we present our second approach, SAKey, which is able to scale to very large datasets by using effective filtering and pruning techniques. Moreover, SAKey is capable of discovering keys in datasets where erroneous data or duplicates may exist. More precisely, the notion of almost keys is proposed to describe sets of properties that are not keys due to a few exceptions.
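The sketch below is a hedged illustration of the key notion used above: a set of properties P is a key if no two distinct resources share the same combination of values for P, and "almost keys" tolerate a few exceptions. The counting shown, the data and the property names are illustrative, not KD2R's or SAKey's actual algorithms.

    # Count key violations ("exceptions") for a candidate property set.
    from collections import defaultdict

    def key_exceptions(entities, props):
        seen = defaultdict(list)
        for name, desc in entities.items():
            if all(p in desc for p in props):  # skip incomplete descriptions
                seen[tuple(desc[p] for p in props)].append(name)
        # Each extra entity sharing a value combination is one exception.
        return sum(len(v) - 1 for v in seen.values() if len(v) > 1)

    people = {"p1": {"name": "Ann", "birth": "1980"},
              "p2": {"name": "Ann", "birth": "1985"},
              "p3": {"name": "Bob", "birth": "1980"}}

    print(key_exceptions(people, ("name",)))          # 1 -> almost key
    print(key_exceptions(people, ("name", "birth")))  # 0 -> a key here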
Cao, Tien Duc. "Toward Automatic Fact-Checking of Statistic Claims". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLX051/document.
Texto completoDigital content is increasingly produced nowadays in a variety of media, such as news and social network sites, personal Web sites, blogs, etc. In particular, a large and dynamic part of such content is related to media-worthy events, whether of general interest (e.g., the war in Syria) or of specialized interest to a sub-community of users (e.g., sport events or genetically modified organisms). While such content is primarily meant for human users (readers), interest is growing in its automatic analysis, understanding and exploitation. Within the ANR project ContentCheck, we are interested in developing textual and semantic tools for analyzing content shared through digital media. The proposed PhD project takes place within this contract, and will be developed based on the interactions with our partner from Le Monde. The PhD project aims at developing algorithms and tools for:
(1) classifying and annotating mixed content (from articles, structured databases, social media, etc.) based on an existing set of topics (or ontology);
(2) information and relation extraction from a text which may comprise a statement to be fact-checked, with a particular focus on capturing the time dimension; a sample statement is, for instance, « VAT on iron in France was the highest in Europe in 2015 »;
(3) building structured queries from extracted information and relations, to be evaluated against reference databases used as trusted information against which facts can be checked.
Karanasos, Konstantinos. "View-Based techniques for the efficient management of web data". Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00755328.
Texto completoIssa, Subhi. "Linked data quality : completeness and conciseness". Electronic Thesis or Diss., Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1274.
Texto completoThe wide spread of Semantic Web technologies such as the Resource Description Framework (RDF) enables individuals to build their databases on the Web, to write vocabularies, and to define rules to arrange and explain the relationships between data according to the Linked Data principles. As a consequence, a large amount of structured and interlinked data is being generated daily. A close examination of the quality of this data can be critical, especially if important research and professional decisions depend on it. The quality of Linked Data is an important aspect of their fitness for use in applications. Several dimensions for assessing the quality of Linked Data have been identified, such as accuracy, completeness, provenance, and conciseness. This thesis focuses on assessing the completeness and enhancing the conciseness of Linked Data. In particular, we first propose a completeness calculation approach based on a generated schema. Indeed, since a reference schema is required to assess completeness, we propose a mining-based approach to derive a suitable schema (i.e., a set of properties) from the data. This approach distinguishes between essential properties and marginal ones to generate, for a given dataset, a conceptual schema that meets the user's expectations regarding data completeness constraints. We implemented a prototype called “LOD-CM” to illustrate the process of deriving a conceptual schema of a dataset based on the user's requirements. We further propose an approach to discover equivalent predicates to improve the conciseness of Linked Data. This approach is based, in addition to a statistical analysis, on a deep semantic analysis of data and on learning algorithms. We argue that studying the meaning of predicates can help to improve the accuracy of results. Finally, a set of experiments was conducted on real-world datasets to evaluate our proposed approaches.
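A minimal sketch of the mining-based completeness idea above: derive a reference schema by keeping the properties used by at least a support threshold of the instances, then score each instance against that schema. The threshold value and the toy data are illustrative assumptions, not LOD-CM's actual settings.

    # Mine a reference schema, then compute per-instance completeness.
    def mine_schema(instances, threshold=0.8):
        n = len(instances)
        props = {p for inst in instances for p in inst}
        return {p for p in props
                if sum(p in inst for inst in instances) / n >= threshold}

    def completeness(inst, schema):
        return len(inst & schema) / len(schema)

    films = [{"title", "director", "year"},
             {"title", "director"},
             {"title", "director", "year", "budget"},
             {"title", "year"},
             {"title", "director", "year"}]

    schema = mine_schema(films)  # {'title', 'director', 'year'}
    print([round(completeness(f, schema), 2) for f in films])
    # [1.0, 0.67, 1.0, 0.67, 1.0]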
Costabello, Luca. "Contrôle d'Accès et Présentation Contextuels pour le Web des Données". Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00908489.
Texto completoDestandau, Marie. "Path-Based Interactive Visual Exploration of Knowledge Graphs". Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG063.
Texto completoKnowledge Graphs facilitate the pooling and sharing of information from different domains. They rely on small units of information named triples that can be combined to form higher-level statements. Producing interactive visual interfaces to explore collections in Knowledge Graphs is a complex, and mostly unresolved, problem. In this thesis, I introduce the concept of path outlines to encode aggregate information relative to a chain of triples. I demonstrate 3 applications of the concept with the design and implementation of 3 open source tools. S-Paths lets users browse meaningful overviews of collections; Path Outlines supports data producers in browsing the statements that can be produced from their data; and The Missing Path supports data producers in analysing incompleteness in their data. I show that the concept not only supports interactive visual interfaces for Knowledge Graphs but also helps improve their quality.
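As a hedged sketch of what a path outline aggregates, the code below computes statistics over a chain of triples (a property path) across a collection. The two descriptors shown, coverage and distinct end values, are illustrative; the thesis defines richer descriptors, and the data are invented.

    # Follow a property path from a set of root resources and summarise it.
    def follow(graph, nodes, pred):
        return {o for (s, p, o) in graph if p == pred and s in nodes}

    def path_outline(graph, roots, path):
        ends = {}
        for r in roots:
            nodes = {r}
            for pred in path:
                nodes = follow(graph, nodes, pred)
            ends[r] = nodes
        return {"path": "/".join(path),
                "coverage": sum(1 for v in ends.values() if v) / len(roots),
                "distinct_values": len(set().union(*ends.values()))}

    g = {("w1", "author", "a1"), ("w2", "author", "a2"),
         ("a1", "country", "FR"), ("a2", "country", "FR")}
    print(path_outline(g, {"w1", "w2", "w3"}, ["author", "country"]))
    # path author/country covers 2 of 3 roots and has 1 distinct end value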
Tran, Duc Minh. "Découverte de règles d'association multi-relationnelles à partir de bases de connaissances ontologiques pour l'enrichissement d'ontologies". Thesis, Université Côte d'Azur (ComUE), 2018. http://www.theses.fr/2018AZUR4041/document.
Texto completoIn the Semantic Web context, OWL ontologies represent explicit domain knowledge based on the conceptualization of domains of interest, while the corresponding assertional knowledge is given by RDF data referring to them. In this thesis, based on ideas derived from ILP, we aim at discovering hidden knowledge patterns in the form of multi-relational association rules by exploiting the evidence coming from the assertional data of ontological knowledge bases. Specifically, discovered rules are coded in SWRL so that they can be easily integrated within the ontology, thus enriching its expressive power and augmenting the assertional knowledge that can be derived. Two algorithms applied to populated ontological knowledge bases are proposed for finding rules with high inductive power: (i) a level-wise generate-and-test algorithm and (ii) an evolutionary algorithm. We performed experiments on publicly available ontologies, validating the performance of our approach and comparing it with the main state-of-the-art systems. In addition, we carried out a comparison of popular asymmetric metrics, originally proposed for scoring association rules, as building blocks for the fitness function of the evolutionary algorithm, in order to select metrics that fit the data semantics. To improve system performance, we proposed an algorithm that computes these metrics directly instead of querying via SPARQL-DL.
Michon, Philippe. "Vers une nouvelle architecture de l'information historique : L'impact du Web sémantique sur l'organisation du Répertoire du patrimoine culturel du Québec". Mémoire, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/8776.
Texto completoFan, Zhengjie. "Apprentissage de Motifs Concis pour le Liage de Donnees RDF". Phd thesis, Université de Grenoble, 2014. http://tel.archives-ouvertes.fr/tel-00986104.
Texto completoBenoit-Cattin, Hugues. "Contribution à la compression sous bandes : études des filtres RIF et de la quantification des sous-bandes : applications en imagerie medicale 2D et 3D". Lyon, INSA, 1995. http://www.theses.fr/1995ISAL0103.
Texto completoThis work deals with subband coding of images and its application to medical images. First, we develop a broad study of a coding scheme based on an octave decomposition using 1D FIR filters. We compare different types of filters and investigate the bit allocation problem as well as subband quantization. The main results concern the effect of filter properties, such as regularity and phase characteristics, on the performance of the coding scheme. We also show that the JPEG approach and the lattice pyramidal vector quantizer are well suited for the quantization of the low-frequency and high-frequency subbands, respectively. Second, we propose an adaptive subband coding scheme which is used to code various medical image modalities. Results obtained on mammograms are presented in detail. We also extend our coding scheme to 3D data. The first results obtained on 3D data coming from a real 3D scanner prototype (the Morphometer) are presented.
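For illustration, here is a toy one-level 1D octave split with the 2-tap Haar FIR pair, the simplest instance of the filter-bank decomposition studied above; the thesis compares much longer filters and their regularity and phase properties, none of which this sketch reproduces. Requires numpy.

    # One analysis stage: low-pass and high-pass FIR filtering, then
    # decimation by 2, yielding the two subbands of one octave.
    import numpy as np

    def analysis(x):
        h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass FIR filter
        h1 = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass FIR filter
        return (np.convolve(x, h0)[1::2],        # filter, keep every 2nd
                np.convolve(x, h1)[1::2])

    x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
    low, high = analysis(x)
    print(low)   # smooth approximation subband
    print(high)  # detail subband: small values, cheap to quantize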
Chikhaoui, Mohamed. "Apport des données ASTER et d'un réseau de neurones à rétropropagation à la modélisation de la dégradation du sol d'un bassin marneux du Rif marocain". Thèse, Université de Sherbrooke, 2005. http://savoirs.usherbrooke.ca/handle/11143/2747.
Texto completoHouzé, de l'Aulnoit Agathe. "Acquisition du rythme cardiaque fœtal et analyse de données pour la recherche de facteurs prédictifs de l’acidose fœtale". Thesis, Lille, 2019. http://www.theses.fr/2019LIL2S007.
Texto completoVisual analysis of the fetal heart rate (FHR) is a good method for screening for fetal hypoxia but is not sufficiently specific. The visual morphological analysis of the FHR during labor is subject to inter- and intra-observer variability, particularly when the FHR is abnormal. Underestimating the severity of an FHR tracing leads to undue risk-taking for the fetus, with an increase in morbidity and mortality, while overestimating it leads to unnecessary obstetric intervention, with an increased rate of caesarean section. The latter is also a public health problem in France. Automated FHR analysis reduces inter- and intra-individual variability and gives access to other computed parameters aimed at increasing diagnostic value. The parameters of morphological FHR analysis (baseline, number of accelerations, number and typing of decelerations, long-term variability (LTV)) have been described, as well as others such as deceleration surfaces, short-term variability (STV) and frequency analyses. Nevertheless, when attempting to analyze the FHR automatically, the main problem is the computation of the baseline, against which all the other parameters are determined. Automatic analysis provides information on parameters that cannot be derived by visual analysis and that are likely to improve screening for fetal acidosis during labor. The main objective of the thesis is to establish a predictive model of fetal acidosis from automated FHR analysis. The secondary objective is to determine the relevance of the classical basic parameters (CNGOF 2007) (baseline, variability, accelerations, decelerations) and that of other parameters inaccessible to the eye (indices of short-term variability, deceleration surfaces, frequency analysis ...). Later, we want to identify decision criteria that will help in obstetric care management. We propose to validate automated FHR analysis during labor through a case-control study; cases were FHR recordings with neonatal acidosis (arterial cord pH less than or equal to 7.15) and controls were FHR recordings of neonates without acidosis (arterial cord pH greater than or equal to 7.25). This is a monocentric study at the maternity unit of Saint Vincent de Paul Hospital, GHICL - Lille, on our « Well Born » database (digital archiving of FHR tracings since 2011), with a sufficient number of cases at this single center. Since 2011, the Saint Vincent de Paul hospital (GHICL) has had about 70 cases per year of neonatal acidosis (arterial pH less than or equal to 7.10) (3.41%). The R software will be used for statistical analysis.
Zampetakis, Stamatis. "Scalable algorithms for cloud-based Semantic Web data management". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112199/document.
Texto completoIn order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, which proposes standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web prompted the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault-tolerance, and elastic allocation of computing and storage resources following the needs of the users. This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts around the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform runs in the cloud and appropriate APIs are provided to end-users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest plans possible. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm and demonstrate the overall performance of the system.
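The sketch below illustrates the n-ary (star) join idea behind the flat plans described above: triple patterns sharing a join variable are grouped into one n-ary star, which reduces plan height compared to a cascade of binary joins. The pattern syntax and grouping heuristic are simplifications for illustration, not CliqueSquare's actual optimizer.

    # Group triple patterns of a basic graph pattern by shared variable.
    from collections import defaultdict

    def star_groups(patterns):
        groups = defaultdict(list)
        for s, p, o in patterns:
            for term in (s, p, o):
                if term.startswith("?"):  # a variable, hence a join key
                    groups[term].append((s, p, o))
        # Variables shared by >= 2 patterns are candidate n-ary stars.
        return {v: ps for v, ps in groups.items() if len(ps) >= 2}

    bgp = [("?x", "type", "Person"), ("?x", "worksAt", "?o"),
           ("?o", "locatedIn", "?c"), ("?x", "name", "?n")]
    for var, ps in star_groups(bgp).items():
        print(var, "joins", len(ps), "patterns")
    # ?x joins 3 patterns (one 3-ary star); ?o joins 2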
Bourdenet, Philippe. "L'espace documentaire en restructuration : l'évolution des services des bibliothèques universitaires". Phd thesis, Conservatoire national des arts et metiers - CNAM, 2013. http://tel.archives-ouvertes.fr/tel-00932683.
Texto completoTayq, Zakaria. "Intégration et supervision des liens Fronthaul dans les réseaux 5G". Thesis, Limoges, 2017. http://www.theses.fr/2017LIMO0092/document.
Texto completoCloud Radio Access Network (Cloud RAN) has been identified as a key enabler for 5G. Its deployment, however, faces multiple challenges, notably in fronthaul integration, the fronthaul being the segment, generally based on CPRI, located between the Digital Unit and the Radio Unit. Given its bit-rate, latency and jitter constraints, Wavelength Division Multiplexing (WDM) is the most adequate solution for its transport. However, the radio technologies recommended for 5G will drastically increase the CPRI bit-rate, making its transport very challenging with low-cost WDM. This thesis deals with four main topics. First, the introduction of a control channel in CPRI enables monitoring of the WDM infrastructure and wavelength tunability in the transceivers. The study of the integration of this control channel in the fronthaul link is reported in the second chapter, as well as an investigation of the wireless transmission of CPRI. Second, the use of Analog Radio over Fiber (A-RoF) can significantly improve fronthaul spectral efficiency compared to CPRI-based fronthaul, potentially enabling the transport of 5G interfaces. A thorough investigation of the actual gain brought by this solution is reported in the third chapter. Third, CPRI compression based on uniform and non-uniform quantization is also a solution to enhance CPRI spectral efficiency. The fourth chapter describes this solution and experimentally shows the achievable compression rates. Finally, establishing a new functional split in the radio equipment is considered a promising solution for 5G. Two new interfaces have been identified for high- and low-layer functional splits. A theoretical and experimental study of these new interfaces is reported in the fifth chapter.
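To illustrate the quantization-based compression principle mentioned above, here is a toy uniform requantizer: coding I/Q samples on fewer bits trades signal-to-noise ratio for bit-rate. The 15-bit reference width (a common CPRI sample width) and the Gaussian signal model are illustrative assumptions, not the thesis's measurements. Requires numpy.

    # Requantize samples to a given bit width and report SNR vs ratio.
    import numpy as np

    def requantize(x, bits):
        levels = 2 ** bits
        step = 2.0 / levels  # full scale assumed to be [-1, 1)
        q = np.clip(np.floor(x / step), -levels // 2, levels // 2 - 1)
        return (q + 0.5) * step  # mid-rise reconstruction

    rng = np.random.default_rng(0)
    x = np.clip(rng.normal(0.0, 0.3, 10_000), -1.0, 1.0)  # synthetic I/Q
    for bits in (15, 8, 6):
        err = x - requantize(x, bits)
        snr = 10 * np.log10(np.var(x) / np.var(err))
        print(f"{bits} bits: SNR {snr:.1f} dB, compression x{15 / bits:.2f}")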