Selected scientific literature on the topic "RDF datasets"

Below is a list of current journal articles, books, theses, conference proceedings, and other scholarly sources on the topic "RDF datasets".

Journal articles on the topic "RDF datasets":

1. Ri Kim, Ju, and Sung Kook Han. "R2RS: Schema-Based Relational Databases Mapping to Linked Datasets". International Journal of Engineering & Technology 7, no. 3.3 (June 8, 2018): 119. http://dx.doi.org/10.14419/ijet.v7i2.33.13868.

Abstract:
Background/Objectives: The vast amount of high-quality data stored in relational databases (RDB) is the primary resource for Linked Open Data (LOD) datasets. This paper proposes a schema-based mapping approach from RDB to RDF, which provides succinct and efficient mapping.

Methods/Statistical analysis: Various approaches, languages and tools for mapping RDB to LOD have been proposed in recent years. This paper surveys and analyzes classic mapping approaches and languages such as Direct Mapping and R2RML. Mapping approaches can be categorized by their data modeling. After analyzing conventional RDB-RDF mapping methods, this paper proposes a new mapping method and discusses its typical features and applications.

Findings: There are two types of mapping approaches for the translation of RDB to RDF: instance-based and schema-based. Instance-based mapping approaches generate large amounts of RDF graphs by means of mapping rules. These approaches cause data redundancy, since the same data is stored both as RDB and as RDF, and they easily lead to data inconsistency when update operations occur. Schema-based mapping approaches can effectively avoid data redundancy, since the mapping is accomplished at the conceptual schema level. The architecture of a SPARQL endpoint based on the schema mapping approach consists of five phases:

1. Generation of the mapping description based on mapping rules.
2. SPARQL query statements for RDF graph patterns.
3. Translation of the SPARQL query into an SQL query.
4. Execution of the SQL query in the RDB.
5. Interpretation of the SQL query result into JSON-LD format.

Experiments show that the schema-based mapping approach is a straightforward, succinct and efficient mapping method for RDB2RDF.

Improvements/Applications: This paper proposes a schema-based mapping approach called R2RS, which shows better performance than conventional mapping methods. In addition, R2RS provides an efficient implementation of a SPARQL endpoint over RDB.
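The SPARQL-to-SQL translation (phase 3) is the heart of such an endpoint. A minimal Python sketch, assuming a hypothetical table/column mapping rather than the actual R2RS mapping language:

```python
# Hypothetical mapping: RDF property IRI -> (table, key column, value column).
MAPPING = {
    "http://example.org/vocab#name": ("person", "id", "name"),
    "http://example.org/vocab#email": ("person", "id", "email"),
}

def triple_pattern_to_sql(property_iri, subject_id=None):
    """Translate one triple pattern (?s <p> ?o) into SQL via the schema mapping."""
    table, key_col, val_col = MAPPING[property_iri]
    sql = f"SELECT {key_col}, {val_col} FROM {table}"
    if subject_id is not None:           # bound subject -> WHERE clause
        sql += f" WHERE {key_col} = {subject_id}"
    return sql

print(triple_pattern_to_sql("http://example.org/vocab#name"))
# -> SELECT id, name FROM person
```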
2. Sultana, Tangina, and Young-Koo Lee. "gRDF: An Efficient Compressor with Reduced Structural Regularities That Utilizes gRePair". Sensors 22, no. 7 (March 26, 2022): 2545. http://dx.doi.org/10.3390/s22072545.

Abstract:
The explosive volume of semantic data published in the Resource Description Framework (RDF) data model demands efficient management and compression with better compression ratios and runtimes. Although extensive work has been carried out on compressing RDF datasets, existing compressors do not perform well in all dimensions, and they rarely exploit the graph patterns and structural regularities of real-world datasets. Moreover, a variety of existing approaches reduce the size of a graph by using a grammar-based graph compression algorithm. In this study, we introduce a novel approach named gRDF (graph repair for RDF) that uses gRePair, one of the most efficient grammar-based graph compression schemes, to compress RDF datasets. In addition, we have improved the performance of HDT (header-dictionary-triples), an efficient approach for compressing RDF datasets based on structural properties, by introducing modified HDT (M-HDT), which can detect frequent graph patterns in a single pass over the dataset by employing a data-structure-oriented approach. In our proposed system, we use M-HDT for indexing the nodes and edge labels. Then, we employ the gRePair algorithm to identify the grammar of the RDF graph. Afterward, the system improves the performance of k2-trees by introducing a more efficient algorithm to create the trees and serialize the RDF datasets. Our experiments affirm that the proposed gRDF scheme achieves approximately 26.12%, 13.68%, 6.81%, 2.38%, and 12.76% better compression ratios than the most prominent state-of-the-art schemes (HDT, HDT++, k2-trees, RDF-TR, and gRePair, respectively) on real-world datasets. Moreover, the processing efficiency of our proposed scheme also outperforms the others.
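The dictionary-encoding step that HDT-style compressors (including M-HDT) start from can be illustrated in a few lines; this toy Python sketch only shows the idea of storing each term once and triples as ID tuples:

```python
# Toy dictionary encoding: every distinct term gets an integer ID once, and
# triples are then stored as compact ID tuples. Real HDT additionally
# compresses the dictionary and the triple structure (bitmaps, k2-trees).

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
]

dictionary = {}                      # term -> integer ID
def term_id(term):
    return dictionary.setdefault(term, len(dictionary))

id_triples = [(term_id(s), term_id(p), term_id(o)) for s, p, o in triples]
print(dictionary)    # each distinct term stored exactly once
print(id_triples)    # [(0, 1, 2), (0, 3, 4), (2, 1, 0)]
```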
3. Marx, Edgard, Tommaso Soru, Saeedeh Shekarpour, Sören Auer, Axel-Cyrille Ngonga Ngomo, and Karin Breitman. "Towards an Efficient RDF Dataset Slicing". International Journal of Semantic Computing 7, no. 4 (December 2013): 455–77. http://dx.doi.org/10.1142/s1793351x13400151.

Abstract:
Over the last few years, a considerable amount of structured data has been published on the Web as Linked Open Data (LOD). Despite recent advances, consuming and using Linked Open Data within an organization is still a substantial challenge. Many of the LOD datasets are quite large, and despite progress in Resource Description Framework (RDF) data management, loading and querying them within a triple store is extremely time-consuming and resource-demanding. To overcome this consumption obstacle, we propose a process inspired by the classical Extract-Transform-Load (ETL) paradigm. In this article, we focus particularly on the selection and extraction steps of this process. We devise a fragment of the SPARQL Protocol and RDF Query Language (SPARQL) dubbed SliceSPARQL, which enables the selection of well-defined slices of datasets fulfilling typical information needs. SliceSPARQL supports graph patterns for which each connected subgraph pattern involves a maximum of one variable or Internationalized Resource Identifier (IRI) in its join conditions. This restriction guarantees efficient processing of the query against a sequential dataset dump stream. Furthermore, we evaluate our slicing approach on three different optimization strategies. Results show that dataset slices can be generated an order of magnitude faster than by the conventional approach of loading the whole dataset into a triple store.
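The single-pass evaluation against a dump stream can be pictured with a minimal Python sketch; the fixed type filter below stands in for a SliceSPARQL pattern, and the file name is hypothetical:

```python
# Stream an N-Triples dump line by line and keep the triples that match a
# fixed predicate/object filter -- the slice never requires loading the
# whole dataset into a triple store.
from rdflib import Graph, URIRef

RDF_TYPE = URIRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
TARGET = URIRef("http://example.org/Class")   # hypothetical target class

def slice_dump(path):
    out = Graph()
    with open(path) as dump:
        for line in dump:
            if not line.strip():
                continue
            g = Graph()
            g.parse(data=line, format="nt")   # one triple at a time
            for s, p, o in g:
                if p == RDF_TYPE and o == TARGET:
                    out.add((s, p, o))
    return out
```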
4. Hietanen, E., L. Lehto, and P. Latvala. "Providing Geographic Datasets as Linked Data in SDI". ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B2 (June 8, 2016): 583–86. http://dx.doi.org/10.5194/isprs-archives-xli-b2-583-2016.

Abstract:
In this study, a prototype service providing data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS; the Geographic Markup Language (GML) output of the WFS is then transformed on the fly into the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs, and the needed information content of the objects can easily be extracted from the RDF serializations available from those URIs.

A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
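The kind of GeoSPARQL output the GML-to-RDF transformation produces can be sketched with rdflib; the feature URIs and coordinates below are invented for illustration:

```python
# Expose one spatial feature as RDF with a GeoSPARQL WKT geometry.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

GEO = Namespace("http://www.opengis.net/ont/geosparql#")
SF = Namespace("http://www.opengis.net/ont/sf#")
EX = Namespace("http://data.example.org/feature/")   # hypothetical base URI

g = Graph()
g.bind("geo", GEO)
feature, geom = EX["42"], EX["42/geometry"]
g.add((feature, RDF.type, GEO.Feature))
g.add((feature, GEO.hasGeometry, geom))
g.add((geom, RDF.type, SF.Point))
g.add((geom, GEO.asWKT, Literal("POINT(24.94 60.17)", datatype=GEO.wktLiteral)))

# Content negotiation would pick the serialization; here we just emit Turtle.
print(g.serialize(format="turtle"))
```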
5. Cheng, Long, and Spyros Kotoulas. "Scale-Out Processing of Large RDF Datasets". IEEE Transactions on Big Data 1, no. 4 (December 1, 2015): 138–50. http://dx.doi.org/10.1109/tbdata.2015.2505719.

6. Harbi, Razen, Ibrahim Abdelaziz, Panos Kalnis, and Nikos Mamoulis. "Evaluating SPARQL Queries on Massive RDF Datasets". Proceedings of the VLDB Endowment 8, no. 12 (August 2015): 1848–51. http://dx.doi.org/10.14778/2824032.2824083.

7. Gu, Jinguang, Hao Dong, Zhao Liu, and Fangfang Xu. "Distributed Top-K Join Queries Optimizing for RDF Datasets". International Journal of Web Services Research 14, no. 3 (July 2017): 67–83. http://dx.doi.org/10.4018/ijwsr.2017070105.

Abstract:
In recent years the scale of RDF datasets has been increasing rapidly, and query research in the traditional centralized environment is unable to meet the increasing demands of the data query field, especially for top-k queries. Based on the Spark distributed computing system and the HBase distributed storage system, a novel method is proposed for top-k queries. A top-k query plan, STA (Spark Threshold Algorithm), is proposed to reduce the join operations over RDF data. Furthermore, a better algorithm, SSJA (Spark Simple Join Algorithm), is presented to reduce the sorting-related operations on the intermediate data. A cache mechanism is also proposed to speed up the SSJA algorithm. The experimental results show that the SSJA algorithm performs better than the STA algorithm in terms of cost and applicability, and that the cache mechanism can significantly improve SSJA's performance.
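STA builds on the classic threshold algorithm; a pure-Python sketch of that underlying idea on toy ranked lists (not the paper's Spark implementation):

```python
# Threshold algorithm: scan two ranked lists in parallel, keep the best k
# combined scores, and stop once no unseen item can beat the k-th result.
import heapq

list_a = [("x", 0.9), ("y", 0.8), ("z", 0.1)]   # sorted by descending score
list_b = [("y", 0.95), ("x", 0.5), ("z", 0.4)]
score_a, score_b = dict(list_a), dict(list_b)

def top_k(k=2):
    best = []  # min-heap of (total_score, item)
    for (ia, sa), (ib, sb) in zip(list_a, list_b):
        for item in (ia, ib):                     # random access into both lists
            total = score_a[item] + score_b[item]
            if all(item != it for _, it in best):
                heapq.heappush(best, (total, item))
                if len(best) > k:
                    heapq.heappop(best)
        threshold = sa + sb                       # best score still possible
        if len(best) == k and best[0][0] >= threshold:
            break                                 # early termination
    return sorted(best, reverse=True)

print(top_k())  # [(1.75, 'y'), (1.4, 'x')]
```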
8. Rakhmawati, Nur Aini, and Lutfi Nur Fadzilah. "Dataset Characteristics Identification for Federated SPARQL Query". Scientific Journal of Informatics 6, no. 1 (May 24, 2019): 23–33. http://dx.doi.org/10.15294/sji.v6i1.17258.

Abstract:
Nowadays, the amount of data published in the RDF format is increasing. Federated SPARQL query engines, which can query multiple distributed SPARQL endpoints, have been developed recently. Federated query engines usually differ from one another in performance. One of the factors that affect the performance of a query engine is the characteristics of the accessed RDF dataset, such as the number of triples, the number of classes, the number of properties, the number of subjects, the number of entities, the number of objects, and the spreading factor of the dataset. The aim of this work is to identify the characteristics of RDF datasets and create a query set for evaluating a federated engine. The study was conducted by identifying 16 datasets used by ten research papers in the Linked Data area.
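Most of the listed characteristics are simple aggregates; a minimal rdflib sketch, assuming a local dump file (the spreading factor is dataset-specific and omitted):

```python
# Compute basic dataset characteristics from an N-Triples dump.
from rdflib import Graph
from rdflib.namespace import RDF

g = Graph()
g.parse("dataset.nt", format="nt")   # hypothetical local dump

stats = {
    "triples": len(g),
    "subjects": len(set(g.subjects())),
    "properties": len(set(g.predicates())),
    "objects": len(set(g.objects())),
    "classes": len(set(g.objects(predicate=RDF.type))),
}
print(stats)
```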
9. McGlothlin, James, and Latifur Khan. "Materializing Inferred and Uncertain Knowledge in RDF Datasets". Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 5, 2010): 1951–52. http://dx.doi.org/10.1609/aaai.v24i1.7786.

Abstract:
There is a growing need for efficient and scalable semantic web queries that handle inference. There is also a growing interest in representing uncertainty in semantic web knowledge bases. In this paper, we present a bit vector schema specifically designed for RDF (Resource Description Framework) datasets. We propose a system for materializing and storing inferred knowledge using this schema. We show experimental results that demonstrate that our solution drastically improves the performance of inference queries. We also propose a solution for materializing uncertain information and probabilities using multiple bit vectors and thresholds.
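The bit-vector storage idea can be pictured as one bit per possible object; a toy Python sketch of materializing an inferred property by OR-ing vectors (the property names and object index are invented, not the paper's schema):

```python
# One integer bit vector per (subject, property) over a fixed object index;
# inferred knowledge is materialized with cheap bitwise operations.

objects = ["ex:cat", "ex:dog", "ex:bird"]          # fixed object ordering
obj_pos = {o: i for i, o in enumerate(objects)}

def vector(objs):
    """Encode a set of objects as an integer bit vector."""
    v = 0
    for o in objs:
        v |= 1 << obj_pos[o]
    return v

likes = {"ex:alice": vector(["ex:cat"])}
owns = {"ex:alice": vector(["ex:dog", "ex:bird"])}

# Materialize a hypothetical inferred property as the union of two others.
interacts = likes["ex:alice"] | owns["ex:alice"]
decoded = [o for o in objects if interacts >> obj_pos[o] & 1]
print(decoded)  # ['ex:cat', 'ex:dog', 'ex:bird']
```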

Theses on the topic "RDF datasets":

1. Azevedo, Marcelo Cohen de. "An Application Builder for Querying RDF/RDFS Datasets". Pontifícia Universidade Católica do Rio de Janeiro, 2010. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=15978@1.

Abstract:
Due to the increasing popularity of the semantic web, more datasets, containing information about varied domains, have become available for access on the Internet. This thesis proposes a tool to assist in accessing and exploring this information. The tool allows the generation of applications for querying databases in RDF and RDFS through programming by example. Users are able to create use cases through simple operations using the RDFS model. These use cases can be generalized and shared with other users, who can reuse them. The shared use cases can be customized and extended collaboratively in the environment in which they were developed. New operations can also be created and shared, making the tool increasingly more powerful. Finally, using a set of use cases, it is possible to generate a web application that abstracts the RDF model in which the data is represented, making it possible for lay users to access this information without any knowledge of the RDF model.
2. Arndt, Natanael, Norman Radtke, and Michael Martin. "Distributed Collaboration on RDF Datasets Using Git". Universität Leipzig, 2016. https://ul.qucosa.de/id/qucosa%3A15781.

Abstract:
Collaboration is one of the most important topics regarding the evolution of the World Wide Web and thus also of the Web of Data. In scenarios of distributed collaboration on datasets, it is necessary to support multiple different versions of a dataset existing simultaneously, as well as the merging of diverged datasets. In this paper we present an approach that uses SPARQL 1.1 in combination with the version control system Git, creating commits for all changes applied to an RDF dataset containing multiple named graphs. Further, the operations provided by Git are used to distribute the commits among collaborators and to merge diverged versions of the dataset. We show the advantages of (public) Git repositories for RDF datasets and how they offer a way to collaborate on RDF data and consume it. With SPARQL 1.1 and Git in combination, users are given several opportunities to participate in the evolution of RDF data.
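A minimal sketch of the commit-per-change workflow, assuming a Git repository already exists at the given path; the thesis' actual tooling differs:

```python
# Apply a SPARQL 1.1 Update to a file-backed graph, then record the change
# as one Git commit. Paths and the update statement are illustrative.
import subprocess
from rdflib import Graph

REPO, FILE = "/tmp/rdf-repo", "data.nt"   # hypothetical repository layout

g = Graph()
g.parse(f"{REPO}/{FILE}", format="nt")
g.update("""
    PREFIX ex: <http://example.org/>
    INSERT DATA { ex:alice ex:knows ex:bob . }
""")
g.serialize(destination=f"{REPO}/{FILE}", format="nt")

# One commit per applied update; Git then handles distribution and merging.
subprocess.run(["git", "-C", REPO, "add", FILE], check=True)
subprocess.run(["git", "-C", REPO, "commit",
                "-m", "INSERT DATA: alice knows bob"], check=True)
```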
3. Fernández, Javier D., Miguel A. Martínez-Prieto, Axel Polleres, and Julian Reindorf. "HDTQ: Managing RDF Datasets in Compressed Space". Springer International Publishing, 2018. http://epub.wu.ac.at/6482/1/HDTQ.pdf.

Abstract:
HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet RDF datasets often contain additional graph information, such as the origin, version or validity time of a triple. Traditional HDT is not capable of handling this additional information. This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while still being highly compact and queryable. Two HDTQ-based approaches are introduced, Annotated Triples and Annotated Graphs, and their performance is compared to the leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems.
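The Annotated Triples idea can be pictured as a per-triple bitmap over the graph dictionary; a toy sketch of that intuition, not HDTQ's actual data structures:

```python
# Keep HDT's ID-encoded triples and add, per triple, a bitmap saying in
# which named graphs the triple appears.

graphs = ["g1", "g2"]                 # dictionary of graph names
triples = [(1, 2, 3), (1, 2, 4)]      # ID-encoded triples (as in HDT)

# Bit j of membership[i] is set iff triple i appears in graph j.
membership = [0b01, 0b11]             # triple 0 in g1; triple 1 in g1 and g2

def graphs_of(i):
    return [g for j, g in enumerate(graphs) if membership[i] >> j & 1]

print(graphs_of(1))  # ['g1', 'g2']
```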
4. Fernández García, Javier David, Sabrina Kirrane, Axel Polleres, and Simon Steyskal. "HDT crypt: Compression and Encryption of RDF Datasets". IOS Press, 2018. http://epub.wu.ac.at/6489/1/HDTCrypt%2DSWJ.pdf.

Abstract:
The publication and interchange of RDF datasets online has experienced significant growth in recent years, promoted by different but complementary efforts, such as Linked Open Data, the Web of Things and RDF stream processing systems. However, the current Linked Data infrastructure does not cater for the storage and exchange of sensitive or private data. On the one hand, data publishers need means to limit access to confidential data (e.g. health, financial, personal, or other sensitive data). On the other hand, the infrastructure needs to compress RDF graphs in a manner that minimises the amount of data that is both stored and transferred over the wire. In this paper, we demonstrate how HDT - a compressed serialization format for RDF - can be extended to support encryption. We propose a number of different graph partitioning strategies and discuss the benefits and tradeoffs of each approach.
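A sketch of one possible partition-then-encrypt strategy (splitting by predicate, with one key per partition); the paper proposes several partitioning strategies, and this is only an illustration:

```python
# Split a graph into subgraphs by predicate, serialize each partition, and
# encrypt it under its own key so access can be granted per partition.
from cryptography.fernet import Fernet
from rdflib import Graph

g = Graph()
g.parse("dataset.nt", format="nt")   # hypothetical input

partitions = {}
for s, p, o in g:
    partitions.setdefault(p, Graph()).add((s, p, o))

encrypted = {}
for pred, part in partitions.items():
    key = Fernet.generate_key()                 # one key per partition
    data = part.serialize(format="nt")          # str in rdflib >= 6
    encrypted[pred] = (key, Fernet(key).encrypt(data.encode()))
    # the key would be handed only to users authorized for this partition
```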
5. Sejdiu, Gezim. "Efficient Distributed In-Memory Processing of RDF Datasets". Bonn: Universitäts- und Landesbibliothek Bonn, 2020. http://d-nb.info/1221669214/34.

6. Moreno Vega, José Ignacio. "A Faceted Browsing Interface for Diverse Large-Scale RDF Datasets". Thesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168108.

Abstract:
Master's thesis in Computer Science (Magíster en Ciencias, Mención Computación). RDF knowledge bases contain information about millions of resources, which are queried using SPARQL, the standard query language for RDF. However, this information is not easily accessible, since querying it requires knowing both the SPARQL language and the structure of the data, requirements that an average internet user does not meet. A faceted browsing interface is proposed for these large datasets that requires no prior knowledge of the structure or of SPARQL. Faceted navigation consists of adding filters (known as facets) so as to show only the elements that satisfy them. Existing faceted browsing interfaces for RDF do not scale well to current knowledge bases. A new system is proposed that builds indexes for easy and fast search over the data, allowing facets to be computed and suggested to the user. To validate the scalability and efficiency of the system, Wikidata was chosen as the large dataset for the performance experiments. A user study was then conducted to evaluate the usability and interaction of the system; the results show in which aspects the system performs well and which can be improved. A final prototype, together with a questionnaire, was sent to Wikidata contributors to discover how the system can help the community.
7. Arndt, Natanael, and Norman Radtke. "Quit Diff: Calculating the Delta Between RDF Datasets Under Version Control". Universität Leipzig, 2016. https://ul.qucosa.de/id/qucosa%3A15780.

Abstract:
Distributed actors working on a common RDF dataset regularly face the need to compare the status of one graph with another or, more generally, to synchronize copies of a dataset. A versioning system helps to synchronize the copies of a dataset; combined with a difference calculation system, it also becomes possible to compare versions in a log and to determine in which version a certain statement was introduced or removed. In this demo we present Quit Diff, a tool to compare versions of a Git-versioned quad store, which is also applicable to simple unversioned RDF datasets. We follow an approach that abstracts from differences on the syntactical level to differences on the level of the RDF data model, while leaving further semantic interpretation on the schema and instance level to specialized applications. Quit Diff can generate patches in various output formats and can be directly integrated into the distributed version control system Git, which provides a foundation for a comprehensive co-evolution workflow on RDF datasets.
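A data-model-level delta, as opposed to a line-based diff, can be sketched with rdflib's graph comparison; the version file names are hypothetical and Quit Diff's output formats are not reproduced:

```python
# Compute added/removed triples between two versions of a dataset.
from rdflib import Graph
from rdflib.compare import graph_diff, to_isomorphic

old, new = Graph(), Graph()
old.parse("version_a.nt", format="nt")   # hypothetical dump of version A
new.parse("version_b.nt", format="nt")   # hypothetical dump of version B

# to_isomorphic canonicalizes blank nodes so the diff is syntax-independent.
both, removed, added = graph_diff(to_isomorphic(old), to_isomorphic(new))
for t in removed:
    print("-", t)   # triples only in version A
for t in added:
    print("+", t)   # triples only in version B
```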
8. Sherif, Mohamed Ahmed Mohamed. "Automating Geospatial RDF Dataset Integration and Enrichment". Doctoral thesis, Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-215708.

Abstract:
Over the last years, the Linked Open Data (LOD) cloud has evolved from a mere 12 to more than 10,000 knowledge bases. These knowledge bases come from diverse domains including (but not limited to) publications, life sciences, social networking, government, media and linguistics. Moreover, the LOD cloud also contains a large number of cross-domain knowledge bases such as DBpedia and Yago2. These knowledge bases are commonly managed in a decentralized fashion and contain partly overlapping information. This architectural choice has led to knowledge pertaining to the same domain being published by independent entities in the LOD cloud. For example, information on drugs can be found in Diseasome as well as DBpedia and Drugbank. Furthermore, certain knowledge bases such as DBLP have been published by several bodies, which in turn has led to duplicated content in the LOD. In addition, large amounts of geo-spatial information have been made available with the growth of the heterogeneous Web of Data. The concurrent publication of knowledge bases containing related information promises to become a phenomenon of increasing importance with the growth of the number of independent data providers. Enabling the joint use of the knowledge bases published by these providers for tasks such as federated queries, cross-ontology question answering and data integration is most commonly tackled by creating links between the resources described within these knowledge bases. Within this thesis, we spur the transition from isolated knowledge bases to enriched Linked Data sets where information can be easily integrated and processed. To achieve this goal, we provide concepts, approaches and use cases that facilitate the integration and enrichment of information with other data types that are already present on the Linked Data Web, with a focus on geo-spatial data. The first challenge that motivates our work is the lack of measures that use geographic data for linking geo-spatial knowledge bases. This is partly due to geo-spatial resources being described by means of vector geometry. In particular, discrepancies in granularity and error measurements across knowledge bases render the selection of appropriate distance measures for geo-spatial resources difficult. We address this challenge by evaluating the existing literature for point set measures that can be used to measure the similarity of vector geometries. Then, we present and evaluate the ten measures that we derived from the literature on samples of three real knowledge bases. The second challenge we address in this thesis is the lack of automatic Link Discovery (LD) approaches capable of dealing with geo-spatial knowledge bases with missing and erroneous data. To this end, we present Colibri, an unsupervised approach that allows discovering links between knowledge bases while improving the quality of the instance data in these knowledge bases. A Colibri iteration begins by generating links between knowledge bases. Then, the approach makes use of these links to detect resources with probably erroneous or missing information, which is finally corrected or added. The third challenge we address is the lack of scalable LD approaches for tackling big geo-spatial knowledge bases. Thus, we present Deterministic Particle-Swarm Optimization (DPSO), a novel load balancing technique for LD on parallel hardware based on particle-swarm optimization. We combine this approach with the Orchid algorithm for geo-spatial linking and evaluate it on real and artificial data sets. The lack of approaches for automatic updating of links of an evolving knowledge base is our fourth challenge. This challenge is addressed in this thesis by the Wombat algorithm. Wombat is a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator to traverse the space of Link Specifications (LS). We study the theoretical characteristics of Wombat and evaluate it on different benchmark data sets. The last challenge addressed herein is the lack of automatic approaches for geo-spatial knowledge base enrichment. Thus, we propose Deer, a supervised learning approach based on a refinement operator for enriching Resource Description Framework (RDF) data sets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples. Each of the proposed approaches is implemented and evaluated against state-of-the-art approaches on real and/or artificial data sets. Moreover, all approaches are peer-reviewed and published in a conference or journal paper. Throughout this thesis, we detail the ideas, implementation and evaluation of each of the approaches, discuss each approach and present lessons learned. Finally, we conclude this thesis by presenting a set of possible future extensions and use cases for each of the proposed approaches.
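One of the surveyed point-set measures, the symmetric Hausdorff distance, in a short SciPy sketch on toy geometries:

```python
# Symmetric Hausdorff distance between two vector geometries, a typical
# similarity measure for geo-spatial link discovery (coordinates invented).
import numpy as np
from scipy.spatial.distance import directed_hausdorff

a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.5]])   # geometry A vertices
b = np.array([[0.1, 0.1], [1.1, 0.0], [2.0, 0.4]])   # geometry B vertices

h = max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
print(f"Hausdorff distance: {h:.3f}")   # small value -> link candidate
```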
9. Rihany, Mohamad. "Keyword Search and Summarization Approaches for RDF Dataset Exploration". Electronic thesis, Université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG030.

Abstract:
An increasing number of datasets are published on the Web, expressed in the standard languages proposed by the W3C such as RDF, RDF(S), and OWL. These datasets represent an unprecedented amount of data available to users and applications. In order to identify and use the relevant datasets, users and applications need to explore them using queries written in SPARQL, the query language proposed by the W3C. But in order to write a SPARQL query, a user should not only be familiar with the query language but also have knowledge about the content of the RDF dataset in terms of the resources, classes or properties it contains. The goal of this thesis is to provide approaches to support the exploration of these RDF datasets. We have studied two alternative and complementary exploration techniques: keyword search and summarization of an RDF dataset. Keyword search returns RDF graphs in response to a query expressed as a set of keywords, where each resulting graph is an aggregation of elements extracted from the source dataset. These graphs represent possible answers to the keyword query, and they can be ranked according to their relevance. Keyword search in RDF datasets raises the following issues: (i) identifying, for each keyword in the query, the matching elements in the considered dataset, taking into account the differences of terminology between the keywords and the terms used in the RDF dataset; (ii) combining the matching elements to build the result, by defining aggregation algorithms that find the best way of linking matching elements; and finally (iii) finding appropriate metrics to rank the results, as several matching elements may exist for each keyword and consequently several graphs may be returned. In our work, we propose a keyword search approach that addresses these issues. Providing a summarized view of an RDF dataset can help a user in identifying whether this dataset is relevant to his needs and in highlighting its most relevant elements, which could be useful for the exploration of a given dataset. In our work, we propose a novel summarization approach based on the underlying themes of a dataset. Our theme-based summarization approach consists of extracting the existing themes in a data source and building the summarized view so as to ensure that all the discovered themes are represented. This raises the following questions: (i) how to identify the underlying themes in an RDF dataset? (ii) what are the suitable criteria to identify the relevant elements in the themes extracted from the RDF graph? (iii) how to aggregate and connect the relevant elements to create a theme summary? and finally (iv) how to create the summary for the whole RDF graph from the generated theme summaries? In our work, we propose a theme-based summarization approach for RDF datasets which answers these questions and provides a summarized representation ensuring that each theme is represented proportionally to its importance in the initial dataset.
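The keyword-to-element matching step can be sketched with a deliberately simple label matcher (a real system would also handle synonyms and ranking); the input file is hypothetical:

```python
# Find graph elements whose rdfs:label contains a query keyword.
from rdflib import Graph
from rdflib.namespace import RDFS

g = Graph()
g.parse("dataset.ttl", format="turtle")   # hypothetical input

def matching_elements(keyword):
    """Yield resources whose rdfs:label contains the keyword."""
    kw = keyword.lower()
    for s, _, label in g.triples((None, RDFS.label, None)):
        if kw in str(label).lower():
            yield s, str(label)

for resource, label in matching_elements("dataset"):
    print(resource, "->", label)
```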
10. Ben Ellefi, Mohamed. "La recommandation des jeux de données basée sur le profilage pour le liage des données RDF" [Profile-based dataset recommendation for linking RDF data]. Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT276/document.

Abstract:
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, their schemas and their dynamicity over time. To this extent, identifying suitable datasets that meet specific criteria has become an increasingly important yet challenging task, supporting issues such as entity retrieval, semantic search and data linking. Particularly with respect to interlinking, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high number of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are accustomed to identifying target datasets for interlinking. While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of a "dataset profile": a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative-filtering-like dataset recommendation approach that exploits both existing dataset topic profiles and traditional dataset connectivity measures in order to link LOD datasets into a global dataset-topic graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from complete enough to be considered as ground truth and, consequently, as learning data. Facing the limits of the current topology of the LOD cloud (as learning data), our research led us to break away from the topic-profile representation and the "learn to rank" approach, and to adopt a new approach for candidate dataset identification, where the recommendation is based on the overlap of intensional profiles between different datasets. By intensional profile, we mean the formal representation of a set of schema concept labels that best describe a dataset, which can potentially be enriched by retrieving the corresponding textual descriptions. This representation provides richer contextual and semantic information and allows similarities between profiles to be computed efficiently and inexpensively. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the LOD cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. Furthermore, our method returns the mappings between schema concepts across datasets, a particularly useful input for the data linking step. In order to ensure high-quality representative dataset schema profiles, we introduce Datavore, a tool oriented towards metadata designers that provides ranked lists of vocabulary terms to reuse in the data modeling process, together with additional metadata and cross-term relations. The tool relies on the Linked Open Vocabularies (LOV) ecosystem for acquiring vocabularies and metadata, and is made available to the community.
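The intensional-profile comparison reduces to tf*idf weighting plus cosine similarity; a sketch on invented label sets:

```python
# Represent each dataset by the bag of its schema labels, weight with
# tf-idf, and rank candidate datasets by cosine similarity to the source.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

profiles = {                              # hypothetical schema label bags
    "source": "person name birth place occupation",
    "cand_1": "person name death place spouse",
    "cand_2": "protein gene sequence organism",
}
names = list(profiles)
tfidf = TfidfVectorizer().fit_transform(profiles.values())
sims = cosine_similarity(tfidf[0:1], tfidf).ravel()

# Skip the source itself (similarity 1.0) and print ranked candidates.
for name, score in sorted(zip(names, sims), key=lambda x: -x[1])[1:]:
    print(f"{name}: {score:.2f}")         # cand_1 should outrank cand_2
```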

Book chapters on the topic "RDF datasets":

1. Kellou-Menouer, Kenza, and Zoubida Kedad. "Discovering Types in RDF Datasets". In The Semantic Web: ESWC 2015 Satellite Events, 77–81. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-25639-9_15.

2. Casanova, Marco A. "Keyword Search over RDF Datasets". In Conceptual Modeling, 7–10. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-33223-5_2.

3. Faisal, Sidra, Kemele M. Endris, Saeedeh Shekarpour, Sören Auer, and Maria-Esther Vidal. "Co-evolution of RDF Datasets". In Lecture Notes in Computer Science, 225–43. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-38791-8_13.

4. Troullinou, Georgia, Haridimos Kondylakis, and Dimitris Plexousakis. "Semantic Partitioning for RDF Datasets". In Communications in Computer and Information Science, 99–115. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68282-2_7.

5. Behan, Jam Jahanzeb Khan, Oscar Romero, and Esteban Zimányi. "Multidimensional Integration of RDF Datasets". In Big Data Analytics and Knowledge Discovery, 119–35. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-27520-4_9.

6. Dosso, Dennis. "Keyword Search on RDF Datasets". In Lecture Notes in Computer Science, 332–36. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-15719-7_44.

7. Rihany, Mohamad, Zoubida Kedad, and Stéphane Lopes. "Theme-Based Summarization for RDF Datasets". In Lecture Notes in Computer Science, 312–21. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-59051-2_21.

8. Avgoustaki, Argyro, Giorgos Flouris, Irini Fundulaki, and Dimitris Plexousakis. "Provenance Management for Evolving RDF Datasets". In The Semantic Web. Latest Advances and New Domains, 575–92. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-34129-3_35.

9. Luo, Yongming, François Picalausa, George H. L. Fletcher, Jan Hidders, and Stijn Vansummeren. "Storing and Indexing Massive RDF Datasets". In Semantic Search over the Web, 31–60. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-25008-8_2.

10. Swacha, Jakub, and Szymon Grabowski. "OFR: An Efficient Representation of RDF Datasets". In Communications in Computer and Information Science, 224–35. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-27653-3_22.


Conference proceedings on the topic "RDF datasets":

1. Goy, Anna, Diego Magro, and Francesco Conforti. "Exploring RDF Datasets with LDscout". In 10th International Conference on Knowledge Management and Information Sharing. SCITEPRESS - Science and Technology Publications, 2018. http://dx.doi.org/10.5220/0006957600920100.

2. Bouhamoum, Redouane, Kenza Kellou-Menouer, Stéphane Lopes, and Zoubida Kedad. "Scaling Up Schema Discovery for RDF Datasets". In 2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW). IEEE, 2018. http://dx.doi.org/10.1109/icdew.2018.00021.

3. Arndt, Natanael, Norman Radtke, and Michael Martin. "Distributed Collaboration on RDF Datasets Using Git". In SEMANTiCS 2016: 12th International Conference on Semantic Systems. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2993318.2993328.

4. Morari, Alessandro, Jesse Weaver, Oreste Villa, David Haglin, Antonino Tumeo, Vito Giovanni Castellana, and John Feo. "High-Performance, Distributed Dictionary Encoding of RDF Datasets". In 2015 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, 2015. http://dx.doi.org/10.1109/cluster.2015.44.

5. Regino, André Gomes, and Julio Cesar dos Reis. "Leveraging Linked Open Data: A Link Maintenance Framework". In Simpósio Brasileiro de Sistemas Multimídia e Web. Sociedade Brasileira de Computação - SBC, 2022. http://dx.doi.org/10.5753/webmedia_estendido.2022.225651.

Abstract:
Connections among RDF (Resource Description Framework) data elements are the core of LOD (Linked Open Data). These connections are built with semi-automatic linking algorithms using a variety of similarity methods. Interconnected data demand automatic methods to maintain their consistency. Constant updating of RDF connections is relevant for the evolution of RDF datasets. However, change operations can affect well-formed links, which makes it difficult to keep the connections consistent over time. This study investigated new methods for fixing and updating links among structured data following ontology rules and properties. We contribute the design and development of an automatic method that updates RDF links based on change operations in RDF datasets. The framework that implements our method, named LODMF, was evaluated in terms of discovering broken links in large, well-known Linked Open Data datasets.
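One link-maintenance primitive, checking whether owl:sameAs targets still dereference after a change, can be sketched with rdflib and requests (the input file is hypothetical; LODMF itself reacts to change operations rather than polling):

```python
# Flag owl:sameAs links whose targets no longer dereference.
import requests
from rdflib import Graph
from rdflib.namespace import OWL

g = Graph()
g.parse("dataset.ttl", format="turtle")   # hypothetical input

for s, _, target in g.triples((None, OWL.sameAs, None)):
    try:
        resp = requests.head(str(target), allow_redirects=True, timeout=5)
        ok = resp.status_code < 400
    except requests.RequestException:
        ok = False
    if not ok:
        print("possibly broken link:", s, "->", target)
```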
6. Pouriyeh, Seyedamin, Mehdi Allahyari, Gong Cheng, Hamid Reza Arabnia, Krys Kochut, and Maurizio Atzori. "R-LDA: Profiling RDF Datasets Using Knowledge-Based Topic Modeling". In 2019 IEEE 13th International Conference on Semantic Computing (ICSC). IEEE, 2019. http://dx.doi.org/10.1109/icosc.2019.8665510.

7. Shahinyan, Tigran. "Automatic Data Analysis of RDF Datasets Using Apache Spark GraphX". In Proceedings of the 1st International Conference on Frontier of Digital Technology Towards a Sustainable Society. AIP Publishing, 2023. http://dx.doi.org/10.1063/5.0135779.

8. Dosso, Dennis, and Gianmaria Silvello. "A Scalable Virtual Document-Based Keyword Search System for RDF Datasets". In SIGIR '19: The 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3331184.3331284.

9. Gu, Jinguang, Hao Dong, Zhao Liu, and Fangfang Xu. "Research on Optimizing Top-K Join Queries for RDF Datasets Based on Spark". In The First S2 International Conference on Internet of Things. World Press Group, Inc. (WPG), 2016. http://dx.doi.org/10.29268/iciot.2016.0018.

10. Ragab, Mohamed, Riccardo Tommasini, Sadiq Eyvazov, and Sherif Sakr. "Towards Making Sense of Spark-SQL Performance for Processing Vast Distributed RDF Datasets". In SIGMOD/PODS '20: International Conference on Management of Data. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3391274.3393632.


Organization reports on the topic "RDF datasets":

1. Brun, Matthieu. Impact Assessment of Bpifrance's Financial Support to SMEs' Innovation Projects. Fteval - Austrian Platform for Research and Technology Policy Evaluation, April 2022. http://dx.doi.org/10.22163/fteval.2022.555.

Abstract:
This paper evaluates the economic impact of Bpifrance's financial programmes supporting SMEs' Research, Development and Innovation (RDI), called individual aid for innovation (IA). It focuses on the analysis of subsidies and zero-interest loans granted to SMEs more than three years old during the period 2005-2018, in order to foster their RDI activity (R&D expenses and spending related to the development of innovative products, processes or services) and economic growth (turnover, employment). We use a difference-in-differences methodology combined with a propensity score matching procedure to compare supported SMEs with non-supported SMEs having the same initial characteristics. This counterfactual analysis is based on a unique dataset containing both financial and non-financial information about millions of French companies. Up to 12,000 SMEs supported over the 2005-2016 period have been analysed, making this study the first to estimate the effect of Bpifrance's individual aid for innovation on such a scale and using such detailed information.
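The evaluation design combines propensity score matching with difference-in-differences; a self-contained sketch on synthetic data (illustrative of the method, not the paper's estimation code):

```python
# Match each treated firm to a control firm with a similar propensity score,
# then compare the before/after change in an outcome across the two groups.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))                             # firm covariates
treated = rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))    # selection on X
y_pre = X @ np.array([1.0, 0.5, 0.2]) + rng.normal(size=n)
y_post = y_pre + 0.3 + 0.8 * treated + rng.normal(size=n)  # true effect 0.8

# 1) Propensity scores and nearest-neighbour matching on them.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
nn = NearestNeighbors(n_neighbors=1).fit(ps[~treated].reshape(-1, 1))
match = nn.kneighbors(ps[treated].reshape(-1, 1),
                      return_distance=False).ravel()

# 2) Difference-in-differences on the matched sample.
did = (y_post[treated] - y_pre[treated]).mean() \
    - (y_post[~treated][match] - y_pre[~treated][match]).mean()
print(f"estimated effect: {did:.2f}")                   # should be near 0.8
```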
2. Idakwo, Gabriel, Sundar Thangapandian, Joseph Luttrell, Zhaoxian Zhou, Chaoyang Zhang, and Ping Gong. Deep Learning-Based Structure-Activity Relationship Modeling for Multi-Category Toxicity Classification: A Case Study of 10K Tox21 Chemicals with High-Throughput Cell-Based Androgen Receptor Bioassay Data. Engineer Research and Development Center (U.S.), July 2021. http://dx.doi.org/10.21079/11681/41302.

Abstract:
Deep learning (DL) has attracted the attention of computational toxicologists as it offers a potentially greater power for in silico predictive toxicology than existing shallow learning algorithms. However, contradictory reports have been documented. To further explore the advantages of DL over shallow learning, we conducted this case study using two cell-based androgen receptor (AR) activity datasets with 10K chemicals generated from the Tox21 program. A nested double-loop cross-validation approach was adopted, along with a stratified sampling strategy for partitioning chemicals of multiple AR activity classes (i.e., agonist, antagonist, inactive, and inconclusive) at the same distribution rates amongst the training, validation and test subsets. Deep neural networks (DNN) and random forest (RF), representing deep and shallow learning algorithms, respectively, were chosen to carry out structure-activity relationship-based chemical toxicity prediction. Results suggest that DNN significantly outperformed RF (p < 0.001, ANOVA) by 22–27% for four metrics (precision, recall, F-measure, and AUPRC) and by 11% for another (AUROC). Further in-depth analyses of chemical scaffolding shed light on structural alerts for AR agonists/antagonists and inactive/inconclusive compounds, which may aid in future drug discovery and improvement of toxicity prediction modeling.
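The RF-versus-DNN comparison with stratified folds can be sketched with scikit-learn on synthetic stand-in data (the report's nested double-loop CV and Tox21 features are not reproduced):

```python
# Stratified cross-validation of a random forest vs. a small neural network
# on an imbalanced multi-class problem, mirroring the comparison above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=600, n_classes=4, n_informative=8,
                           weights=[0.1, 0.2, 0.4, 0.3], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # keeps class ratios

for name, model in [("RF", RandomForestClassifier(random_state=0)),
                    ("DNN", MLPClassifier(hidden_layer_sizes=(64, 32),
                                          max_iter=500, random_state=0))]:
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    print(f"{name}: macro-F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```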
