Theses on the topic "RDF datasets"

Follow this link to see other types of publications on the topic: RDF datasets.

Cite a source in APA, MLA, Chicago, Harvard, and many other citation styles


Consult the top 23 dissertations (degree and doctoral theses) for research on the topic "RDF datasets".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract (summary) of the work online, if it is present in the metadata.

Browse theses from many research areas and compile a correct bibliography.

1

Azevedo, Marcelo Cohen de. "An Application Builder for Quering RDF/RDFS Datasets". Pontifícia Universidade Católica do Rio de Janeiro, 2010. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=15978@1.

Full text
Abstract (summary):
Due to the increasing popularity of the semantic web, more datasets, containing information about varied domains, have become available for access on the Internet. This thesis proposes a tool to assist in accessing and exploring this information. This tool allows the generation of applications for querying databases in RDF and RDFS through programming by example. Users are able to create use cases through simple operations using the RDFS model. These use cases can be generalized and shared with other users, who can reuse them. The shared use cases can be customized and extended collaboratively in the environment in which they were developed. New operations can also be created and shared, making the tool increasingly more powerful. Finally, using a set of use cases, it is possible to generate a web application that abstracts the RDF model in which the data is represented, making it possible for lay users to access this information without any knowledge of the RDF model.
2

Arndt, Natanael, Norman Radtke and Michael Martin. "Distributed collaboration on RDF datasets using Git". Universität Leipzig, 2016. https://ul.qucosa.de/id/qucosa%3A15781.

Full text
Abstract (summary):
Collaboration is one of the most important topics regarding the evolution of the World Wide Web and thus also of the Web of Data. In scenarios of distributed collaboration on datasets it is necessary to allow multiple different versions of a dataset to exist simultaneously, while also providing support for merging diverged datasets. In this paper we present an approach that uses SPARQL 1.1 in combination with the version control system Git, which creates commits for all changes applied to an RDF dataset containing multiple named graphs. Further, the operations provided by Git are used to distribute the commits among collaborators and to merge diverged versions of the dataset. We show the advantages of (public) Git repositories for RDF datasets and how they represent a way to collaborate on RDF data and to consume it. With SPARQL 1.1 and Git in combination, users are given several opportunities to participate in the evolution of RDF data.
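To make the pattern described in this abstract concrete, the following is a minimal Python sketch: apply a SPARQL 1.1 Update to a dataset kept as a TriG file inside a Git working copy, then record the change as a commit. It is not the authors' implementation; the file name, the example update, and the use of rdflib plus the git command line are assumptions made for illustration.

```python
# Minimal sketch: apply a SPARQL 1.1 Update to a TriG-serialized RDF dataset
# that lives inside a Git working copy, then record the change as a commit.
# This illustrates the pattern only, not the authors' tool.
import subprocess
from rdflib import Dataset

DATASET_FILE = "data.trig"   # assumed to exist inside an initialized Git repository

ds = Dataset()
ds.parse(DATASET_FILE, format="trig")

# An example update against one named graph of the dataset.
ds.update("""
PREFIX ex: <http://example.org/>
INSERT DATA { GRAPH ex:people { ex:alice ex:knows ex:bob . } }
""")

ds.serialize(destination=DATASET_FILE, format="trig")

# Let Git produce the commit that captures this change and can later be
# pushed, pulled and merged by collaborators.
subprocess.run(["git", "add", DATASET_FILE], check=True)
subprocess.run(["git", "commit", "-m", "Insert ex:alice ex:knows ex:bob"], check=True)
```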
3

Fernández, Javier D., Miguel A. Martínez-Prieto, Axel Polleres and Julian Reindorf. "HDTQ: Managing RDF Datasets in Compressed Space". Springer International Publishing, 2018. http://epub.wu.ac.at/6482/1/HDTQ.pdf.

Full text
Abstract (summary):
HDT (Header-Dictionary-Triples) is a compressed representation of RDF data that supports retrieval features without prior decompression. Yet, RDF datasets often contain additional graph information, such as the origin, version or validity time of a triple. Traditional HDT is not capable of handling this additional information. This work introduces HDTQ (HDT Quads), an extension of HDT that is able to represent quadruples (or quads) while still being highly compact and queryable. Two HDTQ-based approaches are introduced: Annotated Triples and Annotated Graphs, and their performance is compared to the leading open-source RDF stores on the market. Results show that HDTQ achieves the best compression rates and is a competitive alternative to well-established systems.
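The "Annotated Triples" variant can be pictured as storing each distinct triple once and attaching to it the set of named graphs it occurs in. The toy Python sketch below illustrates only that idea; it is not the actual HDT/HDTQ bitmap layout, and all identifiers are illustrative.

```python
# Toy illustration of the "Annotated Triples" idea: every distinct triple is
# stored once and annotated with the set of named graphs it occurs in.
# This is NOT the actual HDT/HDTQ bitmap layout, only the concept.
from collections import defaultdict

class AnnotatedTriples:
    def __init__(self):
        self.graph_ids = {}                  # graph IRI -> integer id
        self.triple_ids = {}                 # (s, p, o) -> integer id
        self.membership = defaultdict(set)   # triple id -> set of graph ids

    def add(self, s, p, o, graph):
        gid = self.graph_ids.setdefault(graph, len(self.graph_ids))
        tid = self.triple_ids.setdefault((s, p, o), len(self.triple_ids))
        self.membership[tid].add(gid)

    def graphs_of(self, s, p, o):
        tid = self.triple_ids.get((s, p, o))
        if tid is None:
            return set()
        by_id = {v: k for k, v in self.graph_ids.items()}
        return {by_id[g] for g in self.membership[tid]}

store = AnnotatedTriples()
store.add("ex:alice", "foaf:knows", "ex:bob", graph="ex:graph2016")
store.add("ex:alice", "foaf:knows", "ex:bob", graph="ex:graph2018")
print(store.graphs_of("ex:alice", "foaf:knows", "ex:bob"))
```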
4

Fernandez, Garcia Javier David, Sabrina Kirrane, Axel Polleres and Simon Steyskal. "HDT crypt: Compression and Encryption of RDF Datasets". IOS Press, 2018. http://epub.wu.ac.at/6489/1/HDTCrypt%2DSWJ.pdf.

Full text
Abstract (summary):
The publication and interchange of RDF datasets online has experienced significant growth in recent years, promoted by different but complementary efforts, such as Linked Open Data, the Web of Things and RDF stream processing systems. However, the current Linked Data infrastructure does not cater for the storage and exchange of sensitive or private data. On the one hand, data publishers need means to limit access to confidential data (e.g. health, financial, personal, or other sensitive data). On the other hand, the infrastructure needs to compress RDF graphs in a manner that minimises the amount of data that is both stored and transferred over the wire. In this paper, we demonstrate how HDT - a compressed serialization format for RDF - can be extended to support encryption. We propose a number of different graph partitioning strategies and discuss the benefits and tradeoffs of each approach.
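As an illustration of the partition-then-encrypt idea (not of the HDT crypt format itself), the sketch below splits a small rdflib graph into per-predicate partitions and encrypts each partition with its own key, so that access can be granted per subgraph; the library choices and the partitioning strategy are assumptions.

```python
# Sketch of the partition-then-encrypt idea: split a graph into partitions
# (here, one per predicate, which is only one possible strategy), serialize
# each partition and encrypt it with its own key, so that access can be
# granted per subgraph. Not the HDT crypt format itself.
from rdflib import Graph, Namespace
from cryptography.fernet import Fernet

EX = Namespace("http://example.org/")

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:alice ex:name "Alice" ; ex:diagnosis "confidential" .
""", format="turtle")

partitions = {}
for s, p, o in g:
    partitions.setdefault(p, Graph()).add((s, p, o))

keys, encrypted = {}, {}
for predicate, part in partitions.items():
    keys[predicate] = Fernet.generate_key()
    data = part.serialize(format="nt")
    payload = data if isinstance(data, bytes) else data.encode("utf-8")
    encrypted[predicate] = Fernet(keys[predicate]).encrypt(payload)

# A consumer holding only the key for ex:name can decrypt that partition
# and nothing else.
print(Fernet(keys[EX["name"]]).decrypt(encrypted[EX["name"]]).decode("utf-8"))
```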
5

Sejdiu, Gezim [Verfasser]. "Efficient Distributed In-Memory Processing of RDF Datasets / Gezim Sejdiu". Bonn : Universitäts- und Landesbibliothek Bonn, 2020. http://d-nb.info/1221669214/34.

Full text
6

Moreno, Vega José Ignacio. "A faceted browsing interface for diverse Large-Scale RDF Datasets". Thesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168108.

Full text
Abstract (summary):
Master of Science, Computer Science.
RDF knowledge bases contain information about millions of resources, which are queried using the standard query language for RDF: SPARQL. However, this information is not easily accessible, because querying it requires knowing the SPARQL language and the structure of the data, requirements that an ordinary Internet user does not meet. A faceted browsing interface is proposed for these large-scale data that requires no prior knowledge of either the structure or SPARQL. Faceted browsing consists of adding filters (known as facets) so that only the elements satisfying them are shown. Existing faceted browsing interfaces for RDF do not scale well to today's knowledge bases. A new system is proposed that builds indexes for easy and fast searches over the data, making it possible to compute and suggest facets to the user. To validate the scalability and efficiency of the system, Wikidata was chosen as the large-scale database for the performance experiments. A user study was then carried out to evaluate the usability of and interaction with the system; the results show in which aspects the system performs well and which ones can be improved. A final prototype, together with a questionnaire, was sent to Wikidata contributors to find out how this system can help the community.
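To see what such an interface hides from the user, the sketch below shows, under assumed property and value IRIs, how a front end might translate a set of selected facets into the SPARQL query that is actually executed; it is not the system developed in the thesis.

```python
# Sketch of how a faceted front end might translate the facets selected by a
# user into the SPARQL query that is actually executed, so that the user
# never writes SPARQL directly. IRIs below are illustrative Wikidata ones.
def facets_to_sparql(facets, limit=50):
    """facets: list of (property IRI, value) pairs selected by the user."""
    patterns = []
    for prop, value in facets:
        if isinstance(value, str) and value.startswith("http"):
            patterns.append(f"?item <{prop}> <{value}> .")
        else:
            patterns.append(f'?item <{prop}> "{value}" .')
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?item WHERE {{\n  {body}\n}}\nLIMIT {limit}"

query = facets_to_sparql([
    ("http://www.wikidata.org/prop/direct/P31",    # instance of
     "http://www.wikidata.org/entity/Q5"),         # human
    ("http://www.wikidata.org/prop/direct/P106",   # occupation
     "http://www.wikidata.org/entity/Q82594"),     # computer scientist
])
print(query)
```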
7

Arndt, Natanael, and Norman Radtke. "Quit diff: calculating the delta between RDF datasets under version control". Universität Leipzig, 2016. https://ul.qucosa.de/id/qucosa%3A15780.

Full text
Abstract (summary):
Distributed actors working on a common RDF dataset regularly face the issue of comparing the status of one graph with another or, more generally, of synchronizing copies of a dataset. A versioning system helps to synchronize the copies of a dataset; combined with a difference calculation system, it also becomes possible to compare versions in a log and to determine in which version a certain statement was introduced or removed. In this demo we present Quit Diff, a tool to compare versions of a Git-versioned quad store, which is also applicable to simple unversioned RDF datasets. We follow an approach that abstracts from differences on a syntactical level to differences on the level of the RDF data model, while we leave further semantic interpretation on the schema and instance level to specialized applications. Quit Diff can generate patches in various output formats and can be directly integrated into the distributed version control system Git, which provides a foundation for a comprehensive co-evolution workflow on RDF datasets.
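The underlying idea of comparing at the level of the RDF data model rather than at the syntactical level can be tried out with rdflib's compare module, as in the small sketch below; Quit Diff itself works on Git-versioned quad stores and supports several patch formats, which this example does not attempt to reproduce.

```python
# Computing the delta between two versions of an RDF graph at the level of
# the data model (rather than comparing serializations line by line).
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

old = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob . ex:alice ex:age 30 .
""", format="turtle")

new = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice ex:knows ex:bob . ex:alice ex:age 31 .
""", format="turtle")

in_both, removed, added = graph_diff(to_isomorphic(old), to_isomorphic(new))
for triple in removed:
    print("-", triple)
for triple in added:
    print("+", triple)
```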
8

Sherif, Mohamed Ahmed Mohamed. "Automating Geospatial RDF Dataset Integration and Enrichment". Doctoral thesis, Universitätsbibliothek Leipzig, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-215708.

Full text
Abstract (summary):
Over the last years, the Linked Open Data (LOD) cloud has evolved from a mere 12 to more than 10,000 knowledge bases. These knowledge bases come from diverse domains including (but not limited to) publications, life sciences, social networking, government, media and linguistics. Moreover, the LOD cloud also contains a large number of cross-domain knowledge bases such as DBpedia and Yago2. These knowledge bases are commonly managed in a decentralized fashion and contain partly overlapping information. This architectural choice has led to knowledge pertaining to the same domain being published by independent entities in the LOD cloud. For example, information on drugs can be found in Diseasome as well as DBpedia and Drugbank. Furthermore, certain knowledge bases such as DBLP have been published by several bodies, which in turn has led to duplicated content in the LOD cloud. In addition, large amounts of geo-spatial information have been made available with the growth of the heterogeneous Web of Data. The concurrent publication of knowledge bases containing related information promises to become a phenomenon of increasing importance with the growth of the number of independent data providers. Enabling the joint use of the knowledge bases published by these providers for tasks such as federated queries, cross-ontology question answering and data integration is most commonly tackled by creating links between the resources described within these knowledge bases. Within this thesis, we spur the transition from isolated knowledge bases to enriched Linked Data sets where information can be easily integrated and processed. To achieve this goal, we provide concepts, approaches and use cases that facilitate the integration and enrichment of information with other data types that are already present on the Linked Data Web, with a focus on geo-spatial data. The first challenge that motivates our work is the lack of measures that use geographic data for linking geo-spatial knowledge bases. This is partly due to geo-spatial resources being described by means of vector geometry. In particular, discrepancies in granularity and error measurements across knowledge bases render the selection of appropriate distance measures for geo-spatial resources difficult. We address this challenge by evaluating the existing literature for point-set measures that can be used to measure the similarity of vector geometries. Then, we present and evaluate the ten measures that we derived from the literature on samples of three real knowledge bases. The second challenge we address in this thesis is the lack of automatic Link Discovery (LD) approaches capable of dealing with geo-spatial knowledge bases with missing and erroneous data. To this end, we present Colibri, an unsupervised approach that allows discovering links between knowledge bases while improving the quality of the instance data in these knowledge bases. A Colibri iteration begins by generating links between knowledge bases. Then, the approach makes use of these links to detect resources with probably erroneous or missing information. This erroneous or missing information detected by the approach is finally corrected or added. The third challenge we address is the lack of scalable LD approaches for tackling big geo-spatial knowledge bases. Thus, we present Deterministic Particle-Swarm Optimization (DPSO), a novel load balancing technique for LD on parallel hardware based on particle-swarm optimization.
We combine this approach with the Orchid algorithm for geo-spatial linking and evaluate it on real and artificial data sets. The lack of approaches for automatic updating of links of an evolving knowledge base is our fourth challenge. This challenge is addressed in this thesis by the Wombat algorithm. Wombat is a novel approach for the discovery of links between knowledge bases that relies exclusively on positive examples. Wombat is based on generalisation via an upward refinement operator to traverse the space of Link Specifications (LS). We study the theoretical characteristics of Wombat and evaluate it on different benchmark data sets. The last challenge addressed herein is the lack of automatic approaches for geo-spatial knowledge base enrichment. Thus, we propose Deer, a supervised learning approach based on a refinement operator for enriching Resource Description Framework (RDF) data sets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples. Each of the proposed approaches is implemented and evaluated against state-of-the-art approaches on real and/or artificial data sets. Moreover, all approaches are peer-reviewed and published in a conference or a journal paper. Throughout this thesis, we detail the ideas, implementation and the evaluation of each of the approaches. Moreover, we discuss each approach and present lessons learned. Finally, we conclude this thesis by presenting a set of possible future extensions and use cases for each of the proposed approaches.
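One classical point-set measure of the kind evaluated in this thesis is the discrete Hausdorff distance between two vector geometries; a plain-Python sketch is shown below (coordinates are illustrative, and this is only one of the measures discussed).

```python
# The discrete Hausdorff distance, one classical point-set measure for
# comparing two vector geometries; coordinates are illustrative.
import math

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def directed_hausdorff(A, B):
    # largest distance from a point of A to its nearest neighbour in B
    return max(min(euclidean(a, b) for b in B) for a in A)

def hausdorff(A, B):
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

geometry1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]   # vertices from dataset 1
geometry2 = [(0.1, 0.0), (1.0, 0.1), (0.9, 1.0)]   # candidate match from dataset 2
print(hausdorff(geometry1, geometry2))
```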
9

Rihany, Mohamad. "Keyword Search and Summarization Approaches for RDF Dataset Exploration". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG030.

Full text
Abstract (summary):
An increasing number of datasets are published on the Web, expressed in the standard languages proposed by the W3C such as RDF, RDF (S), and OWL. These datasets represent an unprecedented amount of data available for users and applications. In order to identify and use the relevant datasets, users and applications need to explore them using queries written in SPARQL, a query language proposed by the W3C. But in order to write a SPARQL query, a user should not only be familiar with the query language but also have knowledge about the content of the RDF dataset in terms of the resources, classes or properties it contains. The goal of this thesis is to provide approaches to support the exploration of these RDF datasets. We have studied two alternative and complementary exploration techniques, keyword search and summarization of an RDF dataset. Keyword search returns RDF graphs in response to a query expressed as a set of keywords, where each resulting graph is the aggregation of elements extracted from the source dataset. These graphs represent possible answers to the keyword query, and they can be ranked according to their relevance. Keyword search in RDF datasets raises the following issues: (i) identifying for each keyword in the query the matching elements in the considered dataset, taking into account the differences of terminology between the keywords and the terms used in the RDF dataset, (ii) combining the matching elements to build the result by defining aggregation algorithms that find the best way of linking matching elements, and finally (iii), finding appropriate metrics to rank the results, as several matching elements may exist for each keyword and consequently several graphs may be returned. In our work, we propose a keyword search approach that addresses these issues. Providing a summarized view of an RDF dataset can help a user in identifying if this dataset is relevant to his needs, and in highlighting its most relevant elements. This could be useful for the exploration of a given dataset. In our work, we propose a novel summarization approach based on the underlying themes of a dataset. Our theme-based summarization approach consists of extracting the existing themes in a data source, and building the summarized view so as to ensure that all these discovered themes are represented. This raises the following questions: (i) how to identify the underlying themes in an RDF dataset? (ii) what are the suitable criteria to identify the relevant elements in the themes extracted from the RDF graph? (iii) how to aggregate and connect the relevant elements to create a theme summary? and finally, (iv) how to create the summary for the whole RDF graph from the generated theme summaries? In our work, we propose a theme-based summarization approach for RDF datasets which answers these questions and provides a summarized representation ensuring that each theme is represented proportionally to its importance in the initial dataset
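A first, naive step of such a keyword search, matching each keyword against the literals of an RDF graph, can be sketched with rdflib as follows; the thesis additionally handles terminology differences, the aggregation of matches into connected subgraphs, and ranking, none of which is shown here.

```python
# Naive first step of keyword search over an RDF graph: find, for each
# keyword, the triples whose literals match it. Data is illustrative.
from rdflib import Graph, Literal

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:paris rdfs:label "Paris" ; ex:country ex:france .
ex:france rdfs:label "France" .
""", format="turtle")

def matching_elements(graph, keyword):
    keyword = keyword.lower()
    return {(s, p, o) for s, p, o in graph
            if isinstance(o, Literal) and keyword in str(o).lower()}

for kw in ["paris", "france"]:
    print(kw, "->", matching_elements(g, kw))
```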
10

Ben, Ellefi Mohamed. "La recommandation des jeux de données basée sur le profilage pour le liage des données RDF". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT276/document.

Full text
Abstract (summary):
With the emergence of the Web of Data, most notably Linked Open Data (LOD), an abundance of data has become available on the web. However, LOD datasets and their inherent subgraphs vary heavily with respect to their size, topic and domain coverage, the schemas and their data dynamicity (respectively schemas and metadata) over time. To this extent, identifying suitable datasets, which meet specific criteria, has become an increasingly important, yet challenging task to support issues such as entity retrieval or semantic search and data linking. Particularly with respect to the interlinking issue, the current topology of the LOD cloud underlines the need for practical and efficient means to recommend suitable datasets: currently, only well-known reference graphs such as DBpedia (the most obvious target), YAGO or Freebase show a high amount of in-links, while there exists a long tail of potentially suitable yet under-recognized datasets. This problem is due to the semantic web tradition in dealing with "finding candidate datasets to link to", where data publishers are used to identifying target datasets for interlinking. While an understanding of the nature of the content of specific datasets is a crucial prerequisite for the mentioned issues, we adopt in this dissertation the notion of "dataset profile" - a set of features that describe a dataset and allow the comparison of different datasets with regard to their represented characteristics. Our first research direction was to implement a collaborative filtering-like dataset recommendation approach, which exploits both existing dataset topic profiles, as well as traditional dataset connectivity measures, in order to link LOD datasets into a global dataset-topic-graph. This approach relies on the LOD graph in order to learn the connectivity behaviour between LOD datasets. However, experiments have shown that the current topology of the LOD cloud is far from being complete enough to be considered as a ground truth and consequently as learning data. Facing the limits of the current topology of the LOD cloud (as learning data), our research has led us to break away from the topic profile representation of the "learn to rank" approach and to adopt a new approach for candidate dataset identification where the recommendation is based on the intensional profile overlap between different datasets. By intensional profile, we understand the formal representation of a set of schema concept labels that best describe a dataset and can be potentially enriched by retrieving the corresponding textual descriptions. This representation provides richer contextual and semantic information and allows similarities between profiles to be computed efficiently and inexpensively. We identify schema overlap with the help of a semantico-frequential concept similarity measure and a ranking criterion based on the tf*idf cosine similarity. The experiments, conducted over all available linked datasets on the LOD cloud, show that our method achieves an average precision of up to 53% for a recall of 100%. Furthermore, our method returns the mappings between the schema concepts across datasets, a particularly useful input for the data linking step. In order to ensure high-quality representative dataset schema profiles, we introduce Datavore, a tool oriented towards metadata designers that provides ranked lists of vocabulary terms to reuse in the data modeling process, together with additional metadata and cross-terms relations.
The tool relies on the Linked Open Vocabulary (LOV) ecosystem for acquiring vocabularies and metadata and is made available for the community
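The tf*idf / cosine ranking mentioned in the abstract can be sketched as below, treating each intensional profile as a bag of schema labels; the profile contents are invented for illustration, whereas the real profiles are built from dataset schemas and textual descriptions.

```python
# Sketch of a tf*idf / cosine ranking over "intensional profiles" seen as
# bags of schema labels; profile contents are invented for illustration.
import math
from collections import Counter

profiles = {
    "datasetA": ["person", "birth", "place", "country", "city"],
    "datasetB": ["person", "author", "publication", "city"],
    "datasetC": ["protein", "gene", "pathway"],
}

def tfidf_vectors(profiles):
    n = len(profiles)
    df = Counter(term for labels in profiles.values() for term in set(labels))
    return {name: {t: tf * math.log(n / df[t]) for t, tf in Counter(labels).items()}
            for name, labels in profiles.items()}

def cosine(u, v):
    num = sum(u[t] * v[t] for t in set(u) & set(v))
    den = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

vectors = tfidf_vectors(profiles)
print(cosine(vectors["datasetA"], vectors["datasetB"]))   # overlapping profiles
print(cosine(vectors["datasetA"], vectors["datasetC"]))   # disjoint profiles -> 0.0
```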
11

Rossiello, Roberto. "Generazione di dataset RDF su articoli scientifici e affiliazioni: un approccio modulare basato su DBPedia". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2015. http://amslaurea.unibo.it/8933/.

Full text
Abstract (summary):
This thesis proposes AffiliationExtractor, a modular tool written in Python for extracting information about the affiliations of the authors of scientific publications, producing as output an RDF dataset containing this information.
12

Abbas, Nacira. "Formal Concept Analysis for Discovering Link Keys in the Web of Data". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0202.

Full text
Abstract (summary):
The Web of Data is a global data space that can be seen as an additional layer interconnected with the Web of documents. Data interlinking is the task of discovering identity links across RDF (Resource Description Framework) datasets over the Web of Data. We focus on a specific approach for data interlinking, which relies on "link keys". A link key has the form of two sets of pairs of properties associated with a pair of classes. For example, the link key ({(designation,title)}, {(designation,title), (creator,author)}, (Book,Novel)) states that whenever an instance "a" of the class "Book" and an instance "b" of the class "Novel" share at least one value for the properties "creator" and "author", and "a" and "b" have the same values for the properties "designation" and "title", then "a" and "b" denote the same entity. Then (a,owl:sameAs,b) is an identity link over the two datasets. However, link keys are not always provided, and various algorithms have been developed to automatically discover these keys. First, these algorithms focus on finding "link key candidates". The quality of these candidates is then evaluated using appropriate measures, and valid link keys are selected accordingly. Formal Concept Analysis (FCA) has been closely associated with the discovery of link key candidates, leading to the proposal of an FCA-based algorithm for this purpose. Nevertheless, existing algorithms for link key discovery have certain limitations. First, they do not explicitly specify the associated pairs of classes for the discovered link key candidates, which can lead to inaccurate evaluations. Additionally, the selection strategies employed by these algorithms may also produce less accurate results. Furthermore, redundancy is observed among the sets of discovered candidates, which presents challenges for their visualization, evaluation, and analysis. To address these limitations, we propose to extend the existing algorithms in several aspects. Firstly, we introduce a method based on Pattern Structures, an FCA generalization that can handle non-binary data. This approach allows for explicitly specifying the associated pairs of classes for each link key candidate. Secondly, based on the proposed Pattern Structure, we present two methods for link key selection. The first method is guided by the associated pairs of classes of link keys, while the second method utilizes the lattice generated by the Pattern Structure. These two methods improve the selection compared to the existing strategy. Finally, to address redundancy, we introduce two methods. The first method involves Partition Pattern Structure, which identifies and merges link key candidates that generate the same partitions. The second method is based on hierarchical clustering, which groups candidates producing similar link sets into clusters and selects a representative for each cluster. This approach effectively minimizes redundancy among the link key candidates.
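To make the definition concrete, the sketch below applies one link key candidate of the form given in the abstract to two small rdflib graphs and emits owl:sameAs links; the property and class IRIs are illustrative, and the algorithms studied in the thesis discover such keys automatically rather than taking them as input.

```python
# Applying one link key candidate (share a value for (creator, author),
# equal value sets for (designation, title)) to two small graphs.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

EX1, EX2 = Namespace("http://ds1.example/"), Namespace("http://ds2.example/")

g1 = Graph().parse(data="""
@prefix ex1: <http://ds1.example/> .
ex1:b1 a ex1:Book ; ex1:designation "Dune" ; ex1:creator "Frank Herbert" .
""", format="turtle")

g2 = Graph().parse(data="""
@prefix ex2: <http://ds2.example/> .
ex2:n1 a ex2:Novel ; ex2:title "Dune" ; ex2:author "Frank Herbert" .
""", format="turtle")

def values(g, s, p):
    return set(g.objects(s, p))

links = Graph()
for a in g1.subjects(RDF.type, EX1["Book"]):
    for b in g2.subjects(RDF.type, EX2["Novel"]):
        share_one = values(g1, a, EX1["creator"]) & values(g2, b, EX2["author"])
        all_equal = values(g1, a, EX1["designation"]) == values(g2, b, EX2["title"])
        if share_one and all_equal:
            links.add((a, OWL.sameAs, b))

print(links.serialize(format="nt"))
```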
13

Barnathan, Michael. "Mining Complex High-Order Datasets". Diss., Temple University Libraries, 2010. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/82058.

Full text
Abstract (summary):
Computer and Information Science
Ph.D.
Selection of an appropriate structure for storage and analysis of complex datasets is a vital but often overlooked decision in the design of data mining and machine learning experiments. Most present techniques impose a matrix structure on the dataset, with rows representing observations and columns representing features. While this assumption is reasonable when features are scalar and do not exhibit co-dependence, the matrix data model becomes inappropriate when dependencies between non-target features must be modeled in parallel, or when features naturally take the form of higher-order multilinear structures. Such datasets particularly abound in functional medical imaging modalities, such as fMRI, where accurate integration of both spatial and temporal information is critical. Although necessary to take full advantage of the high-order structure of these datasets and built on well-studied mathematical tools, tensor analysis methodologies have only recently entered widespread use in the data mining community and remain relatively absent from the literature within the biomedical domain. Furthermore, naive tensor approaches suffer from fundamental efficiency problems which limit their practical use in large-scale high-order mining and do not capture local neighborhoods necessary for accurate spatiotemporal analysis. To address these issues, a comprehensive framework based on wavelet analysis, tensor decomposition, and the WaveCluster algorithm is proposed for addressing the problems of preprocessing, classification, clustering, compression, feature extraction, and latent concept discovery on large-scale high-order datasets, with a particular emphasis on applications in computer-assisted diagnosis. Our framework is evaluated on a 9.3 GB fMRI motor task dataset of both high dimensionality and high order, performing favorably against traditional voxelwise and spectral methods of analysis, discovering latent concepts suggestive of subject handedness, and reducing space and time complexities by up to two orders of magnitude. Novel wavelet and tensor tools are derived in the course of this work, including a novel formulation of an r-dimensional wavelet transform in terms of elementary tensor operations and an enhanced WaveCluster algorithm capable of clustering real-valued as well as binary data. Sparseness-exploiting properties are demonstrated and variations of core algorithms for specialized tasks such as image segmentation are presented.
Temple University--Theses
14

Koufakou, Anna. "Scalable and Efficient Outlier Detection in Large Distributed Data Sets with Mixed-Type Attributes". Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3431.

Full text
Abstract (summary):
An important problem that appears often when analyzing data involves identifying irregular or abnormal data points called outliers. This problem broadly arises under two scenarios: when outliers are to be removed from the data before analysis, and when useful information or knowledge can be extracted by the outliers themselves. Outlier Detection in the context of the second scenario is a research field that has attracted significant attention in a broad range of useful applications. For example, in credit card transaction data, outliers might indicate potential fraud; in network traffic data, outliers might represent potential intrusion attempts. The basis of deciding if a data point is an outlier is often some measure or notion of dissimilarity between the data point under consideration and the rest. Traditional outlier detection methods assume numerical or ordinal data, and compute pair-wise distances between data points. However, the notion of distance or similarity for categorical data is more difficult to define. Moreover, the size of currently available data sets dictates the need for fast and scalable outlier detection methods, thus precluding distance computations. Additionally, these methods must be applicable to data which might be distributed among different locations. In this work, we propose novel strategies to efficiently deal with large distributed data containing mixed-type attributes. Specifically, we first propose a fast and scalable algorithm for categorical data (AVF), and its parallel version based on MapReduce (MR-AVF). We extend AVF and introduce a fast outlier detection algorithm for large distributed data with mixed-type attributes (ODMAD). Finally, we modify ODMAD in order to deal with very high-dimensional categorical data. Experiments with large real-world and synthetic data show that the proposed methods exhibit large performance gains and high scalability compared to the state-of-the-art, while achieving similar accuracy detection rates.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering PhD
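The AVF algorithm mentioned in the abstract above admits a very compact description: a record whose attribute values are individually rare receives a low average frequency score and is flagged as a likely outlier. A minimal Python sketch (with invented data) follows.

```python
# Compact sketch of the Attribute Value Frequency (AVF) idea for categorical
# data: average, over a record's attributes, the frequency of each value.
from collections import Counter

def avf_scores(records):
    n_attrs = len(records[0])
    freq = [Counter(r[i] for r in records) for i in range(n_attrs)]
    return [sum(freq[i][r[i]] for i in range(n_attrs)) / n_attrs for r in records]

data = [
    ("red", "circle", "small"),
    ("red", "circle", "small"),
    ("red", "square", "small"),
    ("blue", "triangle", "large"),   # unusual in every attribute
]
for record, score in sorted(zip(data, avf_scores(data)), key=lambda x: x[1]):
    print(f"{score:.2f}  {record}")   # lowest score first = most outlying
```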
15

Marcelli, Fulvio. "Estrazione automatica di informazioni da articoli scientifici in formato PDF e pubblicazione in Linked Open Data". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2016. http://amslaurea.unibo.it/10892/.

Full text
Abstract (summary):
The purpose of this thesis is to introduce Investiga, an application created for this work that automatically extracts information from scientific articles in PDF format and publishes it according to Linked Open Data principles and formats. The application is based on Task 2 of SemPub 2016, a challenge whose main goal is to improve the extraction of information from scientific articles in PDF format. Investiga extracts the first-level sections and the captions of figures and tables from a given article and builds a graph in which the extracted information is appropriately interlinked. The thesis also analyses the existing tools for automatic information extraction from PDF documents and their limitations.
16

Yang, Chaozheng. "Sufficient Dimension Reduction in Complex Datasets". Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/404627.

Full text
Abstract (summary):
Statistics
Ph.D.
This dissertation focuses on two problems in dimension reduction. One is using a permutation approach to test predictor contribution. The permutation approach applies to marginal coordinate tests based on dimension reduction methods such as SIR, SAVE and DR. This approach no longer requires calculation of the method-specific weights to determine the asymptotic null distribution. The other is combining a clustering method with robust regression (least absolute deviation) to estimate the dimension reduction subspace. Compared with ordinary least squares, the proposed method is more robust to outliers; also, this method replaces the global linearity assumption with the more flexible local linearity assumption through k-means clustering.
Temple University--Theses
17

Sherif, Mohamed Ahmed Mohamed [Verfasser], Klaus-Peter [Akademischer Betreuer] Fähnrich, Klaus-Peter [Gutachter] Fähnrich, Jens [Akademischer Betreuer] Lehmann, Ngomo Axel-Cyrille [Akademischer Betreuer] Ngonga, Sören [Akademischer Betreuer] Auer and Daniel [Gutachter] Mirankar. "Automating Geospatial RDF Dataset Integration and Enrichment / Mohamed Ahmed Mohamed Sherif ; Gutachter: Klaus-Peter Fähnrich, Daniel Mirankar ; Klaus-Peter Fähnrich, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Sören Auer". Leipzig : Universitätsbibliothek Leipzig, 2016. http://d-nb.info/1240696035/34.

Full text
18

Lowry, Kimberly. "The Paths to Becoming a Mathematics Teacher". Doctoral diss., University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3810.

Full text
Abstract (summary):
Increasing numbers of mathematics teachers must be recruited in coming years because of a growing student population, teacher attrition, calls for smaller class sizes, and the need to replace out-of-subject teachers. Recruitment can be made more effective and efficient if better information on career paths is provided to decision makers. This study attempts to analyze the academic decisions which lead to the outcome "becoming a mathematics teacher". Four groups were compared and contrasted: mathematics teachers, science teachers, other teachers, and non-teachers. Science teachers were removed from the "other teachers" category because of their many similarities to mathematics teachers on the variables examined. The question of whether these groups differ in ways that could help predict the outcome of interest was examined using the NCES dataset Baccalaureate & Beyond: 93/97, which provides thousands of variables on academic path, demographics, and labor market histories for over 8,000 individuals. It was analyzed using the NCES online analytic tool DAS to generate tables showing the percentage distribution of the four groups on variables organized according to the concepts of demographics, family environment, academic path, and academic achievement. Further examination was conducted by entering the variables into a discriminant analysis. Mathematics teachers were found to differ from teachers of other K-12 fields on all four conceptual categories. However, only a few such differences were statistically significant. More significant differences were observed when the analyses were conducted separately for women and men. The trend observed was that those who became mathematics teachers were more likely to have attended public high schools and to have first attended two-year colleges; to have lower GPAs, more mathematics credits, and midrange CEE scores; and to be female.
Ph.D.
Department of Teaching and Learning Principles
Education
Mathematics Education
19

Bedeschi, Luca. "Analisi sulla crescita e sulle funzioni dei Linked Open Data - LODStories". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2014. http://amslaurea.unibo.it/7733/.

Full text
Abstract (summary):
Open Data, literally "open data", is the school of thought (and the related "movement") that addresses the need for data that are legally "open", that is, freely reusable by anyone for any purpose. The goal of Open Data can be achieved by law, as in the USA where information generated by the federal public sector is in the public domain, or by choice of the rights holders, through suitable licences. To motivate the need for data in an open format, we can use an analogy: Open Data is to Linked Data as the Internet is to the Web. Open Data, in other words, is the infrastructure (or "platform") that Linked Data needs in order to build the network of inferences among the various data scattered across the Web. Linked Data, for its part, is a technology that is by now fairly mature and has great potential, but it needs large masses of interlinked data to become concretely useful. This has in part already been achieved and is being improved, thanks to projects such as DBpedia or FreeBase. In parallel with the contributions of online communities, another important piece (a sort of very valuable "bulk upload") could come from the availability of large amounts of public data, ideally already linked by the institutions themselves or at least made available in a structured way, helping to reach a critical mass of Linked Data. Starting from this substrate, namely the de facto availability of the data and their full (legal) reusability, Linked Data can offer a powerful representation of those data in terms of relationships (links): in this sense, Linked Data and Open Data converge and reach their full realisation in the Linked Open Data approach. The goal of this thesis is to study in depth and present the foundations of how Linked Open Data works and the contexts in which it is used.
20

Ortiz, Enrique. "A Scalable and Efficient Outlier Detection Strategy for Categorical Data". Honors in the Major Thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/1185.

Full text
Bachelors
Engineering and Computer Science
Computer Engineering
21

"Referring Expression Comprehension for CLEVR-Ref+ Dataset". Master's thesis, 2020. http://hdl.handle.net/2286/R.I.62696.

Full text
Abstract (summary):
Referring Expression Comprehension (REC) is an important area of research in the Natural Language Processing (NLP) and vision domains. It involves locating an object in an image described by a natural language referring expression. This task requires information from both the natural language and the vision side. The task is compositional in nature, as it requires visual reasoning as the underlying process along with relationships among the objects in the image. Recent works based on modular networks have shown them to be an effective framework for performing visual reasoning tasks. Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on the CLEVR-Ref+ dataset deals with bias issues by constructing a synthetic dataset and provides an approach for the aforementioned task which performed better than the previous state-of-the-art models while also exposing the reasoning process. This work aims to improve the performance on the CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of primitive operation modules which are specific to their functions, and the output is generated using a separate segmentation module. From empirical results, it is clear that this approach performs significantly better than the current state of the art in one aspect (predicted programs) and achieves comparable results for another aspect (ground truth programs).
Dissertation/Thesis
Masters Thesis Computer Science 2020
22

Jareš, Antonín. "Zjednodušení přístupu k propojeným datům pomocí tabulkových pohledů". Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-451054.

Full text
Abstract (summary):
The goal of this thesis is to design and implement a front-end application allowing users to create and manage custom views for arbitrary linked data endpoints. Such views will be executable against a predefined SPARQL endpoint and the users will be able to retrieve and download their requested data in the CSV format. The users will also be able to share these views and store them utilizing Solid Pods. Experienced SPARQL users will be able to manually customize the query. To achieve these goals, the system uses freely available technologies - HTML, JavaScript (namely the React framework) and CSS.
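Stripped of the front-end and Solid Pod aspects, the data flow of such a tabular view amounts to running a SPARQL query against a predefined endpoint and saving the bindings as CSV. The sketch below shows that flow with SPARQLWrapper in Python, whereas the thesis implements it as a client-side React application; the endpoint and query are only examples.

```python
# Run a (possibly user-customized) SPARQL query against a predefined
# endpoint and store the result bindings as CSV. Endpoint and query are
# examples; the thesis realizes this flow in a React front end instead.
from SPARQLWrapper import SPARQLWrapper, CSV

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="tabular-view-example/0.1")
endpoint.setQuery("""
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
""")
endpoint.setReturnFormat(CSV)

with open("view.csv", "wb") as f:
    f.write(endpoint.query().convert())   # CSV results are returned as bytes
```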
23

Soderi, Mirco. "Semantic models for the modeling and management of big data in a smart city environment". Doctoral thesis, 2021. http://hdl.handle.net/2158/1232245.

Full text
Abstract (summary):
The overall purpose of this research has been the building or improvement of semantic models for the representation of data related to smart cities and smart industries, in such a way that context-rich, user-oriented, efficient and effective applications can also be built on top of such data. In more detail, one of the key purposes has been the modelling of the structural and functional aspects of urban mobility and the production of instances exploiting Open Street Map, which, once integrated with traffic sensor data, has led to the building and display of real-time traffic reconstructions at city level. A second key purpose has been the modelling of the Internet of Things, which today makes it possible to seamlessly and efficiently identify sensing devices that are deployed in a given area or along a given path and that are of a given type, and to inspect the real-time data that they produce, through a user-oriented Web application, namely the Service Map. A pragmatic approach to the modelling has been followed, always taking into consideration the best practices of semantic modelling on one side, so that a clean, comprehensive and understandable model could result, and the reality of the data at hand and of the application requirements on the other side. The identification of architectures and methods that could grant efficiency and scalability in data access has also been a primary purpose of this research, and it has led to the definition and implementation of a federation of Service Maps, namely the Super Service Map. The architecture is fully distributed: each Super Service Map has a local list of the actual Service Maps with relevant metadata, it exposes the same interface as the actual Service Maps, and it forwards requests and builds merged responses, also implementing security and caching mechanisms. The identification of technologies, tools and methods for presenting the data in a user-friendly manner has been a relevant part of this research as well, and it has led, among other things, to the definition and implementation of a client-server architecture and a Web interface in the Snap4City platform for the building, management and display of synoptic templates and instances, thanks to which users can securely display and interact with different types of data. Finally, some effort has been devoted to the automatic classification of RDF datasets as to their structure and purpose, based on the computation of metrics through SPARQL queries and on the application of dimensionality reduction and clustering techniques. A Web portal is available where directories, datasets, metrics and computations can be inspected, even in real time.
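The federation pattern described for the Super Service Map can be reduced to a small skeleton: keep a local list of member endpoints, forward the same request to each, and merge the responses. The Python sketch below shows only that skeleton; the URLs, parameter names and response structure are placeholders, and the real implementation also adds security and caching.

```python
# Skeleton of the federation pattern: forward the same request to every
# member endpoint and merge the responses. URLs, parameters and the assumed
# JSON shape are placeholders, not the actual Service Map API.
import requests

SERVICE_MAPS = [
    "https://servicemap-a.example.org/api/v1/",
    "https://servicemap-b.example.org/api/v1/",
]

def federated_query(params, timeout=10):
    merged = []
    for base in SERVICE_MAPS:
        try:
            response = requests.get(base, params=params, timeout=timeout)
            response.raise_for_status()
            merged.extend(response.json().get("Services", []))   # assumed JSON shape
        except requests.RequestException:
            continue   # a failing member must not break the whole federation
    return merged

results = federated_query({"selection": "43.77;11.25", "maxDists": "0.1"})
print(len(results), "services found across the federation")
```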
