Journal articles on the topic "SPARQL query transformation"


Cite a source in APA, MLA, Chicago, Harvard, and many other styles


Below are the top 18 journal articles for research on the topic "SPARQL query transformation".

Next to each source in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract (summary) of the work online, if it is included in the metadata.

Browse journal articles from many scientific fields and compile your bibliography correctly.

1

Wei, Bingyang, and Jing Sun. "Leveraging SPARQL Queries for UML Consistency Checking". International Journal of Software Engineering and Knowledge Engineering 31, no. 04 (April 2021): 635–54. http://dx.doi.org/10.1142/s0218194021500170.

Abstract:
Context and motivation: The multiple-viewed requirements modeling method describes the system-to-be from different perspectives, and some requirements models are then specified in various UML diagrams. Question/problem: Managing those models can be tedious and error-prone, since many CASE tools provide poor support for reasoning and consistency checking. Principal ideas/results: An ontology is a formal notation for describing concepts and their relations in a domain. Since software requirements are a kind of knowledge, we propose to adopt a knowledge engineering approach for managing the consistency of requirements models. In this paper, an ontology for the three most commonly used UML diagrams is developed in the Web Ontology Language (OWL). The transformation of UML class, sequence and state diagrams to an OWL knowledge base is presented. Owing to the underlying logical reasoning capability of OWL, a semantic query language, SPARQL (SPARQL Protocol and RDF Query Language), is used to query the knowledge base for consistency checking. Contribution: This paper introduces a semantic web-based knowledge engineering approach to represent and manage software requirements knowledge in OWL. By experimenting with a concrete software system, we demonstrate the feasibility and applicability of this knowledge-based approach.
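For illustration, a consistency check of the kind described might be phrased as the following SPARQL query; the uml: namespace and property names here are hypothetical stand-ins, not the authors' actual ontology.

PREFIX uml: <http://example.org/uml#>
# Flag sequence-diagram messages that invoke an operation
# the receiving object's class does not declare.
SELECT ?message ?class
WHERE {
  ?message a uml:Message ;
           uml:invokesOperation ?op ;
           uml:hasReceiver ?obj .
  ?obj uml:instanceOf ?class .
  FILTER NOT EXISTS { ?class uml:hasOperation ?op . }
}

An empty result set means the sequence and class diagrams agree on this particular rule.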
2

Cooray, Thilini, and Gihan Wikramanayake. "Path index based keywords to SPARQL query transformation for semantic data federations". International Journal on Advances in ICT for Emerging Regions (ICTer) 9, no. 1 (13 July 2016): 12. http://dx.doi.org/10.4038/icter.v9i1.7168.

3

Qi, Jiexing, Chang Su, Zhixin Guo, Lyuwen Wu, Zanwei Shen, Luoyi Fu, Xinbing Wang, and Chenghu Zhou. "Enhancing SPARQL Query Generation for Knowledge Base Question Answering Systems by Learning to Correct Triplets". Applied Sciences 14, no. 4 (14 February 2024): 1521. http://dx.doi.org/10.3390/app14041521.

Abstract:
Generating SPARQL queries from natural language questions is challenging in Knowledge Base Question Answering (KBQA) systems. The current state-of-the-art models heavily rely on fine-tuning pretrained models such as T5. However, these methods still encounter critical issues such as triple-flip errors (e.g., (subject, relation, object) is predicted as (object, relation, subject)). To address this limitation, we introduce TSET (Triplet Structure Enhanced T5), a model with a novel pretraining stage positioned between the initial T5 pretraining and the fine-tuning for the Text-to-SPARQL task. In this intermediary stage, we introduce a new objective called Triplet Structure Correction (TSC) to train the model on a SPARQL corpus derived from Wikidata. This objective aims to deepen the model’s understanding of the order of triplets. After this specialized pretraining, the model undergoes fine-tuning for SPARQL query generation, augmenting its query-generation capabilities. We also propose a method named “semantic transformation” to fortify the model’s grasp of SPARQL syntax and semantics without compromising the pre-trained weights of T5. Experimental results demonstrate that our proposed TSET outperforms existing methods on three well-established KBQA datasets: LC-QuAD 2.0, QALD-9 plus, and QALD-10, establishing a new state-of-the-art performance (95.0% F1 and 93.1% QM on LC-QuAD 2.0, 75.85% F1 and 61.76% QM on QALD-9 plus, 51.37% F1 and 40.05% QM on QALD-10).
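As a sketch of the triple-flip error the paper targets, consider a query over Wikidata (the identifiers are real, but the example is illustrative and not taken from the paper):

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Intended question: "Which works did Douglas Adams (wd:Q42) write?"
SELECT ?work WHERE { ?work wdt:P50 wd:Q42 . }   # (work, author, Adams)
# A triple-flip error would instead emit
#   SELECT ?work WHERE { wd:Q42 wdt:P50 ?work . }
# which is syntactically valid but reverses subject and object,
# asking for the "author of Douglas Adams" and returning nothing.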
4

Georgieva-Trifonova, Tsvetanka, and Miroslav Galabov. "Transforming 3D Models to Semantic Web Representation". Romanian Journal of Information Science and Technology 2023, no. 1 (24 March 2023): 33–48. http://dx.doi.org/10.59277/romjist.2023.1.03.

Abstract:
"The purpose of the present paper is to research a rule-based approach for transforming X3D (eXtensible 3D) models to RDF (Resource Description Framework). The transformation is performed by using the RDF Mapping Language (RML). Its advantages are summarized, which are mainly due to the fact that the rules created build a knowledge base. By applying SPARQL (SPARQL Protocol and RDF Query Language) queries to it, the possibility of explore in order to validate and improve the defined RML rules themselves, is pointed out. An approach for reversing from the RDF triples to the original X3D in a unique way is considered, and the types of SPARQL queries needed for its implementation are systematized. Rules are formulated for all elements defined in the X3D schema, their attributes and properties are described. Their accessibility is ensured. The conversion of X3D models to RDF is confirmed to be consistent with guidelines and best practices for creating accessible, understandable, and reusable ontologies on the Web. The systematized SPARQL query types for reversing from RDF triples to the original X3D are checked for specific elements and sample data, and the obtained results establish their correctness. The prerequisites and limitations of the represented approach are explained. The proposed approach allows building a comprehensive knowledge base that includes the RML rules, the transformed X3D models and the domain-specific ontology and its use to analyzing data and semantic reasoning. The electronic libraries that include 3D content could take advantage from the benefits and possible future applications of the solutions discussed in this study."
5

Voinov, Artem, and Ilya Senokosov. "Analysis of the performance of languages for working with the ontological model of the assembly of 3D-constructions". Journal of Physics: Conference Series 2373, no. 7 (1 December 2022): 072014. http://dx.doi.org/10.1088/1742-6596/2373/7/072014.

Abstract:
The paper studies the performance of various languages used to work with ontological models. The languages are used in the methodology for modifying and verifying the 3D construction ontology presented in this paper. The methodology consists of three stages: building an ontological model of the assembly system and an ontological model of the desired 3D structure, modifying the original 3D structure model, and verifying the result in order to decide on further actions. The paper considers two groups of languages: query languages and modification languages. SQWRL and SPARQL are analyzed as query languages, and OWL DL and SWRL are used as modification languages. The comparison is based on the speed of performing such basic Boolean operations as conjunction and disjunction. To achieve the greatest objectivity, the study is carried out on models of different dimensions. All measurements are made in the Protégé system; since this system supports all four languages, the dependence of the results on the runtime environment is minimized. The evaluation results are presented as a graph of the rate of change during an operation against the number of elements in the ontological model. The conclusion shows that OWL DL (with the Pellet reasoner) and SWRL as ontology transformation languages and SPARQL as a query language are most suitable for working with the assembly ontology of 3D structures.
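The Boolean operations being benchmarked correspond to SPARQL patterns like the following; the ex: assembly vocabulary is invented for illustration.

PREFIX ex: <http://example.org/assembly#>
# Conjunction: parts that are metallic AND load-bearing.
SELECT ?part WHERE {
  ?part a ex:Part ;
        ex:material ex:Metal ;
        ex:role ex:LoadBearing .
}
# Disjunction: parts that are metallic OR ceramic (run separately).
SELECT ?part WHERE {
  { ?part ex:material ex:Metal . }
  UNION
  { ?part ex:material ex:Ceramic . }
}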
6

Postanogov, I. S., and I. A. Turova. "Platform for Integrating and Testing Tools which Transform Natural Language Queries into SPARQL Queries". Vestnik NSU. Series: Information Technologies 17, no. 2 (2019): 138–52. http://dx.doi.org/10.25205/1818-7900-2019-17-2-138-152.

Abstract:
In the paper we discuss how to support the process of creating tools which transform natural language (NL) queries into SPARQL queries (hereinafter referred to as transformation tools). In the introduction, we describe the relevance of the task of understanding natural language queries in information systems, as well as the advantages of using ontologies as a means of representing knowledge for solving this problem. This ontology-based data access approach can also be used in systems which provide a natural language interface to databases. Based on an analysis of the problems related to integrating and testing existing transformation tools, as well as to supporting the creation and testing of one's own transformation modules, the concept of a software platform that simplifies these tasks is proposed. The platform architecture satisfies the requirements for ease of connecting third-party transformation tools, reusing individual modules, and integrating the resulting transformation tools into other systems, including testing systems. The building blocks of the created transformation systems are individual transformation modules packaged in Docker containers. Program access to each module is carried out using gRPC. Modules loaded into the platform can be built into the transformation pipeline automatically or manually using the built-in third-party SciVi data flow diagram editor. Compatibility of individual modules is controlled by automatic analysis of their application programming interfaces. The resulting pipeline is combined, according to the specified data flow, into a single multi-container application that can be integrated into other systems as well as tested on extendable test suites. The expected and actual results of the query transformation are available for viewing in graphical form in the visualization tool developed earlier.
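A transformation pipeline of this kind maps a natural language question to a SPARQL query; for example, against DBpedia (an illustrative case, not taken from the platform):

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
# NL input: "What is the capital of France?"
# Expected output of the pipeline:
SELECT ?capital WHERE { dbr:France dbo:capital ?capital . }

A testing system can then compare the emitted query, or its result set, against this expected query.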
7

Natarajan, Senthilselvan, Subramaniyaswamy Vairavasundaram, Yuvaraja Teekaraman, Ramya Kuppusamy, and Arun Radhakrishnan. "Schema-Based Mapping Approach for Data Transformation to Enrich Semantic Web". Wireless Communications and Mobile Computing 2021 (10 November 2021): 1–15. http://dx.doi.org/10.1155/2021/8567894.

Abstract:
The modern web wants data to be in the Resource Description Framework (RDF) format, a machine-readable form that makes it easy to share and reuse data without human intervention. However, most information is still available in relational form. Existing conventional methods transform data from RDB to RDF using instance-level mapping, which has not yielded the expected results because of poor mapping. Hence, in this paper, a novel schema-based RDB-RDF mapping method (relational database to Resource Description Framework) is proposed, an improved approach for transforming a relational database into the Resource Description Framework. It provides both data materialization and on-demand mapping. RDB-RDF reduces the data retrieval time for non-primary-key search by using schema-level mapping. The resultant mapped RDF graph presents the relational database as a conceptual schema and maintains the instance triples as a data graph. This mechanism, known as data materialization, suits static datasets well. To get data in a dynamic environment, query translation (on-demand mapping) is better than whole-data conversion. The proposed approach directly converts a SPARQL query into an SQL query using the mapping descriptions available in the proposed system. The mapping description is the key component of the proposed system and is responsible for quick data retrieval and query translation. The join expression introduced in the proposed RDB-RDF mapping method efficiently handles all complex operations with primary and foreign keys. Experimental evaluation is done on a graphics designer database. It is observed from the results that the proposed schema-based RDB-RDF mapping method accomplishes more comprehensible mapping than conventional methods by dissolving structural and operational differences.
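On-demand mapping of this kind rewrites SPARQL into SQL via the mapping descriptions; a hypothetical example over a designer/project schema (all names invented, not the paper's test database):

PREFIX ex: <http://example.org/gd#>
# Posed against the mapped RDF view of the relational database:
SELECT ?name WHERE {
  ?d a ex:Designer ;
     ex:name ?name ;
     ex:worksOn ?p .
  ?p ex:title "Logo redesign" .
}
# A schema-level mapping would translate this to roughly:
#   SELECT d.name FROM designer d
#   JOIN project p ON d.project_id = p.id
#   WHERE p.title = 'Logo redesign';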
8

Zhang, Hui, Yingtao Niu, Kun Ding, Shaoqin Kou, and Liu Liu. "Building and Applying Knowledge Graph in Edge Analytics Environment". Journal of Physics: Conference Series 2171, no. 1 (1 January 2022): 012014. http://dx.doi.org/10.1088/1742-6596/2171/1/012014.

Abstract:
For scenarios with limited hardware resources and a restricted software environment in an edge computing architecture, a method for building and applying a knowledge graph in an edge analytics environment is proposed. The main process includes building the knowledge graph in the cloud, storing the knowledge base in RDF format at the edge through customization and transformation, and performing query and analytics at the edge with the SPARQL graph search language. The method is simulated in a communication anti-jamming test environment, and the results show that the relevant technical solutions can meet the requirements of knowledge graph construction and data analysis in a restricted edge analytics environment.
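An edge-side query in this setting might look as follows; the aj: vocabulary is an invented stand-in for the anti-jamming knowledge base, not the paper's schema.

PREFIX aj: <http://example.org/antijam#>
# Executed at the edge: countermeasures applicable to the frequency
# band of an observed jamming signal.
SELECT ?signal ?measure WHERE {
  ?signal a aj:JammingSignal ;
          aj:band ?band .
  ?measure a aj:Countermeasure ;
           aj:effectiveIn ?band .
}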
9

Hazber, Mohamed A. G., Bing Li, Guandong Xu, Mohammed A. S. Mosleh, Xiwu Gu, and Yuhua Li. "An Approach for Generation of SPARQL Query from SQL Algebra based Transformation Rules of RDB to Ontology". Journal of Software 13, no. 11 (2018): 573–99. http://dx.doi.org/10.17706/jsw.13.11.573-599.

10

Haw, Su-Cheng, Lit-Jie Chew, Dana Sulistyo Kusumo, Palanichamy Naveen, and Kok-Why Ng. "Mapping of extensible markup language-to-ontology representation for effective data integration". IAES International Journal of Artificial Intelligence (IJ-AI) 12, no. 1 (1 March 2023): 432. http://dx.doi.org/10.11591/ijai.v12.i1.pp432-442.

Abstract:
Extensible markup language (XML) is well known as the standard for data exchange over the Internet. It is flexible and has high expressibility to express the relationships between the data stored. Yet, the structural complexity and the semantic relationships are not well expressed. On the other hand, ontology models structural, semantic and domain knowledge effectively. By combining ontology with visualization, one is able to take a closer view based on the respective user requirements. In this paper, we propose several mapping rules for the transformation of XML into an ontology representation. Subsequently, we show how the ontology is constructed based on the proposed rules using the sample domain ontology from the University of Wisconsin-Milwaukee (UWM) and mondial datasets. We also look at the schemas, query workload, and evaluation to derive extended knowledge from the existing ontology. The correctness of the ontology representation has been proven effective through supporting various types of complex queries in the SPARQL Protocol and RDF Query Language (SPARQL).
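After such a mapping, a join that would require navigating nested XML elements becomes a flat graph pattern; a sketch with an invented uwm: namespace (not the paper's actual vocabulary):

PREFIX uwm: <http://example.org/uwm#>
# Courses and their instructors in one department, queried over the
# ontology produced by the XML-to-ontology mapping rules.
SELECT ?course ?instructor WHERE {
  ?course a uwm:Course ;
          uwm:taughtBy ?instructor .
  ?instructor uwm:department "Computer Science" .
}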
11

Pirró, Giuseppe. "REWOrD: Semantic Relatedness in the Web of Data". Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (20 September 2021): 129–35. http://dx.doi.org/10.1609/aaai.v26i1.8107.

Abstract:
This paper presents REWOrD, an approach to compute semantic relatedness between entities in the Web of Data representing real-world concepts. REWOrD exploits the graph nature of RDF data and the SPARQL query language to access this data. Through simple queries, REWOrD constructs weighted vectors keeping the informativeness of the RDF predicates used to make statements about the entities being compared. The most informative path is also considered to further refine informativeness. Relatedness is then computed by the cosine of the weighted vectors. Differently from previous approaches based on Wikipedia, REWOrD does not require any preprocessing or custom data transformation. Indeed, it can leverage whatever RDF knowledge base as a source of background knowledge. We evaluated REWOrD in different settings by using a new dataset of real-world entities and investigated its flexibility. As compared to related work on classical datasets, REWOrD obtains comparable results while, on one side, it avoids the burden of preprocessing and data transformation and, on the other side, it provides more flexibility and applicability in a broad range of domains.
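The "simple queries" such an approach relies on are ones any SPARQL endpoint can answer. For instance, predicate usage counts, from which informativeness weights can be derived (a generic sketch, not the author's exact query):

# Count how often each predicate is used in the knowledge base;
# rarer predicates can then be weighted as more informative.
SELECT ?p (COUNT(*) AS ?uses)
WHERE { ?s ?p ?o . }
GROUP BY ?p
ORDER BY DESC(?uses)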
12

Hor, A. H., and G. Sohn. "Design and Evaluation of a BIM-GIS Integrated Information Model Using RDF Graph Database". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences VIII-4/W2-2021 (7 October 2021): 175–82. http://dx.doi.org/10.5194/isprs-annals-viii-4-w2-2021-175-2021.

Abstract:
The semantic integration of BIM Industry Foundation Classes and the GIS City Geography Markup Language is a milestone for many applications that involve both domains of knowledge. In this paper, we propose a system design architecture and an implementation of Extraction, Transformation and Loading (ETL) workflows from BIM and GIS models into an RDF graph database model; these workflows were created from functional components and ontological frameworks supporting the RDF SPARQL and graph-database Cypher query languages. The paper seeks a full understanding of whether an RDF graph database is suitable for a BIM-GIS integrated information model, and it looks deeper into the assessment of translation workflows, evaluating performance metrics of a BIM-GIS integrated data model managed in an RDF graph database. The process requires designing and developing various pipelines of workflows with semantic tools in order to get the data and its structure into an appropriate format, and it demonstrates the potential of using RDF graph databases to integrate, manage and analyze information and relationships from both GIS and BIM models. The study also introduces the concept of graph-model occupancy indexes of nodes, attributes and relationships to measure query outputs, giving insights into data richness and the performance of the resulting BIM-GIS semantically integrated model.
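A cross-domain query over the integrated graph might combine BIM and GIS properties in one pattern; the namespaces and property names below are invented for illustration, not the authors' model.

PREFIX bim: <http://example.org/ifc#>
PREFIX gis: <http://example.org/citygml#>
# Buildings represented in both models, with an IFC attribute and a
# CityGML geometry attached to the same node.
SELECT ?building ?storeys ?footprint WHERE {
  ?building a bim:IfcBuilding ;
            bim:numberOfStoreys ?storeys ;
            gis:hasLod2Footprint ?footprint .
}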
13

Ceballos, Oscar, Carlos Alberto Ramírez Restrepo, María Constanza Pabón, Andres M. Castillo, and Oscar Corcho. "SPARQL2Flink: Evaluation of SPARQL Queries on Apache Flink". Applied Sciences 11, no. 15 (30 July 2021): 7033. http://dx.doi.org/10.3390/app11157033.

Abstract:
Existing SPARQL query engines and triple stores are continuously improved to handle ever more massive datasets. Several approaches have been developed in this context proposing the storage and querying of RDF data in a distributed fashion, mainly using the MapReduce programming model and Hadoop-based ecosystems. New trends in Big Data technologies have also emerged (e.g., Apache Spark, Apache Flink); they use distributed in-memory processing and promise to deliver higher data-processing performance. In this paper, we present a formal interpretation of some PACT transformations implemented in the Apache Flink DataSet API. We use this formalization to provide a mapping to translate a SPARQL query to a Flink program. The mapping was implemented in a prototype used to determine the correctness and performance of the solution. The source code of the project is available on GitHub under the MIT license.
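To give the flavour of such a mapping: each triple pattern, join and filter in a SPARQL query has a counterpart dataset transformation. The comments below paraphrase the general idea, not the paper's exact translation rules.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?s foaf:name ?name ;   # triple pattern -> filter over the triple DataSet
     foaf:age  ?age .    # second pattern -> another filter, joined on ?s
  FILTER(?age > 30)      # FILTER        -> Flink filter transformation
}
# The final projection on ?name corresponds to a map transformation,
# composing the patterns into a single Flink DataSet program.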
14

Papadaki, Maria-Evangelia, Nicolas Spyratos, and Yannis Tzitzikas. "Towards Interactive Analytics over RDF Graphs". Algorithms 14, no. 2 (25 January 2021): 34. http://dx.doi.org/10.3390/a14020034.

Abstract:
The continuous accumulation of multi-dimensional data and the development of the Semantic Web and of Linked Data published in the Resource Description Framework (RDF) bring new requirements for data analytics tools. Such tools should take into account the special features of RDF graphs, exploit the semantics of RDF and support flexible aggregate queries. In this paper, we present an approach for applying analytics to RDF data based on a high-level functional query language, called HIFUN. In that language, each analytical query is considered to be a well-formed expression of a functional algebra, and its definition is independent of the nature and structure of the data. We investigate how HIFUN can be used to ease the formulation of analytic queries over RDF data. We detail the applicability of HIFUN over RDF, as well as the data transformations that may be required, we introduce the translation rules of HIFUN queries to SPARQL, and we describe a first implementation of the proposed model.
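An analytic query in the HIFUN style, such as "total amount per branch", translates to a SPARQL aggregate of this shape (the ex: vocabulary is illustrative, not from the paper):

PREFIX ex: <http://example.org/sales#>
# Grouping function: invoice -> branch; measured function: invoice -> amount.
SELECT ?branch (SUM(?amount) AS ?total)
WHERE {
  ?invoice ex:branch ?branch ;
           ex:amount ?amount .
}
GROUP BY ?branch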
15

Binding, Ceri, Claudio Gnoli, and Douglas Tudhope. "Migrating a complex classification scheme to the semantic web: expressing the Integrative Levels Classification using SKOS RDF". Journal of Documentation ahead-of-print, ahead-of-print (26 March 2021). http://dx.doi.org/10.1108/jd-10-2020-0167.

Abstract:
Purpose: The Integrative Levels Classification (ILC) is a comprehensive "freely faceted" knowledge organization system not previously expressed as SKOS (Simple Knowledge Organization System). This paper reports and reflects on work converting the ILC to a SKOS representation.
Design/methodology/approach: The design of the ILC representation and the various steps in the conversion to SKOS are described and located within the context of previous work considering the representation of complex classification schemes in SKOS. Various issues and trade-offs emerging from the conversion are discussed. The conversion implementation employed the STELETO transformation tool.
Findings: The ILC conversion captures some of the ILC facet structure by a limited extension beyond the SKOS standard. SPARQL examples illustrate how this extension could be used to create faceted, compound descriptors when indexing or cataloguing. Basic query patterns are provided that might underpin search systems. Possible routes for reducing complexity are discussed.
Originality/value: Complex classification schemes, such as the ILC, have features which are not straightforward to represent in SKOS and which extend beyond the functionality of the SKOS standard. The ILC's facet indicators are modelled as rdf:Property sub-hierarchies that accompany the SKOS RDF statements. The ILC's top-level fundamental facet relationships are modelled by extensions of the associative relationship, i.e. specialised sub-properties of skos:related. An approach for representing faceted compound descriptions in ILC and other faceted classification schemes is proposed.
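A basic query pattern of the kind the paper provides retrieves facet links through the specialised sub-properties of skos:related; the pattern below is a sketch, since the actual property IRIs are scheme-specific.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Find concepts connected through any specialised facet relationship,
# i.e. any declared sub-property of skos:related.
SELECT ?concept ?facetRel ?value WHERE {
  ?facetRel rdfs:subPropertyOf skos:related .
  ?concept ?facetRel ?value .
}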
16

Sachs, Joel, Jocelyn Pender, Beatriz Lujan-Toro, James Macklin, Peter Haase, and Robin Malik. "Using Wikidata and Metaphactory to Underpin an Integrated Flora of Canada". Biodiversity Information Science and Standards 3 (8 August 2019). http://dx.doi.org/10.3897/biss.3.38627.

Abstract:
We are using Wikidata and Metaphactory to build an Integrated Flora of Canada (IFC). IFC will be integrated in two senses: first, it will draw on multiple existing floras (e.g. Flora of North America, Flora of Manitoba, etc.) for content; second, it will be a portal to related resources such as annotations, specimens, literature, and sequence data.
Background: We had success using Semantic MediaWiki (SMW) as the platform for an on-line representation of the Flora of North America (FNA). We used Charaparser (Cui 2012) to extract plant structures (e.g. "stem"), characters (e.g. "external texture"), and character values (e.g. "glabrous") from the semi-structured FNA treatments. We then loaded this data into SMW, which allows us to query for taxa based on their character traits, and enables a broad range of exploratory analysis, both for purposes of hypothesis generation, and also to provide support for or against specific scientific hypotheses.
Migrating to Wikidata/Wikibase: We decided to explore a migration from SMW to Wikibase for three main reasons: simplified workflow, triple-level provenance, and sustainability. Simplified workflow: our workflow for our FNA-based portal includes natural language processing (NLP) of coarse-grained XML to get the fine-grained XML, transforming this XML for input into SMW, and a custom SMW skin for displaying the data. We consider the coarse-grained XML to be canonical. When it changes (because we find an error, or we improve our NLP), we have to re-run the transformation and re-load the data, which is time-consuming. Ideally, our presentation would be based on API calls to the data itself, eliminating the need to transform and re-load after every change. Provenance: Wikidata's provenance model supports having multiple, conflicting assertions for the same character trait, which is something that inevitably happens when floristic data is integrated. Sustainability: Wikidata has strong support from the Wikimedia Foundation, while SMW is increasingly seen as a legacy system.
Wikibase vs. Wikidata: Wikidata, however, is not a suitable home for the Integrated Flora of Canada. It is built upon a relatively small number of community-curated properties, while we have ~4500 properties for the Asteraceae family alone. The model we want to pursue is to use Wikidata for a small group of core properties (e.g. accepted name, parent taxon, etc.), and to use our own instance of Wikibase for the much larger number of specialized morphological properties (e.g. adaxial leaf colour, leaf external texture, etc.). Essentially, we will be running our own Wikidata, over which we would exercise full control. Miller (2018) describes deploying this curation model in another domain.
Metaphactory: Metaphactory is a suite of middleware and front-end interfaces for authoring, managing, and querying knowledge graphs, including mechanisms for faceted search and geospatial visualizations. It is also the software (together with Blazegraph) behind the Wikidata Query Service. Metaphactory provides us with a SPARQL endpoint; a templating mechanism that allows each taxonomic treatment to be rendered via a collection of SPARQL queries; reasoning capabilities (via an underlying graph database) that permit the organization of over 42,000 morphological properties; and a variety of search and discovery tools.
There are a number of ways in which Wikidata and Metaphactory can work together, and we are still exploring questions such as: Will provenance be managed via named graphs, or via the Wikidata snak model? How will data flow between the two platforms? We will report on our findings to date, and invite collaboration with related Wikimedia-based projects.
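The core-properties layer described above can be queried on Wikidata today, for example via taxon name (P225) and parent taxon (P171); this is an illustrative query, not project code.

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Accepted names and parent taxa kept in Wikidata; the ~4500 fine-grained
# morphological properties would live in the project's own Wikibase.
SELECT ?taxon ?name ?parent WHERE {
  ?taxon wdt:P225 ?name ;    # taxon name
         wdt:P171 ?parent .  # parent taxon
}
LIMIT 100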
17

Bruneau, Olivier, Nicolas Lasolle, Jean Lieber, Emmanuel Nauer, Siyana Pavlova, and Laurent Rollet. "Applying and developing semantic web technologies for exploiting a corpus in history of science: The case study of the Henri Poincaré correspondence". Semantic Web, 30 September 2020, 1–20. http://dx.doi.org/10.3233/sw-200400.

Abstract:
The Henri Poincaré correspondence is a corpus of letters sent and received by this mathematician. The edition of this correspondence is a long-term project begun during the 1990s. Since 1999, a website has been devoted to publishing this correspondence online with digitized letters. In 2017, it was decided to rebuild this website using Omeka S. This content management system offers useful services, but some user needs have led to the development of an RDFS infrastructure associated with it. Approximate and explained searches are managed thanks to SPARQL query transformations. A prototype for efficient RDF annotation of this corpus (and similar corpora) has been designed and implemented. This article deals with these three research issues and how they are addressed.
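Approximate search of this kind works by transforming a strict query into relaxed variants; a sketch with an invented vocabulary for the correspondence corpus:

PREFIX hp: <http://example.org/poincare#>
# Strict query: letters Poincaré sent to physicists.
SELECT ?letter WHERE {
  ?letter a hp:Letter ;
          hp:addressee ?person .
  ?person a hp:Physicist .
}
# A transformation step might relax hp:Physicist to a superclass such
# as hp:Scientist, and report ("explain") that substitution to the user
# alongside the broadened results.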
18

Babalou, Samira, David Schellenberger Costa, Helge Bruelheide, Jens Kattge, Christine Römermann, Christian Wirth, and Birgitta König-Ries. "iKNOW: A platform for knowledge graph construction for biodiversity". Biodiversity Information Science and Standards 6 (23 August 2022). http://dx.doi.org/10.3897/biss.6.93867.

Abstract:
Nowadays, more and more biodiversity datasets containing observational and experimental data are collected and produced by different projects. In order to answer the fundamental questions of biodiversity research, these data need to be integrated for joint analyses. However, to date, too often these data remain isolated in silos. Both in academia and industry, Knowledge Graphs (KGs) are widely regarded as a promising approach to overcome issues of data silos and lack of common understanding of data (Fensel and Şimşek 2020). KGs are graph-structured knowledge bases that store factual information in the form of structured relationships between entities, like "tree_species has_trait average_SLA" or "nutans is_observed_in SCH_Location" (Hogan et al. 2021). In our context, entities could be, e.g., abstract concepts like a kingdom, a species, or a trait, or a concrete specimen of a species. Example relationships could be "co-occurs" or "possesses-trait". KGs for biodiversity have been proposed by Page 2019 and have also been a topic at prior TDWG conferences (Page 2021). However, to date, uptake of this concept in the community has been rather slow (Sachs et al. 2019). We argue that this is at least partially due to the high effort and expertise required in developing and managing such KGs. Therefore, in our ongoing project, iKNOW (Babalou et al. 2021), we aim to provide a toolbox for reproducible KG creation. While iKNOW is still at an early stage, we aim to make this platform open-source and freely available to the biodiversity community. Thus, it can significantly contribute to making biodiversity data widely available, easily discoverable, and integratable. For now, we focus on tabular datasets resulting from biodiversity observation or sampling events or experiments. Given such a dataset, iKNOW will support its transformation into (subject, predicate, object) triples in the RDF (Resource Description Framework) standard. Every uploaded dataset will be considered a subgraph of the main KG in iKNOW. If required, data can be cleaned. After that, the entities and the relationships among them are extracted. For that, a user will be able to select one of the existing semi-automatic tools available on our platform (e.g., JenTab (Abdelmageed and Schindler 2020)). The entities in this step can be linked to respective global identifiers in Wikidata, the Global Biodiversity Information Facility (GBIF), or any other user-selected knowledge resource. In the next step, (subject, predicate, object) triples based on the information extracted in the previous steps are created. After these processes, the generated sub-KG can be used directly. However, one can take further steps such as: Triple Augmentation (generate new triples and extra relations to ease KG completion), Schema Refinement (refine the schema, e.g., via logical reasoning for KG completion and correctness), Quality Checking (check the quality of the generated sub-KG), and Query Building (create customized SPARQL queries for the generated KG). iKNOW will include a wide range of functionalities for creating, accessing, querying, visualizing, updating, reproducing, and tracking the provenance of KGs. The reproducibility of such a creation is essential to strengthening the establishment of open science practices in the biodiversity domain. Thus, all information regarding the user-selected tools with parameters and settings, along with the initial dataset and intermediate results, will be saved at every step of our platform.
With the help of this, users can redo the previous steps. Moreover, this enables us to track the provenance of the created KG. The iKNOW project is a joint effort by computer scientists and domain experts from the German Centre for Integrative Biodiversity Research (iDiv). As a showcase, we aim to create a KG of plant-related data sources at iDiv. These include, among others: TRY (the plant trait database) (Kattge and Díaz 2011), sPlot (the database about global patterns of taxonomic, functional, and phylogenetic diversity) (Bruelheide and Dengler 2019), PhenObs (the dataset of the global network of botanical gardens monitoring the impacts of climate change on the phenology of herbaceous plant species) (Nordt and Hensen 2021), LCVP, the Leipzig Catalogue of Vascular Plants (Freiberg and Winter 2020), and many others. The resulting KG will serve as a discovery tool for biodiversity data and provide a robust infrastructure for managing biodiversity knowledge. From the biodiversity research perspective, iKNOW will contribute to creating a dataset following the Linked Open Data principles by interlinking to cross-domain and specific-domain KGs. From the computer science perspective, iKNOW will contribute to developing tools for dynamic, low-effort creation of reproducible knowledge graphs.
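A query over a sub-KG built this way could use the example relationships from the abstract directly; the ik: namespace and exact property spellings are invented for illustration.

PREFIX ik: <http://example.org/iknow#>
# Species pairs that co-occur and share at least one trait.
SELECT ?a ?b ?trait WHERE {
  ?a ik:co_occurs ?b ;
     ik:possesses_trait ?trait .
  ?b ik:possesses_trait ?trait .
}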
