
Dissertations / Theses on the topic 'Sensor data semantic annotation'

Consult the top 31 dissertations / theses for your research on the topic 'Sensor data semantic annotation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Amir, Mohammad. "Semantically-enriched and semi-autonomous collaboration framework for the Web of Things. Design, implementation and evaluation of a multi-party collaboration framework with semantic annotation and representation of sensors in the Web of Things and a case study on disaster management." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14363.

Abstract:
This thesis proposes a collaboration framework for the Web of Things based on the concepts of Service-Oriented Architecture and integrated with semantic web technologies to offer new possibilities in terms of efficient asset management during operations requiring multi-actor collaboration. The motivation for the project comes from the rise in disasters where effective cross-organisation collaboration can increase the efficiency of critical information dissemination. The organisational boundaries of participants, as well as their IT capability and trust issues, hinder the deployment of a multi-party collaboration framework, thereby preventing timely dissemination of critical data. In order to tackle some of these issues, this thesis proposes a new collaboration framework consisting of a resource-based data model, a resource-oriented access control mechanism and semantic technologies utilising the Semantic Sensor Network Ontology that can be used simultaneously by multiple actors without impacting each other's networks, and thus increase the efficiency of disaster management and relief operations. The generic design of the framework enables future extensions, thus enabling its exploitation across many application domains. The performance of the framework is evaluated in two areas: the capability of the access control mechanism to scale with an increasing number of devices, and the capability of the semantic annotation process to increase in efficiency as more information is provided. The results demonstrate that the proposed framework is fit for purpose.
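To make the annotation idea above concrete, here is a minimal sketch (not the thesis's actual code) of how a sensor and one of its observations could be described against the Semantic Sensor Network Ontology and its SOSA companion vocabulary using Python's rdflib; the EX namespace and resource names are illustrative assumptions.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")   # companion vocabulary to SSN
EX = Namespace("http://example.org/disaster/")   # hypothetical deployment namespace

g = Graph()
g.bind("sosa", SOSA)

# Describe a field sensor and one of its observations.
g.add((EX.waterLevelSensor1, RDF.type, SOSA.Sensor))
g.add((EX.obs42, RDF.type, SOSA.Observation))
g.add((EX.obs42, SOSA.madeBySensor, EX.waterLevelSensor1))
g.add((EX.obs42, SOSA.observedProperty, EX.waterLevel))
g.add((EX.obs42, SOSA.hasSimpleResult, Literal(3.7, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```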
2

Furno, Domenico. "Hybrid approaches based on computational intelligence and semantic web for distributed situation and context awareness." Doctoral thesis, Università degli Studi di Salerno, 2013. http://hdl.handle.net/10556/927.

Abstract:
The research work focuses on Situation Awareness and Context Awareness topics. Specifically, Situation Awareness involves being aware of what is happening in the vicinity to understand how information, events, and one's own actions will impact goals and objectives, both immediately and in the near future. Thus, Situation Awareness is especially important in application domains where the information flow can be quite high and poor decision making may lead to serious consequences. On the other hand, Context Awareness is considered a process to support user applications to adapt interfaces, tailor the set of application-relevant data, increase the precision of information retrieval, discover services, make the user interaction implicit, or build smart environments. Despite being slightly different, Situation and Context Awareness involve common problems such as: the lack of support for the acquisition and aggregation of dynamic environmental information from the field (i.e. sensors, cameras, etc.); the lack of formal approaches to knowledge representation (i.e. contexts, concepts, relations, situations, etc.) and processing (reasoning, classification, retrieval, discovery, etc.); and the lack of automated and distributed systems, with considerable computing power, to support reasoning on the huge quantity of knowledge extracted from sensor data. The thesis therefore researches new approaches for distributed Context and Situation Awareness and proposes to apply them in order to achieve related research objectives such as knowledge representation, semantic reasoning, pattern recognition and information retrieval. The research work starts from the study and analysis of the state of the art in terms of techniques, technologies, tools and systems to support Context/Situation Awareness. The main aim is to develop a new contribution in this field by integrating techniques deriving from the fields of the Semantic Web, Soft Computing and Computational Intelligence. From an architectural point of view, several frameworks are defined according to the multi-agent paradigm. Furthermore, some preliminary experimental results have been obtained in application domains such as Airport Security, Traffic Management, Smart Grids and Healthcare. Finally, future work will go in the following directions: Semantic Modeling of Fuzzy Control, Temporal Issues, Automatic Ontology Elicitation, Extension to other Application Domains and More Experiments. [edited by author]
3

Khan, Imran. "Cloud-based cost-efficient application and service provisioning in virtualized wireless sensor networks." Thesis, Evry, Institut national des télécommunications, 2015. http://www.theses.fr/2015TELE0019/document.

Abstract:
Wireless Sensor Networks (WSNs) are becoming ubiquitous and are used in diverse application domains. Traditional deployments of WSNs are domain-specific, with applications usually embedded in the WSN, precluding the re-use of the infrastructure by other applications. This can lead to redundant deployments. Now, with the advent of IoT, this approach is less and less viable. A potential solution lies in the sharing of a single WSN by multiple applications and services, to allow resource- and cost-efficiency. In this dissertation, three architectural solutions are proposed for this purpose. The first solution consists of two parts: the first part is a novel multilayer WSN virtualization architecture that allows the provisioning of multiple applications and services over the same WSN deployment. The second part of this contribution is the extended architecture that allows the virtualized WSN infrastructure to interact with a WSN Platform-as-a-Service (PaaS) at a higher level of abstraction. Both these solutions are implemented and evaluated using two scenario-based proof-of-concept prototypes built with the Java SunSpot kit. The second architectural solution is a novel data annotation architecture for the provisioning of semantic applications in virtualized WSNs. It is capable of providing in-network, distributed, real-time annotation of raw sensor data and uses overlays as the cornerstone. This architecture is implemented and evaluated using Java SunSpot, AdvanticSys kits and Google App Engine. The third architectural solution is the enhancement of the data annotation architecture on two fronts. One is a heuristic-based genetic algorithm used for the selection of capable nodes for storing the base ontology. The second front is the extension of the proposed architecture to support ontology creation, distribution and management. The simulation results of the algorithm are presented, and the ontology management extension is implemented and evaluated using a proof-of-concept prototype built with the Java SunSpot kit. As another contribution, an extensive state-of-the-art review is presented that introduces the basics of WSN virtualization and motivates its pertinence with carefully selected scenarios. This contribution substantially improves current state-of-the-art reviews in terms of scope, motivation, details, and future research issues.
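As a rough illustration of the in-network annotation idea (a sketch under assumed names, not the dissertation's implementation), a virtual-sensor task on a node could wrap each raw reading with concepts from a distributed base ontology before the overlay forwards it:

```python
# A rough sketch of in-network annotation: a virtual-sensor task wraps each
# raw reading with terms from a (hypothetical) base ontology before the
# overlay forwards it to the subscribing application.
BASE_ONTOLOGY = {"tmp": "ex:Temperature", "hum": "ex:RelativeHumidity"}
UNITS = {"tmp": "ex:Celsius", "hum": "ex:Percent"}

def annotate(node_id, channel, value):
    """Return a self-describing record instead of a bare number."""
    return {
        "sensor": f"ex:node/{node_id}",
        "observedProperty": BASE_ONTOLOGY.get(channel, "ex:Unknown"),
        "value": value,
        "unit": UNITS.get(channel, "ex:Unknown"),
    }

print(annotate("spot-07", "tmp", 21.5))
```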
4

Cutrona, Vincenzo. "Semantic Table Annotation for Large-Scale Data Enrichment." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2021. http://hdl.handle.net/10281/317044.

Abstract:
Data are the new oil, and they represent one of the main value-creating assets. Data analytics has become a crucial component of scientific studies and business decisions in recent years and has brought researchers to define novel methodologies to represent, manage, and analyze data. Simultaneously, the growth of computing power enabled the analysis of huge amounts of data, allowing people to extract useful information from collected data. Predictive analytics plays a crucial role in many applications since it provides more knowledge to support business decisions. Among the statistical techniques available to support predictive analytics, machine learning is the technique capable of solving many different classes of problems, and the one that has benefited the most from the growth of computing power. In recent years, more complex and accurate machine learning models have been proposed, requiring an increasing amount of current and historical data to perform at their best. The demand for such a massive amount of data to train machine learning models represents an initial hurdle for data scientists, because the information needed is usually scattered across different data sets that have to be manually integrated. As a consequence, data enrichment has become a critical task in the data preparation process, and nowadays most data science projects involve a time-consuming data preparation process aimed at enriching a core data set with additional information from various external sources to improve the robustness of the resulting trained models. Easing the design of the enrichment process for data scientists is challenging, as is supporting the enrichment process at a large scale. Despite the growing importance of the enrichment task, it is still supported only to a limited extent by existing solutions, delegating most of the effort to the data scientist, who is in charge of both detecting the data sets that contain the needed information and integrating them. In this thesis, we introduce a methodology to support the data enrichment task, which focuses on harnessing semantics as the key factor by providing users with a semantics-aided tool to design the enrichment process, along with a platform to execute the process at a business scale. We illustrate how data enrichment can be addressed via tabular data transformations exploiting semantic table interpretation methods, discussing implementation techniques to support the enactment of the resulting process on large data sets. We experimentally demonstrate the scalability and run-time efficiency of the proposed solution by employing it in a real-world scenario. Finally, we introduce a new benchmark dataset to evaluate the performance and the scalability of existing semantic table annotation algorithms, and we propose an efficient novel approach to improve the performance of such algorithms.
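A toy sketch of the enrichment-via-table-annotation idea described above; the tiny in-memory knowledge base stands in for a real reconciliation service, and all names and figures are illustrative:

```python
# Link a column's cells to knowledge-base entities, then pull extra
# attributes through the links (annotation followed by enrichment).
KB = {  # hypothetical knowledge base: label -> (entity id, attributes)
    "milan": ("Q490", {"population": 1_352_000}),
    "turin": ("Q495", {"population": 848_000}),
}

def link_cell(cell):
    return KB.get(cell.strip().lower(), (None, {}))

table = [{"city": "Milan", "stores": 12}, {"city": "Turin", "stores": 7}]
for row in table:
    entity, attrs = link_cell(row["city"])
    row["city_id"] = entity          # semantic annotation
    row.update(attrs)                # enrichment

print(table)
```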
5

Anderson, Neil David Alan. "Data extraction & semantic annotation from web query result pages." Thesis, Queen's University Belfast, 2016. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.705642.

Abstract:
Our unquenchable thirst for knowledge is one of the few things that really defines our humanity. Yet the Information Age, which we have created, has left us floating aimlessly in a vast ocean of unintelligible data. Hidden Web databases are one massive source of structured data. The contents of these databases are, however, often only accessible through a query posed by a user. The data returned in these Query Result Pages is intended for human consumption and, as such, has nothing more than an implicit semantic structure which can be understood visually by a human reader, but not by a computer. This thesis presents an investigation into the processes of extraction and semantic understanding of data from Query Result Pages. The work is multi-faceted and includes, at the outset, the development of a vision-based data extraction tool. This work is followed by the development of a number of algorithms which make use of machine learning-based techniques, first to align the extracted data into semantically similar groups and then to assign a meaningful label to each group. Part of the work undertaken in fulfilment of this thesis has also addressed the lack of large, modern datasets containing a wide range of result pages representative of those typically found online today. In particular, a new innovative crowdsourced dataset is presented. Finally, the work concludes by examining techniques from the complementary research field of Information Extraction. An initial, critical assessment of how these mature techniques could be applied to this research area is provided.
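As a hedged illustration of the alignment step (grouping extracted values into semantically similar groups before labelling), a coarse type-signature heuristic might look like this; the signatures are illustrative assumptions, not the thesis's actual features:

```python
import re

# Group extracted strings by a coarse "type signature" (price, date,
# free text) as a first step before assigning a group label.
def signature(value):
    if re.fullmatch(r"[£$€]\s?\d+(\.\d{2})?", value):
        return "price"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "text"

extracted = ["£12.99", "2016-03-01", "Acme Widget", "$5.00"]
groups = {}
for v in extracted:
    groups.setdefault(signature(v), []).append(v)
print(groups)
```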
6

Patni, Harshal Kamlesh. "Real Time Semantic Analysis of Streaming Sensor Data." Wright State University / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=wright1324181415.

7

Wong, Ping-wai, and 黃炳蔚. "Semantic annotation of Chinese texts with message structures based on HowNet." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B38212389.

8

Alirezaie, Marjan. "Bridging the Semantic Gap between Sensor Data and Ontological Knowledge." Doctoral thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-45908.

Abstract:
The rapid growth of sensor data can potentially enable a better awareness of the environment for humans. In this regard, interpretation of data needs to be human-understandable. For this, data interpretation may include semantic annotations that hold the meaning of numeric data. This thesis is about bridging the gap between quantitative data and qualitative knowledge to enrich the interpretation of data. There are a number of challenges which make the automation of the interpretation process non-trivial. Challenges include the complexity of sensor data, the amount of available structured knowledge and the inherent uncertainty in data. Under the premise that high-level knowledge is contained in ontologies, this thesis investigates the use of current techniques in ontological knowledge representation and reasoning to confront these challenges. Our research is divided into three phases, where the focus of the first phase is on the interpretation of data for domains which are semantically poor in terms of available structured knowledge. During the second phase, we studied publicly available ontological knowledge for the task of annotating multivariate data. Our contribution in this phase is about applying a diagnostic reasoning algorithm to available ontologies. Our studies during the last phase have been focused on the design and development of a domain-independent ontological representation model equipped with a non-monotonic reasoning approach for the purpose of annotating time-series data. Our last contribution is related to coupling the OWL-DL ontology with a non-monotonic reasoner. The experimental platforms used for validation consist of a network of sensors which includes gas sensors whose generated data is complex. A secondary data set includes time-series medical signals representing physiological data, as well as a number of publicly available ontologies such as those in the NCBO BioPortal repository.
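One way to picture the quantitative-to-qualitative bridge described here is a symbolic abstraction step that turns numeric samples into ontology terms a reasoner can consume; the thresholds and term names below are illustrative assumptions, not the thesis's model:

```python
# Abstract a raw time series into qualitative symbols that an ontology
# (and a reasoner) can work with.
def abstract_series(samples, low=10.0, high=50.0):
    for t, value in samples:
        if value < low:
            yield (t, "ex:LowResponse")
        elif value > high:
            yield (t, "ex:HighResponse")
        else:
            yield (t, "ex:NominalResponse")

gas_sensor = [(0, 4.2), (1, 23.0), (2, 71.5)]
print(list(abstract_series(gas_sensor)))
```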
9

Hatem, Muna Salman. "A framework for semantic web implementation based on context-oriented controlled automatic annotation." Thesis, University of Bradford, 2009. http://hdl.handle.net/10454/3207.

Abstract:
The Semantic Web is the vision of the future Web. Its aim is to enable machines to process Web documents in a way that makes it possible for computer software to "understand" the meaning of the document contents. Each document on the Semantic Web is to be enriched with meta-data that express the semantics of its contents. Many infrastructures, technologies and standards have been developed and have proven their theoretical use for the Semantic Web, yet very few applications have been created. Most current Semantic Web applications were developed for research purposes. This project investigates the major factors restricting the widespread adoption of Semantic Web applications. We identify the two most important requirements for a successful implementation as the automatic production of semantically annotated documents, and the creation and maintenance of a semantics-based knowledge base. This research proposes a framework for Semantic Web implementation based on context-oriented controlled automatic annotation; for short, we call the framework the Semantic Web Implementation Framework (SWIF) and the system that implements this framework the Semantic Web Implementation System (SWIS). The proposed architecture provides for a Semantic Web implementation of stand-alone websites that automatically annotates Web pages before they are uploaded to the Intranet or Internet, and maintains persistent storage of Resource Description Framework (RDF) data for both the domain memory, denoted by Control Knowledge, and the meta-data of the Web site's pages. We believe that the presented implementation of the major parts of SWIS introduces a system competitive with current state-of-the-art annotation tools and knowledge management systems; this is because it handles input documents in the context in which they are created, in addition to the automatic learning and verification of knowledge using only the available computerized corporate databases. In this work, we introduce the concept of Control Knowledge (CK) that represents the application's domain memory and use it to verify the extracted knowledge. Learning is based on the number of occurrences of the same piece of information in different documents. We introduce the concept of Verifiability in the context of annotation by comparing the extracted text's meaning with the information in the CK and the use of the proposed database table Verifiability_Tab. We use the linguistic concept of Thematic Role in investigating and identifying the correct meaning of words in text documents; this helps correct relation extraction. The verb lexicon used contains the argument structure of each verb together with the thematic structure of the arguments. We also introduce a new method to chunk conjoined statements and identify the missing subject of the produced clauses. We use the semantic class of verbs that relates a list of verbs to a single property in the ontology, which helps in disambiguating the verb in the input text to enable better information extraction and annotation. Consequently we propose the following definition for the annotated document, or what is sometimes called the 'Intelligent Document': 'The Intelligent Document is the document that clearly expresses its syntax and semantics for human use and software automation'. This work introduces a promising improvement to the quality of the automatically generated annotated document and the quality of the automatically extracted information in the knowledge base.
Our approach in the area of using Semantic Web technology opens new opportunities for diverse areas of application. E-Learning applications can be greatly improved and become more effective.
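The occurrence-based learning rule described above can be sketched in a few lines; the promotion threshold and the fact representation are assumptions for illustration only:

```python
from collections import Counter

# A candidate fact is only promoted into the Control Knowledge after it
# has been seen in enough independent documents.
PROMOTION_THRESHOLD = 3
candidate_facts = Counter()

def observe(fact, control_knowledge):
    candidate_facts[fact] += 1
    if candidate_facts[fact] >= PROMOTION_THRESHOLD:
        control_knowledge.add(fact)   # fact is now considered verified

ck = set()
for doc in range(4):
    observe(("Acme", "suppliesTo", "Globex"), ck)
print(ck)
```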
10

Lindberg, Hampus. "Semantic Segmentation of Iron Ore Pellets in the Cloud." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-86896.

Abstract:
This master's thesis evaluates data annotation, semantic segmentation and Docker for use in AWS. The data provided has to be annotated and is to be used as a dataset for the creation of a neural network. Different neural network models are then compared based on performance. AWS has the option to use Docker containers, and thus that option is examined; lastly, the different tools available in AWS SageMaker are analyzed for bringing a neural network to the cloud. Images were annotated in Ilastik, giving a dataset of 276 images, and a neural network was then created in PyTorch using the library Segmentation Models PyTorch, which gave the option of trying different models. This neural network was created in a notebook in Google Colab for a quick setup and easy testing. The dataset was then uploaded to AWS S3, and the notebook was moved from Colab to an AWS instance where the dataset could be loaded from S3. A Docker container was created and packaged with the necessary packages and libraries, as well as the training and inference code, and then pushed to the ECR (Elastic Container Registry). This container could then be used to perform training jobs in SageMaker, which resulted in a trained model stored in S3; the hyperparameter tuning tool was also examined to get a better-performing model. The two different deployment methods in SageMaker were then investigated to understand the entire machine learning solution. The images annotated in Ilastik were deemed sufficient, as the neural network results were satisfactory. The neural network created was able to use all of the models accessible from Segmentation Models PyTorch, which enabled a lot of options. By using a Docker container, all of the tools available in SageMaker could be used with the created neural network packaged in the container and pushed to the ECR. Training jobs were run in SageMaker by using the container to get a trained model which could be saved to AWS S3. Hyperparameter tuning achieved better results than the manually tested parameters, which resulted in the best neural network produced. The model that was deemed the best was Unet++ in combination with the DPN98 encoder. The two different deployment methods in SageMaker were explored and are believed to be beneficial in different ways, and thus have to be reconsidered for each project. By analysis, the cloud solution was deemed the better alternative compared to an in-house solution in all three aspects measured: price, performance and scalability.
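For reference, the best-performing configuration reported here (Unet++ with a DPN98 encoder) can be instantiated with Segmentation Models PyTorch roughly as follows; the class count and input size are placeholder assumptions, not the thesis's exact settings:

```python
import torch
import segmentation_models_pytorch as smp

# Unet++ decoder over a DPN98 encoder pretrained on ImageNet.
model = smp.UnetPlusPlus(
    encoder_name="dpn98",
    encoder_weights="imagenet",
    in_channels=3,
    classes=2,           # e.g. pellet vs. background
)
images = torch.randn(4, 3, 256, 256)     # dummy batch
masks = model(images)                    # (4, 2, 256, 256) logits
print(masks.shape)
```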
11

Pschorr, Joshua Kenneth. "SemSOS: An Architecture for Query, Insertion, and Discovery for Semantic Sensor Networks." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1368741809.

12

Nachabe, Ismail Lina. "Automatic sensor discovery and management to implement effective mechanism for data fusion and data aggregation." Thesis, Evry, Institut national des télécommunications, 2015. http://www.theses.fr/2015TELE0021/document.

Abstract:
The constant evolution of technology in terms of inexpensive and embedded wireless interfaces and powerful chipsets has led to the massive usage and development of wireless sensor networks (WSNs). This potentially affects all aspects of our lives, ranging from home automation (e.g. Smart Buildings), through e-Health applications, environmental observation and broadcasting, food sustainability, energy management and Smart Grids, and military services, to many other applications. WSNs are formed of an increasing number of sensor/actuator/relay/sink devices, generally self-organized in clusters and domain-dedicated, that are provided by an increasing number of manufacturers, which leads to interoperability problems (e.g., heterogeneous interfaces and/or grounding, heterogeneous descriptions, profiles, models …). Moreover, these networks are generally implemented as vertical solutions not able to interoperate with each other. The data provided by these WSNs are also very heterogeneous because they come from sensing nodes with various abilities (e.g., different sensing ranges, formats, coding schemes …). To tackle these heterogeneity and interoperability problems, these WSNs' nodes, as well as the data sensed and/or transmitted, need to be consistently and formally represented and managed through suitable abstraction techniques and generic information models. Therefore, explicit semantics should be assigned to every term, and an open data model dedicated to WSNs should be introduced. SensorML, proposed by OGC in 2010, has been considered an essential step toward data modeling specification in WSNs. Nevertheless, it is based on XML schemas only permitting a basic hierarchical description of the data, hence neglecting any semantic representation. Furthermore, most of the research that has used semantic techniques for developing data models has focused merely on modeling sensors and actuators (this is e.g. the case of SSN-XG). Other works dealt with data provided by WSNs, but without modelling the data type, quality and states (like e.g. OntoSensor). That is why the main aim of this thesis is to specify and formalize an open data model for WSNs in order to mask the aforementioned heterogeneity and interoperability issues between different systems and applications. This model will also facilitate data fusion and aggregation through an open management environment such as, for example, a service-oriented architecture. This thesis can thus be split into two main objectives: 1) To formalize a semantic open data model for generically describing a WSN, sensors/actuators and their corresponding data. This model should be light enough to respect the low-power and thus low-energy limitations of such networks, generic enough to enable the description of the wide variety of WSNs, and extensible in a way that it can be modified and adapted based on the application. 2) To propose an upper service model and standardized enablers for enhancing sensor/actuator discovery, data fusion, data aggregation and WSN control and management. These service-layer enablers will be used for improving data collection in a large-scale network and will facilitate the implementation of more efficient routing protocols, as well as decision-making mechanisms in WSNs.
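As a minimal sketch of what a "light, generic, extensible" sensor description could look like in practice (field names and the context URL are illustrative assumptions, not the thesis's model):

```python
from dataclasses import dataclass, asdict
import json

# A small JSON-LD-style record a constrained node can emit so that its
# data is self-describing rather than a bare numeric stream.
@dataclass
class SensorDescription:
    id: str
    kind: str
    observes: str
    unit: str
    accuracy: float

desc = SensorDescription("ex:node12/temp", "ex:TemperatureSensor",
                         "ex:AmbientTemperature", "ex:Celsius", 0.5)
record = {"@context": "http://example.org/wsn-context.jsonld", **asdict(desc)}
print(json.dumps(record, indent=2))
```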
13

Calegari, Newton Juniano. "Proposta de uma ferramenta de anotação semântica para publicação de dados estruturados na Web." Pontifícia Universidade Católica de São Paulo, 2016. https://tede2.pucsp.br/handle/handle/18992.

Abstract:
The tool proposed in this research aims at bringing together Semantic Web technologies and content publishers, enabling the latter to contribute to creating structured data and metadata about texts and information they make available on the Web. The general goal is to investigate the technical feasibility of developing a semantic annotation tool that enables content publishers to contribute to the Semantic Web ecosystem. Based on (BERNERS-LEE et al., 2001; ALESSO; SMITH, 2006; RODRÍGUEZ-ROCHA et al., 2015; GUIZZARDI, 2005; ISOTANI; BITTENCOURT, 2015), the Semantic Web is presented according to its technological stack. Considering the importance of the ontologies and vocabularies used to create Semantic Web applications, the essential subjects of conceptual modelling and the ontology language used on the Web are presented. In order to provide the necessary concepts for using semantic annotations, this dissertation presents both the ways annotations are used (manual, semi-automatic, and automatic) and the ways these annotations are integrated with resources available on the Web. The state-of-the-art chapter describes recent projects and related work on the use of the Semantic Web within the Web-content publishing context. The methodology adopted by this research is based on (SANTAELLA; VIEIRA, 2008; GIL, 2002), following an exploratory research approach. This research presents the proposal and the architecture of the semantic annotation tool, which uses shared vocabularies in order to create structured data based on textual content. In conclusion, this dissertation addresses the possibilities for future work, both in terms of the implementation of the tool in a real use case and in terms of new scientific research.
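A hedged example of the kind of structured data such a publisher-facing tool can emit for a text, using schema.org as one widely shared vocabulary; all values are placeholders:

```python
import json

# JSON-LD structured data describing an article and its topics; a page
# would embed it as <script type="application/ld+json">...</script>.
annotation = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example article title",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "about": [{"@type": "Thing", "name": "Semantic Web"}],
}
print(json.dumps(annotation, indent=2))
```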
14

Rula, Anisa. "Time-related quality dimensions in linked data." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2014. http://hdl.handle.net/10281/81717.

Abstract:
Over the last few years, there has been an increasing diffusion of Linked Data as a standard way to publish interlinked structured data on the Web, which allows users and public and private organizations to fully exploit a large amount of data from several domains that were not available in the past. Although gathering and publishing such a massive amount of structured data is certainly a step in the right direction, quality still poses a significant obstacle to the uptake of data consumption applications at large scale. A crucial aspect of quality regards the dynamic nature of Linked Data, where information can change rapidly and fail to reflect changes in the real world, thus becoming outdated. Quality is characterised by different dimensions that capture several aspects of quality such as accuracy, currency, consistency or completeness. In particular, the aspects of Linked Data dynamicity are captured by Time-Related Quality Dimensions such as data currency. The assessment of Time-Related Quality Dimensions, which is the task of measuring the quality, is based on temporal information whose collection poses several challenges regarding its availability, representation and diversity in Linked Data. The assessment of Time-Related Quality Dimensions supports data consumers in deciding whether information is valid or not. The main goal of this thesis is to develop techniques for assessing Time-Related Quality Dimensions in Linked Data, which must overcome several challenges posed by Linked Data such as third-party applications, variety of data, high volume of data and velocity of data. The major contributions of this thesis can be summarized as follows: it presents a general setting of definitions for quality dimensions and measures adopted in Linked Data; it provides a large-scale analysis of approaches for representing temporal information in Linked Data; it provides a sharable and interoperable conceptual model which integrates vocabularies used to represent temporal information required for the assessment of Time-Related Quality Dimensions; it proposes two domain-independent techniques to assess data currency that work with incomplete or inaccurate temporal information; and finally it provides an approach that enriches information with time intervals representing their temporal validity.
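One common formulation of data currency, sketched below for illustration (not necessarily the exact measure defined in the thesis), lets the score decay linearly with the age of a statement relative to an assumed volatility of its property:

```python
from datetime import datetime, timedelta
from typing import Optional

def currency(last_modified: datetime, volatility: timedelta,
             now: Optional[datetime] = None) -> float:
    """Currency score in [0, 1]: 1 for fresh data, 0 once older than
    the assumed volatility window of the property."""
    now = now or datetime.utcnow()
    age = now - last_modified
    return max(0.0, 1.0 - age / volatility)

print(currency(datetime(2014, 1, 1), timedelta(days=365),
               now=datetime(2014, 7, 1)))   # ~0.5 after half the window
```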
15

Persson, Martin. "Semantic Mapping using Virtual Sensors and Fusion of Aerial Images with Sensor Data from a Ground Vehicle." Doctoral thesis, Örebro : Örebro University, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-2186.

16

Bai, Xi. "Peer-to-peer, multi-agent interaction adapted to a web architecture." Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/7968.

Abstract:
The Internet and Web have brought in a new era of information sharing and opened up countless opportunities for people to rethink and redefine communication. With the development of network-related technologies, a Client/Server architecture has become dominant in the application layer of the Internet. Nowadays network nodes sit behind firewalls and Network Address Translation, and the centralised design of the Client/Server architecture limits communication between users on the client side. Achieving the conflicting goals of data privacy and data openness is difficult, and in many cases the difficulty is compounded by the differing solutions adopted by different organisations and companies. Building a more decentralised or distributed environment for people to freely share their knowledge has become a pressing challenge, and we need to understand how to adapt the pervasive Client/Server architecture to this more fluid environment. This thesis describes a novel framework by which network nodes or humans can interact and share knowledge with each other through formal service-choreography specifications in a decentralised manner. The platform allows peers to publish, discover and (un)subscribe to those specifications in the form of Interaction Models (IMs). Peer groups can be dynamically formed and disbanded based on the interaction logs of peers. IMs are published in HTML documents as normal Web pages indexable by search engines and associated with lightweight annotations which semantically enhance the embedded IM elements and at the same time make IM publications comply with the Linked Data principles. The execution of IMs is decentralised on each peer via conventional Web browsers, potentially giving the system access to a very large user community. In this thesis, after developing a proof-of-concept implementation, we carry out case studies of the resulting functionality and evaluate the implementation across several metrics. An increasing number of service providers have begun to look for customers proactively, and we believe that in the near future we will not search for services but rather services will find us through our peer communities. Our approaches show how a peer-to-peer architecture for this purpose can be obtained on top of a conventional Client/Server Web infrastructure.
17

Khan, Arshad Ali. "Exploiting Linked Open Data (LoD) and Crowdsourcing-based semantic annotation & tagging in web repositories to improve and sustain relevance in search results." Thesis, University of Southampton, 2018. https://eprints.soton.ac.uk/428046/.

Abstract:
Online searching of multi-disciplinary web repositories is a topic of increasing importance as the number of repositories increases and the diversity of skills and backgrounds of their users widens. Earlier term-frequency-based approaches have been improved by ontology-based semantic annotation, but such approaches are predominantly driven by "domain ontologies engineering first" and lack dynamicity, whereas the information is dynamic: the meaning of things changes with time, and new concepts are constantly being introduced. Further, no sustainable framework or method has been discovered so far which could automatically enrich the content of heterogeneous online resources for information retrieval over time. Furthermore, the methods and techniques being applied are fast becoming inadequate due to increasing data volume, concept obsolescence, and the complexity and heterogeneity of content types in web repositories. In the face of such complexities, term matching alone between a query and the indexed documents will no longer fulfil complex user needs. The ever-growing gap between syntax and semantics needs to be continually bridged in order to address the above issues and ensure accurate retrieval of search results against natural language queries, despite such challenges. This thesis investigates whether, by domain-specific expert crowd-annotation of content on top of automatic semantic annotation (using Linked Open Data sources), the contemporary value of content in scientific repositories can be continually enriched and sustained. A purpose-built annotation, indexing and searching environment has been developed and deployed to a web repository which hosts more than 3,400 heterogeneous web documents. Based on expert crowd annotations, automatic LoD-based named entity extraction and search results evaluations, this research finds that search results retrieval having the crowd-sourced element performs better than retrieval having no crowd-sourced element. This thesis also shows that a consensus can be reached between the expert and non-expert crowd-sourced annotators on annotating and tagging the content of web repositories, using a controlled vocabulary (typology) and free-text terms and keywords.
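As an illustration of LoD-based named entity extraction, the public DBpedia Spotlight service can annotate free text; the endpoint URL, parameters and response shape below are assumptions about that public service, not the thesis's environment:

```python
import requests

# Ask DBpedia Spotlight to link entity mentions in a text to DBpedia URIs.
resp = requests.get(
    "https://api.dbpedia-spotlight.org/en/annotate",
    params={"text": "Sensor networks at the University of Southampton",
            "confidence": 0.5},
    headers={"Accept": "application/json"},
    timeout=10,
)
for res in resp.json().get("Resources", []):
    print(res["@surfaceForm"], "->", res["@URI"])
```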
18

Ayllón-Benítez, Aarón. "Development of new computational methods for a synthetic gene set annotation." Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0305.

Abstract:
The revolution in new sequencing technologies, by strongly improving the production of omics data, is greatly leading to new understandings of the relations between genotype and phenotype. To interpret and analyze data grouped according to a phenotype of interest, methods based on statistical enrichment became a standard in biology. However, these methods synthesize the biological information by a priori selecting the over-represented terms and focus on the most studied genes that may represent a limited coverage of annotated genes within a gene set. During this thesis, we explored different methods for annotating gene sets. In this frame, we developed three studies allowing the annotation of gene sets and thus improving the understanding of their biological context.First, visualization approaches were applied to represent annotation results provided by enrichment analysis for a gene set or a repertoire of gene sets. In this work, a visualization prototype called MOTVIS (MOdular Term VISualization) has been developed to provide an interactive representation of a repertoire of gene sets combining two visual metaphors: a treemap view that provides an overview and also displays detailed information about gene sets, and an indented tree view that can be used to focus on the annotation terms of interest. MOTVIS has the advantage to solve the limitations of each visual metaphor when used individually. This illustrates the interest of using different visual metaphors to facilitate the comprehension of biological results by representing complex data.Secondly, to address the issues of enrichment analysis, a new method for analyzing the impact of using different semantic similarity measures on gene set annotation was proposed. To evaluate the impact of each measure, two relevant criteria were considered for characterizing a "good" synthetic gene set annotation: (i) the number of annotation terms has to be drastically reduced while maintaining a sufficient level of details, and (ii) the number of genes described by the selected terms should be as large as possible. Thus, nine semantic similarity measures were analyzed to identify the best possible compromise between both criteria while maintaining a sufficient level of details. Using GO to annotate the gene sets, we observed better results with node-based measures that use the terms’ characteristics than with edge-based measures that use the relations terms. The annotation of the gene sets achieved with the node-based measures did not exhibit major differences regardless of the characteristics of the terms used. Then, we developed GSAn (Gene Set Annotation), a novel gene set annotation web server that uses semantic similarity measures to synthesize a priori GO annotation terms. GSAn contains the interactive visualization MOTVIS, dedicated to visualize the representative terms of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing the gene coverage while minimizing the number of terms.At last, the third work consisted in enriching the annotation results provided by GSAn. Since the knowledge described in GO may not be sufficient for interpreting gene sets, other biological information, such as pathways and diseases, may be useful to provide a wider biological context. Thus, two additional knowledge resources, being Reactome and Disease Ontology (DO), were integrated within GSAn. In practice, GO terms were mapped to terms of Reactome and DO, before and after applying the GSAn method. 
The integration of these resources improved the results in terms of gene coverage without significantly affecting the number of terms involved. Two strategies were applied to find mappings (generated or extracted from the web) between each new resource and GO. We have shown that running the mapping process before computing the GSAn method yields a larger number of inter-relations between the two knowledge resources.
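The distinction the abstract draws between node-based and edge-based semantic similarity measures can be made concrete with a toy example. The sketch below uses an invented five-term GO-like DAG and invented annotation frequencies; it computes a Resnik-style node-based similarity (information content of the most informative common ancestor) and a simple path-length edge-based similarity. It only illustrates the two families of measures and is not GSAn's implementation.

```python
import math

# Toy GO-like DAG: child -> set of parents (invented for illustration)
PARENTS = {
    "root": set(),
    "metabolic_process": {"root"},
    "catabolic_process": {"metabolic_process"},
    "lipid_catabolism": {"catabolic_process"},
    "lipid_metabolism": {"metabolic_process"},
}

# Hypothetical annotation frequencies used to derive information content (IC)
FREQ = {"root": 100, "metabolic_process": 60, "catabolic_process": 20,
        "lipid_catabolism": 5, "lipid_metabolism": 15}

def ancestors(term):
    """All ancestors of a term, including itself."""
    seen, stack = {term}, [term]
    while stack:
        for p in PARENTS[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def ic(term):
    # Node-based measures rely on term characteristics such as IC.
    return -math.log(FREQ[term] / FREQ["root"])

def resnik(t1, t2):
    """Node-based similarity: IC of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))

def depth(term):
    return 0 if term == "root" else 1 + min(depth(p) for p in PARENTS[term])

def edge_based(t1, t2):
    """Edge-based similarity: inverse of the path length through the
    deepest common ancestor (a simple Rada-style distance)."""
    a = max(ancestors(t1) & ancestors(t2), key=depth)
    dist = (depth(t1) - depth(a)) + (depth(t2) - depth(a))
    return 1.0 / (1.0 + dist)

print(resnik("lipid_catabolism", "lipid_metabolism"))      # node-based
print(edge_based("lipid_catabolism", "lipid_metabolism"))  # edge-based
```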
APA, Harvard, Vancouver, ISO, and other styles
19

Kozák, David. "Indexace rozsáhlých textových dat a vyhledávání v zaindexovaných datech." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2020. http://www.nusl.cz/ntk/nusl-417263.

Full text
Abstract:
The topic of this thesis is semantic search over large textual data. The goal is to design and implement a search engine that efficiently queries semantically enriched documents and presents the results in a user-friendly way. The thesis first analyzes current semantic search engines, together with their strengths and weaknesses. A design for a new search engine with its own query language is then presented. The system consists of components for indexing and querying documents, a management server, a compiler for the query language, and two client applications, one for the web and one for the console. The search engine was successfully designed, implemented and deployed, and is publicly available on the Internet. The results of this work make semantic search available to the general public.
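As a rough illustration of what such a search engine does at query time, the sketch below indexes documents both by their plain tokens and by the types of their semantic annotations, then answers a query that mixes a keyword with a semantic constraint. Everything here (documents, entity tags, the query form) is invented for illustration and is not the thesis's actual query language or index structure.

```python
from collections import defaultdict

# Documents enriched with semantic annotations (entity tags are hypothetical)
docs = {
    1: {"text": "prague is the capital of the czech republic",
        "entities": {"prague": "City", "czech republic": "Country"}},
    2: {"text": "the vltava flows through prague",
        "entities": {"vltava": "River", "prague": "City"}},
}

# Build one inverted index over plain tokens and one over entity types
token_index, entity_index = defaultdict(set), defaultdict(set)
for doc_id, doc in docs.items():
    for tok in doc["text"].split():
        token_index[tok].add(doc_id)
    for etype in doc["entities"].values():
        entity_index[etype].add(doc_id)

def search(token, entity_type):
    """Documents containing the token AND an annotation of the given type."""
    return sorted(token_index[token] & entity_index[entity_type])

print(search("prague", "River"))  # -> [2]
```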
APA, Harvard, Vancouver, ISO, and other styles
20

La Rosa, Giovanni. "Prototipazione di un Modello di Trust in una rete di sensori." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
The recent explosion of technologies such as Pervasive Computing and the IoT has meant that, in recent years, on the one hand devices capable of interacting with the surrounding environment have entered our lives, equipped with sensors that continuously collect many types of data, and on the other hand many objects, including everyday ones, have been equipped with communication technologies and have become able to communicate over the Internet. Besides bringing multiple benefits in numerous application fields, this has generated strong commercial interest, and numerous manufacturers all over the world have entered this market. Since there is no communication standard shared by all of them, the technological landscape has become fragmented. The goal of this study is to illustrate how the integration of a trust model can provide a valid solution to the problem of heterogeneity and to the low quality of the collected data caused by the low cost of the devices involved. At the same time, it brings numerous benefits to the efficiency of the system in terms of resource optimization and the implementation of error-identification strategies.
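One common way to realize such a trust model, sketched below with invented weights and thresholds rather than as the thesis's actual model, is a reputation-style update: a sensor's trust score rises when its readings agree with a robust consensus of the other sensors and decays when they do not, so that low-quality or faulty devices can be identified and down-weighted.

```python
# Hypothetical reputation-style trust update for a sensor network:
# trust rises when a reading agrees with the robust consensus of the
# other sensors and decays otherwise (weights/thresholds are invented).
from statistics import median

def update_trust(trust, readings, alpha=0.1, tolerance=2.0):
    consensus = median(readings.values())
    new_trust = {}
    for sensor, value in readings.items():
        agrees = abs(value - consensus) <= tolerance
        target = 1.0 if agrees else 0.0
        # Exponential moving average keeps the score in [0, 1]
        new_trust[sensor] = (1 - alpha) * trust[sensor] + alpha * target
    return new_trust

trust = {"s1": 0.5, "s2": 0.5, "s3": 0.5}
readings = {"s1": 21.2, "s2": 21.5, "s3": 35.0}  # s3 looks faulty
trust = update_trust(trust, readings)
print(trust)  # s3's trust decays; low-trust readings can then be discarded
```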
APA, Harvard, Vancouver, ISO, and other styles
21

Orlando, João Paulo. "Usando aplicações ricas para internet na criação de um ambiente para visualização e edição de regras SWRL." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-25072012-101810/.

Full text
Abstract:
The Semantic Web is a way to associate explicit meaning with the content of web documents so that it can be processed directly by machines. To allow this processing, computers need access to structured collections of information and to sets of rules for reasoning about that content. The Semantic Web Rule Language (SWRL) allows the combination of rules and ontology terms, defined using the Web Ontology Language (OWL), to increase the expressiveness of both. However, as rule sets grow, they become difficult to understand and error prone, especially when used and maintained by more than one person. If SWRL is to become a true web standard, it has to be able to handle big rule sets. To find answers to this problem, we first surveyed business rule systems and identified the key features and interfaces they use, and then, based on our findings, we proposed techniques and tools that use new visual representations to edit rules in a web application. They allow error detection, rule similarity analysis, rule clustering visualization and atom reuse between rules. These tools are implemented in the SWRL Editor, an open-source plug-in for Web-Protégé (a web-based ontology editor) that leverages Web-Protégé's collaborative tools to allow groups of users not only to view and edit rules but also to comment on and discuss them. We performed two evaluations of the SWRL Editor. The first was a case study of two ontologies from the biomedical domain, an area where SWRL rules are heavily used; the second was a comparison with the only three SWRL editors found in the literature. This comparison showed that the SWRL Editor implements more of the key features found in general rule systems than the other editors.
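One of the features mentioned above, rule similarity analysis, can be illustrated by treating each SWRL rule as a set of atoms and comparing the sets. The sketch below uses Jaccard overlap over informally written atoms; the rules are invented, and the editor's actual similarity measure may differ.

```python
# Rules represented as sets of atom strings (informal SWRL-like syntax).
# Jaccard overlap is one plausible similarity measure for grouping rules;
# the measure actually used by the SWRL Editor may differ.
rules = {
    "r1": {"Person(?p)", "hasAge(?p, ?a)", "swrlb:greaterThan(?a, 17)"},
    "r2": {"Person(?p)", "hasAge(?p, ?a)", "swrlb:lessThan(?a, 18)"},
    "r3": {"Car(?c)", "hasOwner(?c, ?p)"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

pairs = [(r1, r2, jaccard(rules[r1], rules[r2]))
         for r1 in rules for r2 in rules if r1 < r2]
for r1, r2, sim in sorted(pairs, key=lambda t: -t[2]):
    print(f"{r1} ~ {r2}: {sim:.2f}")   # r1 ~ r2 share 2 of 4 atoms -> 0.50
```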
APA, Harvard, Vancouver, ISO, and other styles
22

Usbeck, Ricardo. "Knowledge Extraction for Hybrid Question Answering." Doctoral thesis, Universitätsbibliothek Leipzig, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-225097.

Full text
Abstract:
Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989 the World Wide Web has grown to more than one billion Web pages and still grows. With the later proposed Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow a holistic and unified access to information about a particular topic independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note that unstructured data stands for any type of textual information like news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge bases. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans adapted their search behavior to current Web data by access paradigms such as keyword search so as to retrieve high-quality results. Hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in their natural language rather than using keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing users to query data via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention to transform it into a machine-readable form. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow as well as encourage the access to and the combination of knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work: First, addressing the Semantic Gap between the unstructured Document Web and the structured Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis.
This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge-base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge base to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as being intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. Data preparation being such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure a maximal interoperability to overcome the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing demand for natural-language interfaces, as depicted by current mobile applications, requires systems to deeply understand the underlying user information need. In conclusion, the natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search on full-texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources.
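To make the disambiguation task mentioned above tangible, the toy sketch below maps surface forms to candidate DBpedia URIs and picks one with a naive popularity prior. The surface-form index and the scores are invented; AGDISTIS itself ranks candidates with a graph-based algorithm over the knowledge base rather than a fixed prior.

```python
# Toy named-entity disambiguation: map surface forms to candidate URIs and
# pick one with a naive popularity prior (index and scores are invented).
SURFACE_FORMS = {
    "paris": [("http://dbpedia.org/resource/Paris", 0.92),
              ("http://dbpedia.org/resource/Paris,_Texas", 0.05)],
    "berlin": [("http://dbpedia.org/resource/Berlin", 0.95)],
}

def disambiguate(mentions):
    linked = {}
    for m in mentions:
        candidates = SURFACE_FORMS.get(m.lower(), [])
        if candidates:
            linked[m] = max(candidates, key=lambda c: c[1])[0]
    return linked

print(disambiguate(["Paris", "Berlin"]))
```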
APA, Harvard, Vancouver, ISO, and other styles
23

Alili, Hiba. "Intégration de données basée sur la qualité pour l'enrichissement des sources de données locales dans le Service Lake." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED019.

Full text
Abstract:
In the Big Data era, companies are moving away from traditional data-warehouse solutions, whereby expensive and time-consuming ETL (Extract, Transform, Load) processes are used, towards data lakes in order to manage their increasingly growing data. Yet the knowledge stored in companies' databases, even in the constructed data lakes, can never be complete and up-to-date, because of the continuous production of data. Local data sources therefore often need to be augmented and enriched with information coming from external sources, which are increasingly exposed through web data services, including the DaaS (Data-as-a-Service) offerings of Cloud Computing. Unfortunately, data enrichment is largely manual labor undertaken by experts, who add information based on their expertise or select relevant data sources to complete missing information; it involves tedious tasks such as identifying relevant services, extracting and integrating heterogeneous data, and defining service-to-source mappings. Such work is expensive and time-consuming, making it very promising for automation. We present in this work an active user-centric data integration approach to automatically enrich local data sources, in which the missing information is leveraged on the fly from web sources using data services. Accordingly, our approach enables users to query for information about concepts that are not defined in the data source schema. In doing so, we take into consideration a set of user preferences, such as the cost threshold and the response time necessary to compute the desired answers, while ensuring a good quality of the obtained results.
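The preference-aware selection of data services described above can be pictured with a small sketch: candidate services carrying (invented) cost, latency and quality attributes are filtered by the user's thresholds and ranked by quality. This illustrates the idea only, not the thesis's actual selection algorithm.

```python
# Hypothetical catalog of candidate data services; the attributes mirror the
# user preferences named in the abstract (cost, response time, quality).
services = [
    {"name": "weatherDaaS", "cost": 0.02, "latency_ms": 120, "quality": 0.9},
    {"name": "geoDaaS",     "cost": 0.10, "latency_ms": 400, "quality": 0.95},
    {"name": "freeDaaS",    "cost": 0.00, "latency_ms": 900, "quality": 0.6},
]

def select_services(services, max_cost, max_latency_ms):
    """Filter by the user's thresholds, then rank by result quality."""
    ok = [s for s in services
          if s["cost"] <= max_cost and s["latency_ms"] <= max_latency_ms]
    return sorted(ok, key=lambda s: -s["quality"])

for s in select_services(services, max_cost=0.05, max_latency_ms=500):
    print(s["name"], s["quality"])   # weatherDaaS is the best fit
```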
APA, Harvard, Vancouver, ISO, and other styles
24

Cheng, Heng-Tze. "Learning and Recognizing The Hierarchical and Sequential Structure of Human Activities." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/293.

Full text
Abstract:
The mission of the research presented in this thesis is to give computers the power to sense and react to human activities. Without the ability to sense the surroundings and understand what humans are doing, computers cannot provide active, timely, appropriate, and considerate services to humans. To accomplish this mission, the work stands on the shoulders of two giants: machine learning and ubiquitous computing. Because of the ubiquity of sensor-enabled mobile and wearable devices, there is an emerging opportunity to sense, learn, and infer human activities from sensor data by leveraging state-of-the-art machine learning algorithms. While having shown promising results in human activity recognition, most existing approaches using supervised or semi-supervised learning have two fundamental problems. Firstly, most existing approaches require a large set of labeled sensor data for every target class, which demands a costly effort from human annotators. Secondly, an unseen new activity cannot be recognized if no training samples of that activity are available in the dataset. In light of these problems, this thesis presents a novel approach to human activity recognition when few or no training samples of the target activities are available. The main hypothesis is that the problem can be solved by the proposed NuActiv activity recognition framework, which consists of modeling the hierarchical and sequential structure of human activities, as well as bringing humans into the loop of model training. By injecting human knowledge about the hierarchical nature of human activities, a semantic attribute representation and a two-layer attribute-based learning approach are designed. To model the sequential structure, a probabilistic graphical model is further proposed to take into account the temporal dependency of activities and attributes. Finally, an active learning algorithm is developed to reinforce the recognition accuracy using minimal user feedback. The hypothesis and approaches presented in this thesis are validated by two case studies and real-world experiments on exercise activities and daily life activities. Experimental results show that the NuActiv framework can effectively recognize unseen new activities even without any training data, with up to 70-80% precision and recall. It also outperforms supervised learning with limited labeled data for the new classes. The results significantly advance the state of the art in human activity recognition, and represent a promising step towards bridging the gap between computers and humans.
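The two-layer attribute-based idea can be sketched compactly: a first layer (mocked below with a fixed output) scores semantic attributes from sensor features, and a second layer matches the attribute vector against human-authored activity descriptions, so an activity with no training samples remains recognizable as long as someone has described it. The attribute names and vectors below are invented and are not taken from the thesis.

```python
import numpy as np

# Human-authored attribute descriptions of activities (invented):
# columns = ["arm_up", "repetitive", "lower_body", "high_intensity"]
ACTIVITY_ATTRS = {
    "squat":          np.array([0, 1, 1, 1]),
    "shoulder_press": np.array([1, 1, 0, 1]),
    "stretching":     np.array([1, 0, 0, 0]),
}

def recognize(attr_scores):
    """Layer 2: nearest attribute description wins. Because descriptions can
    be written without training data, unseen activities stay recognizable."""
    return min(ACTIVITY_ATTRS,
               key=lambda a: np.linalg.norm(ACTIVITY_ATTRS[a] - attr_scores))

# Layer 1 (attribute detectors over sensor features) is mocked here with a
# plausible output: "arm up, repetitive, intense" -> shoulder press.
predicted_attrs = np.array([0.9, 0.8, 0.1, 0.7])
print(recognize(predicted_attrs))
```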
APA, Harvard, Vancouver, ISO, and other styles
25

HENRIQUES, Hamon Barros. "Anotação automática de dados geográficos baseada em bancos de dados abertos e interligados." Universidade Federal de Campina Grande, 2015. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/594.

Full text
Abstract:
Recently, Spatial Data Infrastructures (SDI) have become popular as an important solution for easing the interoperability of geographic data offered by different organizations. An important challenge that must be overcome by such infrastructures consists in allowing their users to easily locate the available data and services. Presently, this task is implemented by means of catalog services. Although such services represent an important advance for the retrieval of geographic data, they still have serious limitations. Some of these limitations arise because catalog services resolve their queries based on information contained in their metadata records, which normally describe the characteristics of the service as a whole. In addition, many current catalogs solve queries with thematic restrictions based only on keywords, and have no formal means of describing the semantics of the available resources. To address this lack of semantics, this dissertation presents a solution for the automatic semantic annotation of the layers, and their respective attributes, made available in an SDI. With this, search engines that use ontologies as input for solving their queries will find the geographic data that are semantically related to a particular searched topic. This research also describes an evaluation of the performance of the proposed solution on a sample of Web Feature Service servers.
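A drastically simplified version of such automatic annotation is sketched below: feature-type names from a WFS are matched to ontology concept labels by string similarity, and the best match above a threshold becomes the annotation. Real pipelines, presumably including the one in this dissertation, use richer lexical and semantic evidence; all names and the threshold here are invented.

```python
from difflib import SequenceMatcher

# Hypothetical ontology concept labels and WFS feature-type names
CONCEPTS = ["River", "Road", "ProtectedArea", "Municipality"]
feature_types = ["rivers_br", "rodovias", "municipios"]

def best_concept(name, threshold=0.4):
    """Annotate a feature type with the closest concept label, if any."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in CONCEPTS]
    score, concept = max(scored)
    return concept if score >= threshold else None

for ft in feature_types:
    print(ft, "->", best_concept(ft))
```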
APA, Harvard, Vancouver, ISO, and other styles
26

Lodrová, Dana. "Bezpečnost biometrických systémů." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-261226.

Full text
Abstract:
The main contributions of this thesis are two new approaches for increasing the security of biometric systems based on fingerprint recognition. The first approach belongs to the area of liveness detection and prevents the use of various kinds of fake fingerprints and other methods of deceiving the sensor during the fingerprint acquisition process. This patented approach is based on the change of color and width of the papillary lines caused by pressing the finger against a glass surface. The resulting liveness detection unit can be integrated into optical sensors. The second approach belongs to the area of standardization and increases the security and interoperability of the minutiae extraction and matching processes. For this purpose, I created a methodology that establishes semantic conformance rates for fingerprint minutiae extractors. The minutiae found by the tested extractors are compared against ground-truth minutiae obtained by clustering data provided by dactyloscopic experts. The proposed methodology is included in the proposed amendment to the ISO/IEC 29109-2 standard (Amd. 2 WD4).
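The notion of a conformance rate for minutiae extractors can be illustrated by pairing detected minutiae with ground-truth minutiae within a distance tolerance and reporting precision and recall, as in the sketch below. The tolerance, the greedy matching and the sample points are invented, and this is not the methodology proposed for ISO/IEC 29109-2.

```python
import math

def conformance(detected, ground_truth, tol=10.0):
    """Greedily pair detected minutiae with ground-truth minutiae that lie
    within `tol` pixels, then report precision and recall."""
    unmatched = list(ground_truth)
    hits = 0
    for d in detected:
        best = min(unmatched, key=lambda g: math.dist(d, g), default=None)
        if best is not None and math.dist(d, best) <= tol:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

gt = [(10, 10), (50, 40), (80, 90)]        # expert-derived ground truth
found = [(12, 9), (49, 44), (200, 200)]    # extractor output (one spurious)
print(conformance(found, gt))              # -> about (0.67, 0.67)
```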
APA, Harvard, Vancouver, ISO, and other styles
27

Yu, Ching-Tzu (尤敬慈). "A Semantic Annotation Approach for Dynamic IoT Sensor Data." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/20036297816656973858.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Information Management
Academic year 103 (Republic of China calendar)
In a dynamic Internet of Things (IoT) environment, sensors continually collect data. However, it is difficult to transform those data into a machine-readable and machine-interpretable form. In this thesis, we propose a semantic annotation approach that annotates sensor data with semantics. First, a base ontology is built. Then, new knowledge is extracted from the input data using K-Means clustering and merged into the base ontology. The updated ontology forms the basis for semantic annotation. In our simulation, one month of data is analyzed week by week; the results show that the proposed approach is able to discover useful knowledge in the new input data. Therefore, we can annotate sensor data with richer knowledge.
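A minimal sketch of the described pipeline, with invented sensor values and ontology structure, might look as follows: readings are clustered with K-Means, each cluster becomes a new concept under a base class, and later readings are annotated with the nearest learned concept.

```python
import numpy as np
from sklearn.cluster import KMeans

# A week of (temperature, humidity) readings (synthetic)
rng = np.random.default_rng(0)
readings = np.vstack([rng.normal((20, 40), 1.5, (50, 2)),
                      rng.normal((30, 70), 1.5, (50, 2))])

# Step 1: discover "new knowledge" as clusters in the raw data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(readings)

# Step 2: update a (toy) base ontology with one concept per cluster
ontology = {"SensorReading": []}   # class -> subconcepts
for i, center in enumerate(kmeans.cluster_centers_):
    ontology["SensorReading"].append(
        {"concept": f"ReadingPattern_{i}", "prototype": center.tolist()})

# Step 3: annotate a new reading with the nearest learned concept
new_reading = np.array([[29.5, 68.0]])
label = kmeans.predict(new_reading)[0]
print("annotated as:", ontology["SensorReading"][label]["concept"])
```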
APA, Harvard, Vancouver, ISO, and other styles
28

Su, Ja-Hwung (蘇家輝). "Multimedia Data Mining Techniques for Semantic Annotation, Retrieval and Recommendation." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/05323447331505634288.

Full text
Abstract:
Doctoral dissertation
National Cheng Kung University
Department of Computer Science and Information Engineering
Academic year 98 (Republic of China calendar)
In recent years, the advance of digital capturing technologies has led to the rapid growth of multimedia data in various formats, such as image, music and video. Moreover, modern telecommunication systems make multimedia data widespread and extremely large. Hence, how to conceptualize, retrieve and recommend multimedia data from such massive resources has become an attractive and challenging issue over the past few years. To deal with this issue, the primary aim of this dissertation is to develop effective multimedia data mining techniques for discovering valuable knowledge from multimedia data, so as to achieve high-quality multimedia annotation, retrieval and recommendation. Nowadays, a considerable number of studies in the field of multimedia annotation face the difficulty of diverse relationships between human concepts and visual contents, namely diverse visual-concept associations. So-called visual-concept diversity indicates that a set of different concepts can share very similar visual features. To alleviate the problems of diverse visual-concept associations, this dissertation presents the integrated mining of visual, speech and text features for semantic image/video annotation. For image annotation, we propose a visual-based annotation method to disambiguate the image sense when a number of senses are shared by a number of images. Additionally, a textual-based annotation method, which attempts to discover the affinities between image captions and web-page keywords, is also proposed to address the lack of visual-based annotations. For video annotation, by considering temporal continuity, frequent visual, textual and visual-textual patterns can be mined to support semantic video annotation through the proposed video annotation models. Based on the image annotation, the user's interest and visual images can be bridged semantically for further textual-based image retrieval. However, little work has highlighted the conceptual retrieval from textual annotations to visual images in the last few years. To this end, the second focus of this dissertation is to retrieve images by the proposed image annotation, concept matching and fuzzy ranking techniques. In addition to textual-based image retrieval, textual-based video retrieval cannot earn the user's satisfaction either, due to the problem of diverse query concepts. To remedy this weakness, we propose an innovative method to mine temporal patterns from video contents to support content-based video retrieval. On the basis of the discovered temporal visual patterns, an efficient indexing technique and an effective sequence matching technique are integrated to reduce the computation cost and to raise the retrieval accuracy, respectively. In contrast to passive image/video retrieval, music recommendation is the final focus of this dissertation, actively providing users with preferred music pieces. In this work, we design a novel music recommender that integrates music content mining and collaborative filtering to help users find what they prefer within a huge amount of music collections. By discovering preferable perceptual patterns from music pieces, the user's listening interest and music can be associated effectively, and the traditional rating diversity problem can be alleviated.
For each proposed approach above, the experimental results in this dissertation reveal that the proposed multimedia data mining methods are beneficial for better multimedia annotation, retrieval and recommendation, and thus applicable to real multimedia applications such as mobile multimedia retrieval and recommendation.
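The hybrid of content mining and collaborative filtering can be caricatured in a few lines: a content score (cosine similarity of audio-feature vectors to the user's listened tracks) is blended with a collaborative-filtering score by a weight. All features, scores and the weight below are invented; the dissertation's recommender mines perceptual patterns rather than raw feature vectors.

```python
import numpy as np

# Toy audio-content features per track and collaborative-filtering scores
# predicted from other users' ratings (all values invented).
content = {"t1": np.array([0.9, 0.1]), "t2": np.array([0.8, 0.2]),
           "t3": np.array([0.1, 0.9])}
cf_score = {"t1": 0.2, "t2": 0.7, "t3": 0.6}
liked = ["t1"]                       # tracks the user listened to

def content_score(track):
    """Mean cosine similarity to the user's listened tracks."""
    sims = [np.dot(content[track], content[l]) /
            (np.linalg.norm(content[track]) * np.linalg.norm(content[l]))
            for l in liked]
    return float(np.mean(sims))

def hybrid(track, w=0.5):            # w balances the two signals
    return w * content_score(track) + (1 - w) * cf_score[track]

candidates = [t for t in content if t not in liked]
print(max(candidates, key=hybrid))   # t2: similar content + good CF score
```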
APA, Harvard, Vancouver, ISO, and other styles
29

Καναβός, Ανδρέας. "Σημασιολογικές μηχανές αναζήτησης Παγκόσμιου Ιστού." Thesis, 2012. http://hdl.handle.net/10889/5328.

Full text
Abstract:
Search engines are an invaluable tool for retrieving information from the web. In response to user queries, they return a list of results ranked by the relevance of their content to the query. However, although search engines are certainly quite good at handling specific queries, such as finding a particular web page, they can be less effective for queries that are ambiguous to them, for example when we encounter polysemy, where a word can take more than one meaning depending on the sentence context. Another example is a query with more than two subcategories and meanings, in which case the user has to scan a large number of results to find those of interest. The goal of this thesis is to develop an expert system that post-processes the answers of a classic search engine and clusters the results into a hierarchy of categories based on their content. The most important current solutions to the problem of clustering search results are the systems Vivisimo, Carrot, CREDO and SnakeT. The contribution proposed in this work is a set of techniques that improve the quality of the answer clusters. One novel technique used in this work is query reformulation through various strategies. Such strategies are presented because users often modify a previous search query in order to retrieve better results, or because they often cannot formulate a query correctly without knowing what the desired results look like. In addition, we took advantage of Wikipedia by drawing on the titles of its pages as well as the categories to which those pages belong. This is done by linking the frequent terms appearing in the texts of the search results to the semantic encyclopedia Wikipedia, in order to extract the different senses and meanings of each term. More specifically, Wikipedia is searched for the existence of a page (or pages, in the case of ambiguity) corresponding to these terms, so that the title and category can be used as additional information. Finally, Wikipedia is also used for labeling the final clusters, as additional information about each individual text in the cluster.
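A bare-bones version of such post-processing of search results, with invented snippets and without the Wikipedia-based labeling step the thesis adds, is sketched below: snippets are embedded as TF-IDF vectors, grouped with K-Means, and each cluster is labeled with its highest-weighted terms.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Snippets returned for the ambiguous query "jaguar" (invented examples)
snippets = [
    "jaguar is a large cat native to the americas",
    "the jaguar's habitat ranges from mexico to argentina",
    "jaguar unveils its new electric car model",
    "the british car maker jaguar reported record sales",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(snippets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Label each cluster with its highest-weighted terms
terms = vec.get_feature_names_out()
for c in range(2):
    center = km.cluster_centers_[c]
    top = [terms[i] for i in center.argsort()[-3:][::-1]]
    print(f"cluster {c}: {top}")
```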
APA, Harvard, Vancouver, ISO, and other styles
30

Kýpeť, Jakub. "Sémantická anotace a dotazování nad RDF daty." Master's thesis, 2015. http://www.nusl.cz/ntk/nusl-336763.

Full text
Abstract:
Title: Semantic annotation and querying RDF data Author: Jakub Kýpeť Department: Department of Software Engineering Supervisor: Prof. RNDr. Peter Vojtáš, DrSc. Abstract: This thesis describes in detail the design and implementation of a self-contained server application that allows us to create and manage semantic annotations for various web pages. The first part describes manual annotations and the human interface we have built for them. The second part describes our implementation of a web crawler and an automatic annotation system utilizing this crawler. The last part of the thesis analyzes the testing of this automated system, which was performed on several e-commerce websites from different domains. Keywords: semantic annotation, querying RDF data, user interface, web crawling, automation
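The combination of storing annotations as RDF and querying them can be shown with rdflib, assuming an invented annotation vocabulary: triples describe two crawled product pages, and a SPARQL query retrieves the cheap ones. This is an independent toy example, not the application's actual data model.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/annotation#")
g = Graph()

# Annotations produced for two crawled e-commerce pages (values invented)
g.add((EX.page1, RDF.type, EX.ProductPage))
g.add((EX.page1, EX.productName, Literal("Laptop X")))
g.add((EX.page1, EX.price, Literal(999)))
g.add((EX.page2, RDF.type, EX.ProductPage))
g.add((EX.page2, EX.productName, Literal("Phone Y")))
g.add((EX.page2, EX.price, Literal(399)))

# Query the RDF data: products cheaper than 500
q = """
PREFIX ex: <http://example.org/annotation#>
SELECT ?name ?price WHERE {
    ?p a ex:ProductPage ; ex:productName ?name ; ex:price ?price .
    FILTER (?price < 500)
}"""
for name, price in g.query(q):
    print(name, price)   # -> Phone Y 399
```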
APA, Harvard, Vancouver, ISO, and other styles
31

Usbeck, Ricardo. "Knowledge Extraction for Hybrid Question Answering." Doctoral thesis, 2016. https://ul.qucosa.de/id/qucosa%3A15647.

Full text
Abstract:
Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989 the World Wide Web has grown to more than one billion Web pages and still grows. With the later proposed Semantic Web vision, Berners-Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is the current implementation of the Semantic Web) grow continuously. This is a mixed blessing, as the two forms of the Web grow concurrently and most commonly contain different pieces of information. Modern information systems must thus bridge a Semantic Gap to allow a holistic and unified access to information about a particular topic independent of the representation of the data. One way to bridge the gap between the two forms of the Web is the extraction of structured data, i.e., RDF, from the growing amount of unstructured and semi-structured information (e.g., tables and XML) on the Document Web. Note that unstructured data stands for any type of textual information like news, blogs or tweets. While extracting structured data from unstructured data allows the development of powerful information systems, it requires high-quality and scalable knowledge extraction frameworks to lead to useful results. The dire need for such approaches has led to the development of a multitude of annotation frameworks and tools. However, most of these approaches are not evaluated on the same datasets or using the same measures. The resulting Evaluation Gap needs to be tackled by a concise evaluation framework to foster fine-grained and uniform evaluations of annotation tools and frameworks over any knowledge bases. Moreover, with the constant growth of data and the ongoing decentralization of knowledge, intuitive ways for non-experts to access the generated data are required. Humans adapted their search behavior to current Web data by access paradigms such as keyword search so as to retrieve high-quality results. Hence, most Web users only expect Web documents in return. However, humans think and most commonly express their information needs in their natural language rather than using keyword phrases. Answering complex information needs often requires the combination of knowledge from various, differently structured data sources. Thus, we observe an Information Gap between natural-language questions and current keyword-based search paradigms, which in addition do not make use of the available structured and unstructured data sources. Question Answering (QA) systems provide an easy and efficient way to bridge this gap by allowing users to query data via natural language, thus reducing (1) a possible loss of precision and (2) a potential loss of time while reformulating the search intention to transform it into a machine-readable form. Furthermore, QA systems enable answering natural language queries with concise results instead of links to verbose Web documents. Additionally, they allow as well as encourage the access to and the combination of knowledge from heterogeneous knowledge bases (KBs) within one answer. Consequently, three main research gaps are considered and addressed in this work: First, addressing the Semantic Gap between the unstructured Document Web and the structured Web of Data requires the development of scalable and accurate approaches for the extraction of structured data in RDF. This research challenge is addressed by several approaches within this thesis.
This thesis presents CETUS, an approach for recognizing entity types to populate RDF KBs. Furthermore, our knowledge-base-agnostic disambiguation framework AGDISTIS can efficiently detect the correct URIs for a given set of named entities. Additionally, we introduce REX, a Web-scale framework for RDF extraction from semi-structured (i.e., templated) websites which makes use of the semantics of the reference knowledge base to check the extracted data. The ongoing research on closing the Semantic Gap has already yielded a large number of annotation tools and frameworks. However, these approaches are currently still hard to compare since the published evaluation results are calculated on diverse datasets and evaluated based on different measures. On the other hand, the issue of comparability of results is not to be regarded as being intrinsic to the annotation task. Indeed, it is now well established that scientists spend between 60% and 80% of their time preparing data for experiments. Data preparation being such a tedious problem in the annotation domain is mostly due to the different formats of the gold standards as well as the different data representations across reference datasets. We tackle the resulting Evaluation Gap in two ways: First, we introduce a collection of three novel datasets, dubbed N3, to leverage the possibility of optimizing NER and NED algorithms via Linked Data and to ensure a maximal interoperability to overcome the need for corpus-specific parsers. Second, we present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools and frameworks on multiple datasets. The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Moreover, the increasing demand for natural-language interfaces, as depicted by current mobile applications, requires systems to deeply understand the underlying user information need. In conclusion, the natural language interface for asking questions requires a hybrid approach to data usage, i.e., simultaneously performing a search on full-texts and semantic knowledge bases. To close the Information Gap, this thesis presents HAWK, a novel entity search approach developed for hybrid QA based on combining structured RDF and unstructured full-text data sources.
APA, Harvard, Vancouver, ISO, and other styles