Dissertations / Theses on the topic 'Semantics - Data processing'

Consult the top 50 dissertations / theses for your research on the topic 'Semantics - Data processing.'

1

Tang, Wai-ming (鄧偉明). "Semantics of authentication in workflow security." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2001. http://hub.hku.hk/bib/B30110828.

2

Sax, Matthias J. "Performance Optimizations and Operator Semantics for Streaming Data Flow Programs." Doctoral thesis, Humboldt-Universität zu Berlin, 2020. http://dx.doi.org/10.18452/21424.

Abstract:
Modern companies are able to collect more data and require insights from it faster than ever before. Relational databases do not meet the requirements for processing the often unstructured data sets with reasonable performance. The database research community started to address these trends in the early 2000s. Two new research directions have attracted major interest since: large-scale non-relational data processing as well as low-latency data stream processing. Large-scale non-relational data processing, commonly known as "Big Data" processing, was quickly adopted in the industry. In parallel, low-latency data stream processing was mainly driven by the research community developing new systems that embrace a distributed architecture, scalability, and data parallelism. While these systems have gained more and more attention in the industry, there are still major challenges in operating them at large scale. The goal of this dissertation is two-fold: first, to investigate runtime characteristics of large-scale data-parallel distributed streaming systems; and second, to propose the "Dual Streaming Model" to express semantics of continuous queries over data streams and tables. Our goal is to improve the understanding of system and query runtime behavior with the aim of provisioning queries automatically. We introduce a cost model for streaming data flow programs taking into account the two techniques of record batching and data parallelization. Additionally, we introduce optimization algorithms that leverage our model for cost-based query provisioning. The proposed Dual Streaming Model expresses the result of a streaming operator as a stream of successive updates to a result table, inducing a duality between streams and tables. Our model handles the inconsistency of the logical and the physical order of records within a data stream natively, which allows for deterministic semantics as well as low-latency query execution.
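
The stream-table duality at the core of the Dual Streaming Model can be illustrated with a minimal, self-contained sketch (ours, not the dissertation's implementation, and in plain Python rather than a streaming engine): a changelog stream of keyed records is folded into a result table, each record acting as an upsert.

    from typing import Iterable, Tuple

    Update = Tuple[str, int]  # a (key, value) record in a changelog stream

    def stream_to_table(updates: Iterable[Update]) -> dict:
        """Materialize a stream of updates as a table (latest value per key)."""
        table = {}
        for key, value in updates:
            table[key] = value  # each record is an upsert on the result table
        return table

    # The physical arrival order decides which update "wins"; reconciling
    # this with the logical order is exactly what the model addresses.
    stream = [("a", 1), ("b", 2), ("a", 3)]
    assert stream_to_table(stream) == {"a": 3, "b": 2}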
3

Nedas, Konstantinos A. "Semantic Similarity of Spatial Scenes." Fogler Library, University of Maine, 2006. http://www.library.umaine.edu/theses/pdf/NedasKA2006.pdf.

4

Giese, Holger, Stephan Hildebrandt, and Leen Lambers. "Toward bridging the gap between formal semantics and implementation of triple graph grammars." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4521/.

Abstract:
The correctness of model transformations is a crucial element for the model-driven engineering of high-quality software. A prerequisite to verify model transformations at the level of the model transformation specification is that an unambiguous formal semantics exists and that the employed implementation of the model transformation language adheres to this semantics. However, for existing relational model transformation approaches it is usually unclear under which constraints particular implementations actually conform to the formal semantics. In this paper, we bridge this gap for the formal semantics of triple graph grammars (TGG) and an existing efficient implementation. Whereas the formal semantics assumes backtracking and ignores non-determinism, practical implementations do not support backtracking, require rule sets that ensure determinism, and include further optimizations. Therefore, we capture how the considered TGG implementation realizes the transformation by means of operational rules, define required criteria, and show conformance to the formal semantics if these criteria are fulfilled. We further outline how static analysis can be employed to guarantee these criteria.
5

Wong, Ping-wai (黃炳蔚). "Semantic annotation of Chinese texts with message structures based on HowNet." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B38212389.

6

Herre, Heinrich, and Axel Hummel. "A paraconsistent semantics for generalized logic programs." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4149/.

Abstract:
We propose a paraconsistent declarative semantics of possibly inconsistent generalized logic programs which allows for arbitrary formulas in the body and in the head of a rule (i.e. does not depend on the presence of any specific connective, such as negation(-as-failure), nor on any specific syntax of rules). For consistent generalized logic programs this semantics coincides with the stable generated models introduced in [HW97], and for normal logic programs it yields the stable models in the sense of [GL88].
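
For readers unfamiliar with the stable-model baseline cited above, a toy worked example (ours, not the paper's): consider the normal logic program P = { p ← not q }. For the candidate model M = {p}, the Gelfond-Lifschitz reduct P^M is { p ← }, because q ∉ M removes the negative literal; the least model of the reduct is {p} = M, so {p} is a stable model of P. The candidate {q} is not stable, since no rule of the reduct can derive q.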
7

Lamprecht, Anna-Lena, Tiziana Margaria, and Bernhard Steffen. "Bio-jETI : a framework for semantics-based service composition." Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2010/4506/.

Abstract:
Background: The development of bioinformatics databases, algorithms, and tools throughout the last years has led to a highly distributed world of bioinformatics services. Without adequate management and development support, in silico researchers are hardly able to exploit the potential of building complex, specialized analysis processes from these services. The Semantic Web aims at thoroughly equipping individual data and services with machine-processable meta-information, while workflow systems support the construction of service compositions. However, even in this combination, in silico researchers currently would have to deal manually with the service interfaces, the adequacy of the semantic annotations, type incompatibilities, and the consistency of service compositions. Results: In this paper, we demonstrate by means of two examples how Semantic Web technology together with an adequate domain modelling frees in silico researchers from dealing with interfaces, types, and inconsistencies. In Bio-jETI, bioinformatics services can be graphically combined to complex services without worrying about details of their interfaces or about type mismatches of the composition. These issues are taken care of at the semantic level by Bio-jETI's model checking and synthesis features. Whenever possible, they automatically resolve type mismatches in the considered service setting. Otherwise, they graphically indicate impossible/incorrect service combinations. In the latter case, the workflow developer may either modify his service composition using semantically similar services, or ask for help in developing the missing mediator that correctly bridges the detected type gap. Newly developed mediators should then be adequately annotated semantically, and added to the service library for later reuse in similar situations. Conclusion: We show the power of semantic annotations in an adequately modelled and semantically enabled domain setting. Using model checking and synthesis methods, users may orchestrate complex processes from a wealth of heterogeneous services without worrying about interfaces and (type) consistency. The success of this method strongly depends on a careful semantic annotation of the provided services and on its consequent exploitation for analysis, validation, and synthesis. We are convinced that these annotations will become standard, as they will become preconditions for the success and widespread use of (preferred) services in the Semantic Web.
8

Zhan, Tianjie. "Semantic analysis for extracting fine-grained opinion aspects." HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1213.

9

Otten, Frederick John. "Using semantic knowledge to improve compression on log files." Thesis, Rhodes University, 2008. http://eprints.ru.ac.za/1660/.

10

Harrison, Dave. "Functional real-time programming : the language Ruth and its semantics." Thesis, University of Stirling, 1988. http://hdl.handle.net/1893/12116.

Abstract:
Real-time systems are amongst the most safety-critical systems involving computer software, and the incorrect functioning of this software can cause great damage, up to and including the loss of life. It seems sensible therefore to write real-time software in a way that gives us the best chance of correctly implementing specifications. Because of the high level of functional programming languages, their semantic simplicity and their amenability to formal reasoning and correctness-preserving transformation, it thus seems natural to use a functional language for this task. This thesis explores the problems of applying functional programming languages to real-time by defining the real-time functional programming language Ruth. The first part of the thesis concerns the identification of the particular problems associated with programming real-time systems. These can broadly be stated as a requirement that a real-time language must be able to express facts about time, a feature we have called time expressibility. The next stage is to provide time expressibility within a purely functional framework. This is accomplished by the use of timestamps on inputs and outputs and by providing a real-time clock as an input to Ruth programs. The final major part of the work is the construction of a formal definition of the semantics of Ruth to serve as a basis for formal reasoning and transformation. The framework within which the formal semantics of a real-time language are defined requires time expressibility in the same way as the real-time language itself. This is accomplished within the framework of domain theory by the use of specialised domains for timestamped objects, called herring-bone domains. These domains could be used as the basis for the definition of the semantics of any real-time language.
11

Pham, Son Bao (Computer Science & Engineering, Faculty of Engineering, UNSW). "Incremental knowledge acquisition for natural language processing." Awarded by: University of New South Wales, School of Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/26299.

Abstract:
Linguistic patterns have been used widely in shallow methods to develop numerous NLP applications. Approaches for acquiring linguistic patterns can be broadly categorised into three groups: supervised learning, unsupervised learning and manual methods. In supervised learning approaches, a large annotated training corpus is required for the learning algorithms to achieve decent results. However, annotated corpora are expensive to obtain and usually available only for established tasks. Unsupervised learning approaches usually start with a few seed examples and gather some statistics based on a large unannotated corpus to detect new examples that are similar to the seed ones. Most of these approaches either populate lexicons for predefined patterns or learn new patterns for extracting general factual information; hence they are applicable to only a limited number of tasks. Manually creating linguistic patterns has the advantage of utilising an expert's knowledge to overcome the scarcity of annotated data. In tasks with no annotated data available, the manual way seems to be the only choice. One typical problem that occurs with manual approaches is that the combination of multiple patterns, possibly being used at different stages of processing, often causes unintended side effects. Existing approaches, however, do not focus on the practical problem of acquiring those patterns but rather on how to use linguistic patterns for processing text. A systematic way to support the process of manually acquiring linguistic patterns in an efficient manner is long overdue. This thesis presents KAFTIE, an incremental knowledge acquisition framework that strongly supports experts in creating linguistic patterns manually for various NLP tasks. KAFTIE addresses difficulties in manually constructing knowledge bases of linguistic patterns, or rules in general, often faced in existing approaches by: (1) offering a systematic way to create new patterns while ensuring they are consistent; (2) alleviating the difficulty in choosing the right level of generality when creating a new pattern; (3) suggesting how existing patterns can be modified to improve the knowledge base's performance; (4) making the effort in creating a new pattern, or modifying an existing pattern, independent of the knowledge base's size. KAFTIE, therefore, makes it possible for experts to efficiently build large knowledge bases for complex tasks. This thesis also presents the KAFDIS framework for discourse processing using new representation formalisms: the level-of-detail tree and the discourse structure graph.
12

Barber, Nicole. "Aktionsart coercion." University of Western Australia. School of Humanities, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0248.

Abstract:
This study aimed to investigate English Aktionsart coercion, particularly novel coercion, through corpora-based research. Novel coercions are those which need some contextual support in order to make sense or be grammatical. Due to the nature of the data, a necessary part of the study was the design of a program to help in the process of tagging corpora for Aktionsart. This thesis starts with a discussion of five commonly accepted Aktionsarten: state, activity, achievement, accomplishment, and semelfactive. One significant contribution of the thesis is that it offers a comprehensive review and discussion of various theories that have been proposed to account for Aktionsart or aspectual coercion, as there is no such synthesis available in the literature. Thus the thesis moves on to a review of many of the more prominent works in the area of Aktionsart coercion, including Moens and Steedman (1988), Pustejovsky (1995), and De Swart (1998). I also present a few theories drawn from less prominent studies by authors in the area who have different or interesting views on the topic, such as Bickel (1997), Krifka (1998), and Xiao and McEnery (2004). In order to study the Aktionsart coercion of verbs in large corpora, examples of Aktionsart coercion needed to be collected. I aimed to design a computer program that could ideally perform a large portion of this task automatically. I present the methods I used in designing the program, as well as the process involved in using it to collect data. Some major steps in my research were the tagging of corpora, counting of coercion frequency by type, and the selection of representative examples of different types of coercion for analysis and discussion. All of the examples collected from the corpora, both by my Aktionsart-tagging program and manually, were conventional coercions. As such there was no opportunity for an analysis of novel coercions. I nevertheless discuss the examples of conventional coercion that I gathered from the corpora analysis, with particular reference to Moens and Steedman's (1988) theory. Three dominant types of coercion were identified in the data: from activities into accomplishments, activities into states, and accomplishments into states. There were two main ways coercions took place in the data: from activity to accomplishment through the addition of an endpoint, and from various Aktionsarten into state by coercing the event into being a property of someone/something. Many of the Aktionsart coercion theories are supported at least in part by the data found in natural language. One of the most prominent coercions that is underrepresented in the data is from achievement to accomplishment through the addition of a preparatory process. I conclude that while there are reasons for analysing Aktionsart at verb phrase or sentence level, this does not mean the possibility of analyses at the lexical level should be ignored.
13

Yang, Yimin. "Exploring Hidden Coherent Feature Groups and Temporal Semantics for Multimedia Big Data Analysis." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2254.

Abstract:
Thanks to advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications that let people easily access and manage multimedia data in various areas. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. At last, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
14

Gunaratna, Kalpa. "Semantics-based Summarization of Entities in Knowledge Graphs." Wright State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=wright1496124815009777.

15

Varde, Aparna S. "Graphical data mining for computational estimation in materials science applications." Link to electronic thesis, 2006. http://www.wpi.edu/Pubs/ETD/Available/etd-081506-152633/.

16

Sousa, Sidney Roberto de. "Gerenciamento de anotações semanticas de dados na Web para aplicações agricolas." [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275829.

Abstract:
Advisor: Claudia Maria Bauzer Medeiros
Master's thesis - Universidade Estadual de Campinas, Instituto de Computação
Geographic information systems (GIS) are increasingly using geospatial data from the Web to produce geographic information. One big challenge is to find the relevant data, where the search is often based on keywords or even file names. However, these approaches lack semantics. Thus, it is necessary to provide mechanisms to prepare data to help the retrieval of semantically relevant data. To attack this problem, this dissertation proposes a service-based architecture to manage semantic annotations. In this work, a semantic annotation is a set of triples - called semantic annotation units - of the form <subject, metadata field, object>, where subject is a geospatial resource, metadata field contains some characteristic about this resource, and object is an ontology term that semantically associates the metadata field with some appropriate concept. The main contributions of this dissertation are: a comparative study of annotation tools; the specification and implementation of a service-based architecture to manage semantic annotations, including services for handling ontology terms; and a comparative analysis of mechanisms for storing semantic annotations. The work takes semantic annotations about agricultural resources as its case study.
Master's degree in Computer Science (research area: Databases)
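
A minimal sketch of the triple-shaped annotation unit described in this entry's abstract (the field names and the ontology term below are our own illustration, not from the dissertation):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AnnotationUnit:
        subject: str         # identifier of a geospatial resource
        metadata_field: str  # the metadata field being annotated
        ontology_term: str   # URI of the concept the field is linked to

    # Hypothetical annotation linking a document's "theme" field to an
    # agricultural ontology concept.
    unit = AnnotationUnit(
        subject="doc:soil-map-42",
        metadata_field="theme",
        ontology_term="http://example.org/agro#SoilType",
    )
    print(unit)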
17

Vitaliano Filho, Arnaldo Francisco, 1982-. "Mecanismos de anotação semântica para workflows cientificos." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275760.

Abstract:
Advisor: Claudia Maria Bauzer Medeiros
Master's thesis - Universidade Estadual de Campinas, Instituto de Computação
The sharing of information, processes and models of experiments is increasing among scientists from many organizations and areas of knowledge, creating a need for mechanisms that support workflow discovery. Many of these models of experiments are described as scientific workflows. However, there is no standard specification for describing them, which complicates the reuse of existing workflows and their components. This thesis contributes to solving this problem with the following results: an analysis of the issues related to the sharing and cooperative design of scientific workflows on the Web; an analysis of semantic aspects and metadata related to these workflows; a Web-based workflow editor using WfMC standards; and the development of a semantic annotation model for scientific workflows. With this, the thesis creates the basis to allow the discovery, reuse and sharing of scientific workflows on the Web. The editor allows researchers to build their workflows and annotations online, enabling the annotation system to then be tested with external data.
Master's degree in Computer Science (research area: Databases)
18

Herre, Heinrich, and Axel Hummel. "Stationary generated models of generalized logic programs." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4150/.

Abstract:
The interest in extensions of the logic programming paradigm beyond the class of normal logic programs is motivated by the need for an adequate representation and processing of knowledge. One of the most difficult problems in this area is to find an adequate declarative semantics for logic programs. In the present paper a general preference criterion is proposed that selects the 'intended' partial models of generalized logic programs; this criterion is a conservative extension of the stationary semantics for normal logic programs of [Prz91]. The presented preference criterion defines a partial model of a generalized logic program as intended if it is generated by a stationary chain. It turns out that the stationary generated models coincide with the stationary models on the class of normal logic programs. The general wellfounded semantics of such a program is defined as the set-theoretical intersection of its stationary generated models. For normal logic programs the general wellfounded semantics equals the wellfounded semantics.
19

Faruque, Md Ehsanul. "A Minimally Supervised Word Sense Disambiguation Algorithm Using Syntactic Dependencies and Semantic Generalizations." Thesis, University of North Texas, 2005. https://digital.library.unt.edu/ark:/67531/metadc4969/.

Abstract:
Natural language is inherently ambiguous. For example, the word "bank" can mean a financial institution or a river shore. Finding the correct meaning of a word in a particular context is a task known as word sense disambiguation (WSD), which is essential for many natural language processing applications such as machine translation, information retrieval, and others. While most current WSD methods try to disambiguate a small number of words for which enough annotated examples are available, the method proposed in this thesis attempts to address all words in unrestricted text. The method is based on constraints imposed by syntactic dependencies and concept generalizations drawn from an external dictionary. The method was tested on standard benchmarks as used during the SENSEVAL-2 and SENSEVAL-3 WSD international evaluation exercises, and was found to be competitive.
20

Sinha, Ravi Som. "Graph-based Centrality Algorithms for Unsupervised Word Sense Disambiguation." Thesis, University of North Texas, 2008. https://digital.library.unt.edu/ark:/67531/metadc9736/.

Abstract:
This thesis introduces an innovative methodology that combines traditional dictionary-based approaches to word sense disambiguation (semantic similarity measures and overlap of word glosses, both based on WordNet) with graph-based centrality methods, namely the degree of the vertices, PageRank, closeness, and betweenness. The approach is completely unsupervised, and is based on creating graphs for the words to be disambiguated. We experiment with several possible combinations of the semantic similarity measures as the first stage in our experiments. The next stage attempts to score individual vertices in the graphs previously created based on several graph connectivity measures. During the final stage, several voting schemes are applied on the results obtained from the different centrality algorithms. The most important contributions of this work are not only that it is a novel approach that works well, but also that it has great potential to overcome the new-knowledge-acquisition bottleneck which has apparently brought research in supervised WSD, as an explicit application, to a plateau. The type of research reported in this thesis, which does not require manually annotated data, holds promise for many new and interesting developments, and our work is one of the first steps, albeit a small one, in this direction. The complete system is built and tested on standard benchmarks, and is comparable with work done on graph-based word sense disambiguation as well as lexical chains. The evaluation indicates that the right combination of the above-mentioned metrics can be used to develop an unsupervised disambiguation engine as powerful as the state of the art in WSD.
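
A rough sketch of the centrality-plus-voting scoring described above, over a toy sense graph (illustrative only: the thesis derives edge weights from WordNet-based similarity measures, which are omitted here; assumes the networkx package):

    import networkx as nx

    # Toy graph: nodes are candidate senses, weighted edges connect
    # semantically related senses.
    G = nx.Graph()
    G.add_weighted_edges_from([
        ("bank#1", "money#1", 0.9),
        ("bank#1", "loan#1", 0.8),
        ("bank#2", "river#1", 0.7),
        ("money#1", "loan#1", 0.6),
    ])

    measures = {
        "degree": nx.degree_centrality(G),
        "pagerank": nx.pagerank(G, weight="weight"),
        "closeness": nx.closeness_centrality(G),
        "betweenness": nx.betweenness_centrality(G),
    }

    # Simple voting scheme: each centrality measure votes for the
    # highest-scoring candidate sense of "bank".
    candidates = ["bank#1", "bank#2"]
    votes = [max(candidates, key=m.__getitem__) for m in measures.values()]
    print(max(set(votes), key=votes.count))  # sense preferred by most measures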
21

Smith, Marc L. "View-centric reasoning about parallel and distributed computation." Doctoral diss., University of Central Florida, 2000. http://digital.library.ucf.edu/cdm/ref/collection/RTD/id/1597.

Abstract:
The development of distributed applications has not progressed as rapidly as its enabling technologies. In part, this is due to the difficulty of reasoning about such complex systems. In contrast to sequential systems, parallel systems give rise to parallel events, and the resulting uncertainty of the observed order of these events. Loosely coupled distributed systems complicate this even further by introducing the element of multiple imperfect observers of these parallel events. The goal of this dissertation is to advance parallel and distributed systems development by producing a parameterized model that can be instantiated to reflect the computation and coordination properties of such systems. The result is a model called paraDOS that we show to be general enough to have instantiations of two very distinct distributed computation models, Actors and tuple space. We show how paraDOS allows us to use operational semantics to reason about computation when such reasoning must account for multiple, inconsistent and imperfect views. We then extend the paraDOS model with an abstraction to support composition of communicating computational systems. This extension gives us a tool to reason formally about heterogeneous systems, and about new distributed computing paradigms such as the multiple tuple spaces support seen in Sun's JavaSpaces and IBM's T Spaces.
Ph.D.
School of Electrical Engineering and Computer Science
22

Hartig, Olaf. "Querying a Web of Linked Data." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2014. http://dx.doi.org/10.18452/17015.

Abstract:
During recent years a set of best practices for publishing and connecting structured data on the World Wide Web (WWW) has emerged. These best practices are referred to as the Linked Data principles and the resulting form of Web data is called Linked Data. The increasing adoption of these principles has led to the creation of a globally distributed space of Linked Data that covers various domains such as government, libraries, life sciences, and media. Approaches that conceive this data space as a huge distributed database and enable an execution of declarative queries over this database hold an enormous potential; they allow users to benefit from a virtually unbounded set of up-to-date data. As a consequence, several research groups have started to study such approaches. However, the main focus of existing work is to address practical challenges that arise in this context. Research on the foundations of such approaches is largely missing. This dissertation closes this gap.
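
For a flavor of such declarative queries over Linked Data — a minimal sketch, not Hartig's link-traversal engine: dereference one Linked Data URI and run a SPARQL query over the retrieved triples (assumes the rdflib package, network access, and DBpedia as an example data source):

    from rdflib import Graph

    g = Graph()
    # Dereferencing a Linked Data URI yields RDF triples about the resource.
    g.parse("http://dbpedia.org/resource/Berlin")

    results = g.query("""
        SELECT ?p ?o WHERE {
            <http://dbpedia.org/resource/Berlin> ?p ?o .
        } LIMIT 5
    """)
    for p, o in results:
        print(p, o)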
23

Doyen, Laurent. "Algorithmic analysis of complex semantics for timed and hybrid automata." Doctoral thesis, Universite Libre de Bruxelles, 2006. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/210853.

Abstract:
In the field of formal verification of real-time systems, major developments have been recorded in the last fifteen years, concerning logics, automata, process algebras, programming languages, and more. From the beginning, one formalism has played a central role: timed automata and their natural extension, hybrid automata. Those models allow the definition of real-time constraints using real-valued clocks, or more generally analog variables whose evolution is governed by differential equations. They generalize finite automata in that their semantics defines timed words, where each symbol is associated with an occurrence timestamp (for instance, (a, 0.5)(b, 1.7)(a, 3.2) is a timed word over the alphabet {a, b}, with nondecreasing timestamps).

The decidability and algorithmic analysis of timed and hybrid automata have been intensively studied in the literature. The central result for timed automata is that they are decidable. This is not the case for hybrid automata, but semi-algorithmic methods are known when the dynamics is relatively simple, namely a linear relation between the derivatives of the variables.

With the increasing complexity of today's systems, however, the classical semantics of those models is too limited for modelling realistic implementations or dynamical systems.

In this thesis, we study the algorithmics of complex semantics for timed and hybrid automata.

On the one hand, we propose implementable semantics for timed automata and we study their computational properties: by contrast with other works, we identify a semantics that is implementable and that has decidable properties.

On the other hand, we give new algorithmic approaches to the analysis of hybrid automata whose dynamics is given by an affine function of its variables.


Doctorate in Sciences, specialization in Computer Science

24

Yang, Li. "Improving Topic Tracking with Domain Chaining." Thesis, University of North Texas, 2003. https://digital.library.unt.edu/ark:/67531/metadc4274/.

Abstract:
Topic Detection and Tracking (TDT) research has produced some successful statistical tracking systems. While lexical chaining, a non-statistical approach, has also been applied to the task of tracking by Carthy and Stokes for the 2001 TDT evaluation, an efficient tracking system based on this technology has yet to be developed. In this thesis we investigate two new techniques which can improve Carthy's original design. First, at the core of our system is a semantic domain chainer. This chainer relies not only on the WordNet database for semantic relationships but also on Magnini's semantic domain database, which is an extension of WordNet. The domain-chaining algorithm is a linear algorithm. Second, to handle proper nouns, we gather all of the ones that occur in a news story together in a chain reserved for proper nouns. In this thesis we also discuss the linguistic limitations of lexical chainers to represent textual meaning.
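
For readers unfamiliar with the WordNet relationships such chainers rely on, a small sketch (ours; the thesis's domain chainer and Magnini's semantic domain database are not reproduced here; assumes nltk with the WordNet corpus downloaded via nltk.download("wordnet")):

    from nltk.corpus import wordnet as wn

    bank = wn.synsets("bank", pos=wn.NOUN)[0]   # one candidate sense of "bank"
    money = wn.synsets("money", pos=wn.NOUN)[0]

    print(bank.definition())                     # gloss of the sense
    print([h.name() for h in bank.hypernyms()])  # hypernym relation
    print(bank.path_similarity(money))           # one WordNet similarity score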
25

Cregan, Anne (Computer Science & Engineering, Faculty of Engineering, UNSW). "Weaving the semantic web: Contributions and insights." Publisher: University of New South Wales, Computer Science & Engineering, 2008. http://handle.unsw.edu.au/1959.4/42605.

Abstract:
The semantic web aims to make the meaning of data on the web explicit and machine processable. Harking back to Leibniz in its vision, it imagines a world of interlinked information that computers 'understand' and 'know' how to process based on its meaning. Spearheaded by the World Wide Web Consortium, ontology languages OWL and RDF form the core of the current technical offerings. RDF has successfully enabled the construction of virtually unlimited webs of data, whilst OWL gives the ability to express complex relationships between RDF data triples. However, the formal semantics of these languages limit themselves to that aspect of meaning that can be captured by mechanical inference rules, leaving many open questions as to other aspects of meaning and how they might be made machine processable. The Semantic Web has faced a number of problems that are addressed by the included publications. Its germination within academia and logical semantics has seen it struggle to become familiar, accessible and implementable for the general IT population, so an overview of semantic technologies is provided. Faced with competing 'semantic' languages, such as the ISO's Topic Map standards, a method for building ISO-compliant Topic Maps in the OWL DL language has been provided, enabling them to take advantage of the more mature OWL language and tools. Supplementation with rules is needed to deal with many real-world scenarios, and this is explored as a practical exercise. The available syntaxes for OWL have hindered domain experts in ontology building, so a natural language syntax for OWL designed for use by non-logicians is offered and compared with similar offerings. In recent years, proliferation of ontologies has resulted in far more than are needed in any given domain space, so a mechanism is proposed to facilitate the reuse of existing ontologies by giving contextual information and leveraging social factors to encourage wider adoption of common ontologies and achieve interoperability. Lastly, the question of meaning is addressed in relation to the need to define one's terms and to ground one's symbols by anchoring them effectively, ultimately providing the foundation for evolving a 'Pragmatic Web' of action.
26

Macario, Carla Geovana do Nascimento. "Anotação semantica de dados geoespaciais." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275838.

Abstract:
Advisor: Claudia Maria Bauzer Medeiros
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
Geospatial data are a basis for decision making in a wide range of domains, such as traffic planning, provision of services, or disaster control. However, to be used, this kind of data has to be analyzed and interpreted - a hard task, prone to errors, that is usually performed by experts. Despite all of these factors, the interpretations are usually not stored; when they are, they correspond to descriptive text in the experts' own language, stored in technical files. The absence of solutions to efficiently store them leads to problems such as rework and difficulties in information sharing. In this work we present a solution for these problems based on semantic annotations, an approach that promotes a common understanding of the concepts being used. We propose the adoption of scientific workflows to describe the annotation process for each kind of data, and also of a well-known metadata schema and ontologies, applying the solution to problems in agriculture. The contributions of this thesis involve: (i) identification of a set of requirements for semantic search of geospatial data; (ii) identification of desirable features for annotation tools; (iii) proposal, and partial implementation, of a framework for the semantic annotation of different kinds of geospatial data; and (iv) identification of the challenges involved in adopting scientific workflows to describe the annotation process. This framework was partially validated, through an implementation to produce annotations for applications in agriculture.
Doctorate in Computer Science (research area: Databases)
27

Santanchè, André, 1968-. "Fluid Web e componentes de conteudo digital : da visão centrada em documentos para a visão centrada em conteudo." [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276279.

Abstract:
Advisor: Claudia Bauzer Medeiros
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
The Web is evolving from a space for publication/consumption of documents to an environment for collaborative work, where digital content can travel and be replicated, adapted, decomposed, fusioned and transformed. We call this the Fluid Web perspective. This view requires a thorough revision of the typical document-oriented approach that permeates content management on the Web. This thesis presents our solution for the Fluid Web, which allows moving from the document-oriented to a content-oriented perspective, where "content" can be any digital object. The solution is based on two axes: a self-descriptive unit to encapsulate any kind of content artifact - the Digital Content Component (DCC); and a Fluid Web infrastructure that provides management and deployment of DCCs through the Web, and whose goal is to support collaboration on the Web. Designed to be reused and adapted, DCCs encapsulate data and software using a single structure, thus allowing homogeneous composition and processing of any digital content, be it executable or not. These properties are exploited by our Fluid Web infrastructure, which supports DCC multilevel annotation and discovery mechanisms, configuration management and version control. Our work extensively explores Semantic Web standards and taxonomic ontologies, which serve as a semantic bridge, unifying DCC management vocabularies and improving DCC description/indexing/discovery. DCCs and infrastructure have been implemented and are illustrated by means of examples, for scientific applications. The main contributions of this thesis are: the model of the Digital Content Component; the design of the Fluid Web infrastructure based on DCCs, with support for repository-based storage, distributed sharing, version control and configuration management; an algorithm for digital content discovery that explores DCC semantics; and a practical validation of the main concepts in this research through implementation of prototypes.
Doctorate in Computer Science (research area: Databases)
28

Thakur, Amritanshu. "Semantic construction with provenance for model configurations in scientific workflows." Master's thesis, Mississippi State : Mississippi State University, 2008. http://library.msstate.edu/etd/show.asp?etd=etd-07312008-092758.

29

Schwartz, Hansen A. "The acquisition of lexical knowledge from the web for aspects of semantic interpretation." Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5028.

Abstract:
This work investigates the effective acquisition of lexical knowledge from the Web to perform semantic interpretation. The Web provides an unprecedented amount of natural language from which to gain knowledge useful for semantic interpretation. The knowledge acquired is described as common sense knowledge, information one uses in his or her daily life to understand language and perception. Novel approaches are presented for both the acquisition of this knowledge and the use of the knowledge in semantic interpretation algorithms. The goal is to increase accuracy over other automatic semantic interpretation systems, and in turn enable stronger real-world applications such as machine translation, advanced Web search, sentiment analysis, and question answering. The major contributions of this dissertation consist of two methods of acquiring lexical knowledge from the Web, namely a database of common sense knowledge and Web selectors. The first method is a framework for acquiring a database of concept relationships. To acquire this knowledge, relationships between nouns are found on the Web and analyzed over WordNet using information theory, producing information about concepts rather than ambiguous words. For the second contribution, words called Web selectors are retrieved which take the place of an instance of a target word in its local context. The selectors serve for the system to learn the types of concepts that the sense of a target word should be similar to. Web selectors are acquired dynamically as part of a semantic interpretation algorithm, while the relationships in the database are useful to stand-alone programs. A final contribution of this dissertation concerns a novel semantic similarity measure and an evaluation of similarity and relatedness measures on tasks of concept similarity. Such tasks are useful when applying acquired knowledge to semantic interpretation.

Applications to word sense disambiguation, an aspect of semantic interpretation, are used to evaluate the contributions. Disambiguation systems which utilize semantically annotated training data are considered supervised. The algorithms of this dissertation are considered minimally-supervised; they do not require training data created by humans, though they may use human-created data sources. In the case of evaluating a database of common sense knowledge, integrating the knowledge into an existing minimally-supervised disambiguation system significantly improved results: a 20.5% error reduction. Similarly, the Web selectors disambiguation system, which acquires knowledge directly as part of the algorithm, achieved results comparable with top minimally-supervised systems, an F-score of 80.2% on a standard noun disambiguation task. This work enables the study of many subsequent related tasks for improving semantic interpretation and its application to real-world technologies. Other aspects of semantic interpretation, such as semantic role labeling, could utilize the same methods presented here for word sense disambiguation. As the Web continues to grow, the capabilities of the systems in this dissertation are expected to increase. Although the Web selectors system achieves great results, a study in this dissertation shows likely improvements from acquiring more data. Furthermore, the methods for acquiring a database of common sense knowledge could be applied in a more exhaustive fashion for other types of common sense knowledge. Finally, perhaps the greatest benefits from this work will come from the enabling of real-world technologies that utilize semantic interpretation.
Ph.D.
Electrical Engineering and Computer Science
30

Marques, Caio Miguel [UNESP]. "Pangea - Arquitetura semântica para a integração de dados e modelos geoespaciais na Web." Universidade Estadual Paulista (UNESP), 2010. http://hdl.handle.net/11449/98654.

Abstract:
The geographic information is definitely required in many areas of human knowledge and activity. Nowadays, a large part of this geographic information is published on the Web by various authors, from the governmental institutions and academy to the ordinary citizen. These authors publish the geographic data in several formats and using different technologies. In this context, in spite of having a great amount of available data on the Web, the diversity of formats and technologies that they are released, limit the consumption, the integration and the geographic information sharing. Recently, it has been proposed the approach that adds the semantics in the description of geographic information, so the discovery and integration can be enhanced. This work presents a study of semantics architectures and frameworks used in the geographic data integration and sharing. Based in this study, the transversal aspects to the studied architectures were identified. Those aspects were used in the project definition of the Pangea architecture which is composed by the following modules: semantic notation, alignment of semantic description, and semantic integration. In order to evaluate some of the Pangea components, a study of case is conducted in the problems of the environmental domain, considering oil blowout disasters
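To make the semantic-annotation module concrete, here is a minimal sketch of how a geospatial observation could be published as RDF with rdflib and the W3C Basic Geo (WGS84) vocabulary; the Pangea namespace, class name, and instance data are hypothetical, not taken from the thesis.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")  # W3C Basic Geo vocabulary
EX = Namespace("http://example.org/pangea/")                 # hypothetical namespace

g = Graph()
g.bind("geo", GEO)
spill = EX["oil_spill_42"]
g.add((spill, RDF.type, EX.OilSpillObservation))  # hypothetical class
g.add((spill, RDFS.label, Literal("Oil spill observation, coastal zone")))
g.add((spill, GEO.lat, Literal(-23.96)))
g.add((spill, GEO.long, Literal(-46.33)))
print(g.serialize(format="turtle"))
```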
APA, Harvard, Vancouver, ISO, and other styles
31

Marques, Caio Miguel. "Pangea - Arquitetura semântica para a integração de dados e modelos geoespaciais na Web /." São José do Rio Preto : [s.n.], 2010. http://hdl.handle.net/11449/98654.

Full text
Abstract:
Advisor: Ivan Rizzo Guilherme
Committee member: Marilde Terezinha Prado Santos
Committee member: Carlos Roberto Valêncio
Abstract: The integration of geographic information is indispensable in many areas of knowledge and human activity. Today, a large amount of geographic information is published on the Web by a wide range of actors, from government institutions and academia to ordinary citizens. These actors publish geographic data in many formats and with varied technologies. In this context, despite the enormous amount of geographic data and models published on the Web, the diversity of formats and technologies in which they are made available, together with the shortcomings of existing solutions, limits the consumption, integration, and sharing of geographic information. Approaches that add semantics to the description of geographic information have recently been proposed to improve the discovery and integration of this kind of information. This work therefore surveys the semantic architectures and infrastructures used for integrating and sharing geographic data and models. Based on this survey, the aspects common to the studied infrastructures were identified and used to define the design of the architecture described in this work, named Pangea, which comprises the following modules: semantic annotation, semantic description alignment, semantic repositories, and semantic discovery and integration of geographic data and models. Of these modules, the semantic repository and part of the functionality for semantic discovery and integration of data were implemented. To evaluate the implemented components, a case study on coastal oil spills is presented.
Master's
APA, Harvard, Vancouver, ISO, and other styles
32

Abu, Salih Bilal Ahmad Abdal Rahman. "Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing." Thesis, Curtin University, 2018. http://hdl.handle.net/20.500.11937/70285.

Full text
Abstract:
This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence, those of the applied frameworks.
APA, Harvard, Vancouver, ISO, and other styles
33

Endris, Kemele M. [Verfasser]. "Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake / Kemele M. Endris." Bonn : Universitäts- und Landesbibliothek Bonn, 2020. http://d-nb.info/1218301740/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Chui, Ka-lam Elsa, and 徐嘉琳. "A semantic web architecture for personalized profiles." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B2961336X.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Goss, Ryan Gavin. "Enabling e-learning 2.0 in information security education: a semantic web approach." Thesis, Nelson Mandela Metropolitan University, 2009. http://hdl.handle.net/10948/909.

Full text
Abstract:
The motivation for this study argued that current information security education systems are inadequate for educating all users of computer systems worldwide in acting securely during their operations with information systems. There is, therefore, a pervasive need for information security knowledge in all aspects of modern life. E-Learning 2.0 could possibly contribute to solving this problem; however, little or no knowledge currently exists regarding the suitability and practicality of using such systems to impart information security knowledge to learners.
APA, Harvard, Vancouver, ISO, and other styles
36

Perry, Matthew Steven. "A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data." Wright State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=wright1219267560.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Morgan, Jac F. "The design of a small business database using the Semantic Database Model." Thesis, Kansas State University, 1985. http://hdl.handle.net/2097/9867.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Yu, Zhiguo. "Cooperative Semantic Information Processing for Literature-Based Biomedical Knowledge Discovery." UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/33.

Full text
Abstract:
Given that data is increasing exponentially every day, extracting and understanding the information, themes and relationships from large collections of documents is more and more important to researchers in many areas. In this work, we present a cooperative semantic information processing system to help biomedical researchers understand and discover knowledge in large numbers of titles and abstracts from PubMed query results. Our system is based on a prevalent technique, topic modeling, which is an unsupervised machine learning approach for discovering the set of semantic themes in a large set of documents. In addition, we apply a natural language processing technique to transform the “bag-of-words” assumption of topic models into a “bag-of-important-phrases” assumption, and we build an interactive visualization tool using a modified, open-source Topic Browser. Finally, we conduct two experiments to evaluate the approach. The first evaluates whether the “bag-of-important-phrases” approach is better at identifying semantic themes than the standard “bag-of-words” approach. This is an empirical study in which human subjects evaluate the quality of the resulting topics using a standard “word intrusion test” to determine whether subjects can identify a word (or phrase) that does not belong in the topic. The second is a qualitative empirical study that evaluates how well the system helps biomedical researchers explore a set of documents to discover previously hidden semantic themes and connections. The methodology for this study has been successfully used to evaluate other knowledge-discovery tools in biomedicine.
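As a rough illustration of the shift from “bag-of-words” to “bag-of-important-phrases”, the following minimal sketch trains a topic model with gensim over phrase tokens instead of word tokens. The toy documents and pre-merged phrases are hypothetical stand-ins for PubMed abstracts run through a phrase extractor (gensim's models.Phrases could play that role).

```python
from gensim import corpora, models

# Toy documents whose tokens are already phrases, not single words.
docs = [
    ["gene_expression", "breast_cancer", "tumor"],
    ["protein_binding", "breast_cancer", "receptor"],
    ["topic_model", "text_mining", "literature"],
    ["text_mining", "knowledge_discovery", "literature"],
]

dictionary = corpora.Dictionary(docs)                # phrase vocabulary
corpus = [dictionary.doc2bow(doc) for doc in docs]   # bag-of-important-phrases
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)

for topic_id, words in lda.show_topics(num_topics=2, num_words=3, formatted=False):
    print(topic_id, [w for w, _ in words])
```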
APA, Harvard, Vancouver, ISO, and other styles
39

Bergström, Mattias, and Per Fahlander. "Evaluating a Novel, Scalable Natural Language Processing Heuristic for Determining Semantic Relatedness." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260067.

Full text
Abstract:
Distributional semantics is a recent research field that aims to quantify how close one text is to another in contextual meaning. In this study we propose and evaluate a novel distributional semantics model by measuring how well its predictions agree with a set of 12,227 human opinions. We call the method Refined Semantic Relatedness (RSR); it applies an incrementally improvable word-association index and several distributional principles to produce theoretically informed predictions. Using 1,951 preprocessed Wikipedia articles as the basis for its predictions, the model matched the human opinions with a Pearson correlation of 0.3. Previous literature reports that Explicit Semantic Analysis (ESA-Wiki) achieves a corresponding Pearson correlation of 0.72 by utilizing 241,393 preprocessed Wikipedia articles. That is roughly 5.76 times more variance accounted for, although it also reflects considerably more extensive preprocessing in terms of articles. While the predictive value of RSR turned out relatively low as a result of the study's limitations, these limitations could be addressed in further research, and we believe this paper can contribute some novel ideas to the field.
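The evaluation boils down to correlating model scores with averaged human ratings; a minimal sketch with SciPy follows (the rating values are made up for illustration). The squared-correlation ratio in the final comment is how the "5.76 times more variance" figure is obtained.

```python
from scipy.stats import pearsonr

human_ratings = [3.8, 1.2, 2.9, 0.4, 3.1]      # hypothetical human scores per word pair
model_scores = [0.71, 0.15, 0.52, 0.08, 0.60]  # hypothetical RSR predictions, same pairs

r, p_value = pearsonr(human_ratings, model_scores)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")

# Variance accounted for is r squared, so comparing RSR (r = 0.30) with
# ESA-Wiki (r = 0.72): 0.72**2 / 0.30**2 ~= 5.76 times more variance explained.
```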
APA, Harvard, Vancouver, ISO, and other styles
40

Marupudi, Surendra Brahma. "Framework for Semantic Integration and Scalable Processing of City Traffic Events." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1472505847.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Linckels, Serge, and Christoph Meinel. "An e-librarian service : natural language interface for an efficient semantic search within multimedia resources." Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2009/3308/.

Full text
Abstract:
1 Introduction
1.1 Project formulation
1.2 Our contribution
2 Pedagogical Aspect
2.1 Modern teaching
2.2 Our Contribution
2.2.1 Autonomous and exploratory learning
2.2.2 Human machine interaction
2.2.3 Short multimedia clips
3 Ontology Aspect
3.1 Ontology driven expert systems
3.2 Our contribution
3.2.1 Ontology language
3.2.2 Concept Taxonomy
3.2.3 Knowledge base annotation
3.2.4 Description Logics
4 Natural language approach
4.1 Natural language processing in computer science
4.2 Our contribution
4.2.1 Explored strategies
4.2.2 Word equivalence
4.2.3 Semantic interpretation
4.2.4 Various problems
5 Information Retrieval Aspect
5.1 Modern information retrieval
5.2 Our contribution
5.2.1 Semantic query generation
5.2.2 Semantic relatedness
6 Implementation
6.1 Prototypes
6.2 Semantic layer architecture
6.3 Development
7 Experiments
7.1 Description of the experiments
7.2 General characteristics of the three sessions, instructions and procedure
7.3 First Session
7.4 Second Session
7.5 Third Session
7.6 Discussion and conclusion
8 Conclusion and future work
8.1 Conclusion
8.2 Open questions
A Description Logics
B Probabilistic context-free grammars
APA, Harvard, Vancouver, ISO, and other styles
42

Chan, Wing Sze. "Semantic search of multimedia data objects through collaborative intelligence." HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1171.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Courtot, Melanie. "Semantic models in biomedicine : building interoperating ontologies for biomedical data representation and processing in pharmacovigilance." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/46804.

Full text
Abstract:
It is increasingly challenging to analyze the data produced in biomedicine, even more so when relying on manual analysis methods. My hypothesis is that a common representation of knowledge, implemented via standard tools and logically formalized, can make those datasets computationally amenable, help with data integration from multiple sources, and allow complex queries to be answered. The first part of this dissertation demonstrates that ontologies can be used as common knowledge models, and details several use cases where they have been applied to existing information in the domain of biomedical investigations, clinical data and vaccine representation. The second part addresses current issues in developing and implementing ontologies, and proposes solutions to make ontologies and the datasets they are applied to available on the Semantic Web, increasing their visibility and reuse. The last part of my thesis builds upon the first two and applies their results to pharmacovigilance, specifically to the analysis of reports of adverse events following immunization. I encoded existing standard clinical guidelines from the Brighton Collaboration in the Web Ontology Language (OWL) in the Adverse Events Reporting Ontology (AERO), which I developed within the framework of the Open Biological and Biomedical Ontologies Foundry. I show that it is possible to automate the classification of adverse events using the AERO with very high specificity (97%). I also demonstrate that AERO can be used with other types of guidelines. Finally, my pipeline relies on open and widely used data standards (Resource Description Framework (RDF), OWL, SPARQL) for implementation, making the system easily transposable to other domains. This thesis validates the usefulness of ontologies as semantic models in biomedicine, enabling automated, computational processing of large datasets. It also fulfills the goal of raising awareness of semantic technologies in the clinical community of users. Following my results, the Brighton Collaboration is moving towards providing a logical representation of their guidelines.
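To illustrate how guideline-style OWL definitions can drive automated classification, here is a minimal sketch with Owlready2. The ontology IRI, class names, and case data are hypothetical simplifications, not the actual AERO/Brighton definitions, and the bundled HermiT reasoner (which requires Java) is assumed to be runnable.

```python
from owlready2 import Thing, get_ontology, sync_reasoner

onto = get_ontology("http://example.org/aero-demo.owl")  # hypothetical IRI

with onto:
    class AdverseEvent(Thing): pass
    class Finding(Thing): pass
    class Fever(Finding): pass
    class Seizure(Finding): pass
    class hasFinding(AdverseEvent >> Finding): pass

    # Guideline-style defined class: an event with both fever and seizure findings.
    class FebrileSeizureEvent(AdverseEvent):
        equivalent_to = [AdverseEvent & hasFinding.some(Fever) & hasFinding.some(Seizure)]

case = AdverseEvent("case1")
case.hasFinding = [Fever(), Seizure()]

sync_reasoner()  # run HermiT; case1 should be reclassified under FebrileSeizureEvent
print(FebrileSeizureEvent in case.is_a)
```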
APA, Harvard, Vancouver, ISO, and other styles
44

Räling, Romy [Verfasser], Isabell [Akademischer Betreuer] Wartenburger, and Astrid [Akademischer Betreuer] Schröder. "Age of acquisition and semantic typicality effects : evidences for distinct processing origins from behavioural and ERP data in healthy and impaired semantic processing / Romy Räling ; Isabell Wartenburger, Astrid Schröder." Potsdam : Universität Potsdam, 2016. http://d-nb.info/1218400897/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Hellmann, Sebastian. "Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-157932.

Full text
Abstract:
This thesis is a compendium of scientific works and engineering specifications that have been contributed to a large community of stakeholders to be copied, adapted, mixed, built upon and exploited in any way possible to achieve a common goal: Integrating Natural Language Processing (NLP) and Language Resources Using Linked Data.

The explosion of information technology in the last two decades has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with linguistic resources and NLP tools. It is the challenge of our time to store, interlink and exploit this wealth of data accumulated in more than half a century of computational linguistics, of empirical, corpus-based study of language, and of computational lexicography in all its heterogeneity. The vision of the Giant Global Graph (GGG) was conceived by Tim Berners-Lee, aiming at connecting all data on the Web and allowing the discovery of new relations between this openly accessible data. This vision has been pursued by the Linked Open Data (LOD) community, where the cloud of published datasets comprises 295 data repositories and more than 30 billion RDF triples (as of September 2011). RDF is based on globally unique and accessible URIs, and it was specifically designed to establish links between such URIs (or resources). This is captured in the Linked Data paradigm that postulates four rules: (1) referred entities should be designated by URIs, (2) these URIs should be resolvable over HTTP, (3) data should be represented by means of standards such as RDF, and (4) a resource should include links to other resources. Although it is difficult to precisely identify the reasons for the success of the LOD effort, advocates generally argue that open licenses as well as open access are key enablers for the growth of such a network, as they provide a strong incentive for collaboration and contribution by third parties. In his keynote at BNCOD 2011, Chris Bizer argued that with RDF the overall data integration effort can be "split between data publishers, third parties, and the data consumer", a claim that can be substantiated by observing the evolution of many large data sets constituting the LOD cloud. As written in the acknowledgement section, parts of this thesis have received numerous feedback from other scientists, practitioners and industry in many different ways. The main contributions of this thesis are summarized here.

Part I - Introduction and Background. During his keynote at the Language Resource and Evaluation Conference in 2012, Sören Auer stressed the decentralized, collaborative, interlinked and interoperable nature of the Web of Data. The keynote provides strong evidence that Semantic Web technologies such as Linked Data are on their way to becoming mainstream for the representation of language resources. The jointly written companion publication for the keynote was later extended as a book chapter in The People's Web Meets NLP and serves as the basis for "Introduction" and "Background", outlining some stages of the Linked Data publication and refinement chain. Both chapters stress the importance of open licenses and open access as an enabler for collaboration and the ability to interlink data on the Web as a key feature of RDF, and provide a discussion about scalability issues and decentralization. Furthermore, we elaborate on how conceptual interoperability can be achieved by (1) re-using vocabularies, (2) agile ontology development, (3) meetings to refine and adapt ontologies and (4) tool support to enrich ontologies and match schemata.

Part II - Language Resources as Linked Data. "Linked Data in Linguistics" and "NLP & DBpedia, an Upward Knowledge Acquisition Spiral" summarize the results of the Linked Data in Linguistics (LDL) Workshop in 2012 and the NLP & DBpedia Workshop in 2013, and give a preview of the MLOD special issue. In total, five proceedings - three published at CEUR (OKCon 2011, WoLE 2012, NLP & DBpedia 2013), one Springer book (Linked Data in Linguistics, LDL 2012) and one journal special issue (Multilingual Linked Open Data, MLOD, to appear) - have been (co-)edited to create incentives for scientists to convert and publish Linked Data and thus to contribute open and/or linguistic data to the LOD cloud. Based on the disseminated call for papers, 152 authors contributed one or more accepted submissions to our venues and 120 reviewers were involved in peer-reviewing. "DBpedia as a Multilingual Language Resource" and "Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Linked Data Cloud" contain this thesis' contribution to the DBpedia Project in order to further increase the size and inter-linkage of the LOD Cloud with lexical-semantic resources. Our contribution comprises extracted data from Wiktionary (an online, collaborative dictionary similar to Wikipedia) in more than four languages (now six) as well as language-specific versions of DBpedia, including a quality assessment of inter-language links between Wikipedia editions and internationalized content negotiation rules for Linked Data. In particular, this work created the foundation for a DBpedia Internationalisation Committee with members from over 15 different languages with the common goal to push DBpedia as a free and open multilingual language resource.

Part III - The NLP Interchange Format (NIF). "NIF 2.0 Core Specification", "NIF 2.0 Resources and Architecture" and "Evaluation and Related Work" constitute one of the main contributions of this thesis. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. The core specification describes which URI schemes and RDF vocabularies must be used for (parts of) natural language texts and annotations in order to create an RDF/OWL-based interoperability layer with NIF, built upon Unicode Code Points in Normal Form C. Classes and properties of the NIF Core Ontology are described to formally define the relations between text, substrings and their URI schemes. For the evaluation of NIF, we asked questions to 13 developers using NIF in a questionnaire. UIMA, GATE and Stanbol are extensible NLP frameworks, and NIF was not yet able to provide off-the-shelf NLP domain ontologies for all possible domains, but only for the plugins used in this study. After inspecting the software, the developers agreed, however, that NIF is adequate to provide a generic RDF output based on NIF using literal objects for annotations. All developers were able to map the internal data structure to NIF URIs to serialize RDF output (adequacy). The development effort in hours (ranging between 3 and 40 hours) as well as the number of code lines (ranging between 110 and 445) suggest that the implementation of NIF wrappers is easy and fast for an average developer. Furthermore, the evaluation contains a comparison to other formats and an evaluation of the available URI schemes for web annotation. In order to collect input from the wide group of stakeholders, a total of 16 presentations were given, with extensive discussions and feedback, which has led to a constant improvement of NIF from 2010 until 2013. After the release of NIF (version 1.0) in November 2011, a total of 32 vocabulary employments and implementations for different NLP tools and converters were reported (8 by the (co-)authors, including the Wiki-link corpus, 13 by people participating in our survey and 11 more of which we have heard). Several roll-out meetings and tutorials were held (e.g. in Leipzig and Prague in 2013) and are planned (e.g. at LREC 2014).

Part IV - The NLP Interchange Format in Use. "Use Cases and Applications for NIF" and "Publication of Corpora using NIF" describe 8 concrete instances where NIF has been successfully used. One major contribution is the usage of NIF as the recommended RDF mapping in the Internationalization Tag Set (ITS) 2.0 W3C standard and the conversion algorithms from ITS to NIF and back. One outcome of the discussions in the standardization meetings and telephone conferences for ITS 2.0 was the conclusion that there was no alternative RDF format or vocabulary other than NIF with the required features to fulfill the working group charter. Five further uses of NIF are described for the Ontology of Linguistic Annotations (OLiA), the RDFaCE tool, the Tiger Corpus Navigator, the OntosFeeder and visualisations of NIF using the RelFinder tool. These 8 instances provide an implemented proof-of-concept of the features of NIF. We also describe the conversion and hosting of the huge Google Wikilinks corpus with 40 million annotations for 3 million web sites; the resulting RDF dump contains 477 million triples in a 5.6 GB compressed dump file in turtle syntax. Finally, we describe how NIF can be used to publish extracted facts from news feeds in the RDFLiveNews tool as Linked Data.

Part V - Conclusions. This part provides lessons learned for NIF, conclusions and an outlook on future work. Most of the contributions are already summarized above. One particular aspect worth mentioning is the increasing number of NIF-formatted corpora for Named Entity Recognition (NER) that have come into existence after the publication of the main NIF paper, Integrating NLP using Linked Data, at ISWC 2013. These include the corpora converted by Steinmetz, Knuth and Sack for the NLP & DBpedia workshop and an OpenNLP-based CoNLL converter by Brümmer. Furthermore, we are aware of three LREC 2014 submissions that leverage NIF: NIF4OGGD - NLP Interchange Format for Open German Governmental Data, N^3 - A Collection of Datasets for Named Entity Recognition and Disambiguation in the NLP Interchange Format, and Global Intelligent Content: Active Curation of Language Resources using Linked Data, as well as an early implementation of a GATE-based NER/NEL evaluation framework by Dojchinovski and Kliegr. Further funding for the maintenance, interlinking and publication of Linguistic Linked Data as well as support and improvements of NIF is available via the expiring LOD2 EU project, as well as the CSA EU project called LIDER, which started in November 2013. Based on the evidence of successful adoption presented in this thesis, we can expect a decent to high chance of reaching critical mass for Linked Data technology as well as the NIF standard in the field of Natural Language Processing and Language Resources.
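For a flavor of what NIF looks like in practice, here is a minimal sketch that builds a NIF context and one substring annotation with rdflib, using the public NIF Core namespace and its offset-based URI scheme; the document URI is hypothetical.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")

text = "Berlin is the capital of Germany."
base = "http://example.org/doc1#"  # hypothetical document URI

g = Graph()
g.bind("nif", NIF)
ctx = URIRef(base + f"char=0,{len(text)}")
g.add((ctx, RDF.type, NIF.Context))
g.add((ctx, NIF.isString, Literal(text)))

# Annotate the substring "Berlin" (offsets 0-6) as a phrase within the context.
mention = URIRef(base + "char=0,6")
g.add((mention, RDF.type, NIF.Phrase))
g.add((mention, NIF.referenceContext, ctx))
g.add((mention, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.endIndex, Literal(6, datatype=XSD.nonNegativeInteger)))
g.add((mention, NIF.anchorOf, Literal("Berlin")))
print(g.serialize(format="turtle"))
```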
APA, Harvard, Vancouver, ISO, and other styles
46

Fan, Yang, Hidehiko Masuhara, Tomoyuki Aotani, Flemming Nielson, and Hanne Riis Nielson. "AspectKE*: Security aspects with program analysis for distributed systems." Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2010/4136/.

Full text
Abstract:
Enforcing security policies to distributed systems is difficult, in particular, when a system contains untrusted components. We designed AspectKE*, a distributed AOP language based on a tuple space, to tackle this issue. In AspectKE*, aspects can enforce access control policies that depend on future behavior of running processes. One of the key language features is the predicates and functions that extract results of static program analysis, which are useful for defining security aspects that have to know about future behavior of a program. AspectKE* also provides a novel variable binding mechanism for pointcuts, so that pointcuts can uniformly specify join points based on both static and dynamic information about the program. Our implementation strategy performs fundamental static analysis at load-time, so as to retain runtime overheads minimal. We implemented a compiler for AspectKE*, and demonstrate usefulness of AspectKE* through a security aspect for a distributed chat system.
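The following sketch is only a loose Python analogy of the idea, under the assumption that an aspect is advice run at every tuple-space operation: it intercepts writes and denies those that violate a policy. AspectKE*'s actual mechanisms, such as static analysis of future process behavior and pointcut variable binding, are not modeled here.

```python
# Illustrative analogy of aspect-based access control over a tuple space
# (not AspectKE* syntax): advice functions intercept each operation.
from typing import Callable

class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._aspects: list[Callable] = []

    def register_aspect(self, advice: Callable):
        self._aspects.append(advice)

    def out(self, principal: str, tup: tuple):
        for advice in self._aspects:       # join point: every 'out' operation
            advice("out", principal, tup)  # advice may raise to deny access
        self._tuples.append(tup)

def deny_untrusted_writes(op, principal, tup):
    # Security aspect: untrusted processes may not publish 'secret' tuples.
    if op == "out" and principal == "untrusted" and tup[0] == "secret":
        raise PermissionError(f"{principal} may not write {tup}")

space = TupleSpace()
space.register_aspect(deny_untrusted_writes)
space.out("alice", ("public", "hello"))      # allowed
space.out("untrusted", ("secret", "token"))  # raises PermissionError
```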
APA, Harvard, Vancouver, ISO, and other styles
47

Necşulescu, Silvia. "Automatic acquisition of lexical-semantic relations: gathering information in a dense representation." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/374234.

Full text
Abstract:
Lexical-semantic relationships between words are key information for many NLP tasks, which require this knowledge in the form of lexical resources. This thesis addresses the acquisition of lexical-semantic relation instances. State-of-the-art systems rely on word-pair representations based on patterns of contexts where two related words co-occur to detect their relation. This approach is hindered by data sparsity: even when mining very large corpora, not every semantically related word pair co-occurs, and many co-occur too infrequently. In this work, we investigate novel representations to predict whether two words hold a lexical-semantic relation. Our intuition was that these representations should contain information about word co-occurrences combined with information about the meaning of the words involved in the relation. These two sources of information have to be the basis of a generalization strategy able to provide information even for words that do not co-occur.
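As background for the word-pair representations discussed, here is a minimal sketch of the classic pattern-based approach: counting the context patterns in which two words co-occur. The toy corpus and window size are hypothetical; the thesis's contribution is precisely to densify such sparse representations with information about word meaning.

```python
from collections import Counter
from itertools import combinations

# Toy corpus; a real system would mine patterns from a large corpus.
sentences = [
    "a dog is a kind of animal".split(),
    "a cat is a kind of animal".split(),
    "the dog chased the cat".split(),
]

pair_patterns = Counter()
for sent in sentences:
    for i, j in combinations(range(len(sent)), 2):
        if j - i <= 5:                        # hypothetical context window
            gap = " ".join(sent[i + 1:j])     # the pattern between the two words
            pair_patterns[(sent[i], sent[j], gap)] += 1

# Sparse counts like these form the classic word-pair representation.
print(pair_patterns[("dog", "animal", "is a kind of")])  # -> 1
```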
APA, Harvard, Vancouver, ISO, and other styles
48

Linckels, Serge. "An e-librarian service : supporting explorative learning by a description logics based semantic retrieval tool." Phd thesis, Universität Potsdam, 2008. http://opus.kobv.de/ubp/volltexte/2008/1745/.

Full text
Abstract:
Although educational content in electronic form is increasing dramatically, its usage in an educational environment is poor, mainly due to the fact that there is too much unreliable, redundant, and irrelevant information. Finding appropriate answers is a rather difficult task, relying on the user to filter the pertinent information from the noise. Turning knowledge bases like the online tele-TASK archive into useful educational resources requires identifying correct, reliable, and "machine-understandable" information, as well as developing simple but efficient search tools with the ability to reason over this information. Our vision is to create an E-Librarian Service, which is able to retrieve multimedia resources from a knowledge base in a more efficient way than by browsing through an index or by using a simple keyword search. In our E-Librarian Service, the user can enter his question in a very simple and human way: in natural language (NL). Our premise is that more pertinent results would be retrieved if the search engine understood the sense of the user's query. The returned results are then logical consequences of an inference rather than of keyword matching. Our E-Librarian Service does not return the answer to the user's question, but it retrieves the most pertinent document(s), in which the user finds the answer to his/her question. Among all the documents that have some common information with the user query, our E-Librarian Service identifies the most pertinent match(es), keeping in mind that the user expects an exhaustive answer while preferring a concise answer with little or no information overhead. Also, our E-Librarian Service always proposes a solution to the user, even if the system concludes that there is no exhaustive answer. Our E-Librarian Service was implemented prototypically in three different educational tools. The first prototype is CHESt (Computer History Expert System); it has a knowledge base with 300 multimedia clips that cover the main events in computer history. The second prototype is MatES (Mathematics Expert System); it has a knowledge base with 115 clips that cover the topic of fractions in mathematics for secondary school w.r.t. the official school programme. All clips were recorded mainly by pupils. The third and most advanced prototype is the "Lecture Butler's E-Librarian Service"; it has a Web service interface to respect a service-oriented architecture (SOA), and was developed in the context of the Web-University project at the Hasso-Plattner-Institute (HPI). Two major experiments in an educational environment, at the Lycée Technique Esch/Alzette in Luxembourg, were conducted to test the pertinence and reliability of our E-Librarian Service as a complement to traditional courses. The first experiment (in 2005) was made with CHESt in different classes, and covered a single lesson. The second experiment (in 2006) covered a period of 6 weeks of intensive use of MatES in one class. There was no classical mathematics lesson where the teacher gave explanations; the students had to learn in an autonomous and exploratory way. They had to ask questions to the E-Librarian Service just as they would ask a human teacher.
Our E-Librarian Service uses linguistic information and a given context, in the form of an ontology, to translate the user's natural-language input into a logical form, and it can quantify and visualize for the user the quality and pertinence of the answers it delivers. The main results of the classroom experiments are as follows. First, pupils generally accept entering complete questions rather than keywords when this helps them obtain better search results. Second, and most importantly, school results can improve when pupils use our E-Librarian Service: we measured an overall improvement of 5%, with 50% of the pupils improving their grades, 41% of them substantially. One of the main reasons for these positive results is that the pupils were more motivated and consequently invested more effort in learning and acquiring new knowledge.
APA, Harvard, Vancouver, ISO, and other styles
49

Greenwell, Richard. "An approach to the semantic intelligence cloud." Thesis, Edinburgh Napier University, 2018. http://researchrepository.napier.ac.uk/Output/1255157.

Full text
Abstract:
Cloud computing is a disruptive technology that aims to provide a utility approach to computing, where users can obtain their required computing resources without investment in infrastructure, computing platforms or services. Cloud computing resources can be obtained from a number of internal or external sources. The heterogeneity of cloud service provision makes comparison of services difficult, with further complexity being introduced by a number of provision approaches such as reserved purchase, on-demand provisioning and spot markets.

The aim of the research was to develop a semantic framework for cloud computing services which incorporated Cloud Service Agreements, requirements, pricing and Benefits Management. The proposed approach sees the development of an integrated framework where Cloud Service Agreements describe the relationship between cloud service providers and cloud service users. Requirements are developed from agreements and can use the concepts, relationships and assertions provided. Pricing in turn is established from requirements. Benefits Management is pervasive across the semantic framework developed.

The methods used were, first, a comprehensive review of the literature to establish a sound theoretical basis for the research undertaken. A problem-solving ontology was then developed that defined concepts and relationships for the proposed semantic framework. A number of case studies were used to populate the developed ontology with assertions, and reasoning was used to test that the framework was correct.

The results produced were a proposed framework of concepts, relationships and assertions for cloud service descriptions, presented as an ontology in textual and graphical form. Several parts of the ontology were published on public ontology platforms and in journal and conference papers.

The original contribution to knowledge is seen in the results produced. The proposed framework provides the foundations for the development of a unified semantic framework for cloud computing service description and has been used by other researchers developing semantic cloud service descriptions. In the area of Cloud Service Agreements, full coverage of the documents described by major standards organisations has been encoded into the framework. Requirements have been modelled as a unique multilevel semantic representation. Pricing of cloud services has been developed using semantic descriptions that can be mapped to requirements. The existing Benefits Management approach has been reimplemented using semantic description.

In conclusion, a framework has been developed that allows the semantic description of cloud computing services. This approach provides greater expression than simplistic frameworks that use mathematical formulas or models with simple relationships between concepts. The proposed framework is limited to a narrow area of service description and requires expansion to be viable in a commercial setting. Further work sees the development of software toolsets based on the semantic description developed, to realise a viable product for mapping high-level cloud service requirements to low-level cloud resources.
APA, Harvard, Vancouver, ISO, and other styles
50

Zampetakis, Stamatis. "Scalable algorithms for cloud-based Semantic Web data management." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112199/document.

Full text
Abstract:
In order to build smart systems, where machines are able to reason exactly like humans, data with semantics is a major requirement. This need led to the advent of the Semantic Web, proposing standard ways of representing and querying data with semantics. RDF is the prevalent data model used to describe web resources, and SPARQL is the query language that allows expressing queries over RDF data. Being able to store and query data with semantics triggered the development of many RDF data management systems. The rapid evolution of the Semantic Web provoked the shift from centralized data management systems to distributed ones. The first systems to appear relied on P2P and client-server architectures, while recently the focus has moved to cloud computing. Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that may be used for data storage and processing. The main features of cloud computing involve scalability, fault tolerance, and elastic allocation of computing and storage resources following the needs of the users.

This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks. First, we introduce the basic concepts around the Semantic Web and the main components and frameworks interacting in massively parallel cloud-based systems. In addition, we provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure. Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, where the complete platform is running in the cloud and appropriate APIs are provided to the end-users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services. Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that is able to produce massively parallel plans for SPARQL queries. We present a family of optimization algorithms, relying on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest plans possible. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop's Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm, demonstrating also the overall performance of the system.
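As a small, self-contained illustration of the RDF/SPARQL stack these systems build on, the following sketch stores a few triples with rdflib and runs a star-shaped SPARQL query of the kind CliqueSquare's optimizer flattens into n-ary joins; the data and vocabulary are hypothetical.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")  # hypothetical vocabulary

g = Graph()
g.add((EX.alice, RDF.type, EX.Researcher))
g.add((EX.alice, EX.worksOn, EX.cloudComputing))
g.add((EX.alice, EX.name, Literal("Alice")))

# Star query: all triple patterns share the subject ?p, which a system like
# CliqueSquare can evaluate as one n-ary join instead of a chain of binary joins.
q = """
PREFIX ex: <http://example.org/>
SELECT ?name WHERE {
    ?p a ex:Researcher ;
       ex:worksOn ex:cloudComputing ;
       ex:name ?name .
}
"""
for row in g.query(q):
    print(row.name)  # -> "Alice"
```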
APA, Harvard, Vancouver, ISO, and other styles
