To view other types of publications on this topic, follow the link: Relation extractor.

Dissertations on the topic "Relation extractor"

Format your citation in APA, MLA, Chicago, Harvard and other styles

Consult the top 50 dissertations for your research on the topic "Relation extractor".

Next to every work in the list of references there is an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication in .pdf format and read its abstract online, provided these items are available in the metadata.

Browse dissertations across a wide range of disciplines and compile your bibliography correctly.

1

Філоненко, О. В., Олена Петрівна Черних та Олександр Миколайович Шеін. "Фільтрування інтернет спаму за допомогою обробки природної мови" [Filtering internet spam using natural language processing]. Thesis, Національний технічний університет "Харківський політехнічний інститут", 2017. http://repository.kpi.kharkov.ua/handle/KhPI-Press/43684.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
2

Scheible, Silke. "Computational treatment of superlatives." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/4153.

Full text of the source
Abstract:
The use of gradable adjectives and adverbs represents an important means of expressing comparison in English. The grammatical forms of comparatives and superlatives are used to express explicit orderings between objects with respect to the degree to which they possess some gradable property. While comparatives are commonly used to compare two entities (e.g., “The blue whale is larger than an African elephant”), superlatives such as “The blue whale is the largest mammal” are used to express a comparison between a target entity (here, the blue whale) and its comparison set (the set of mammals), with the target ranked higher or lower on a scale of comparison than members of the comparison set. Superlatives thus highlight the uniqueness of the target with respect to its comparison set. Although superlatives are frequently found in natural language, with the exception of recent work by Bos and Nissim (2006) and Jindal and Liu (2006b), they have not yet been investigated within a computational framework. And within the framework of theoretical linguistics, studies of superlatives have mainly focused on semantic properties that may only rarely occur in natural language (Szabolcsi (1986), Heim (1999)). My PhD research aims to pave the way for a comprehensive computational treatment of superlatives. The initial question I am addressing is that of automatically extracting useful information about the target entity, its comparison set and their relationship from superlative constructions. One of the central claims of the thesis is that no unified computational treatment of superlatives is possible because of their great semantic complexity and the variety of syntactic structures in which they occur. I propose a classification of superlative surface forms, and initially focus on so-called “ISA superlatives”, which make explicit the IS-A relation that holds between target and comparison set.
They are suitable for a computational approach because both their target and comparison set are usually explicitly realised in the text. I also aim to show that the findings of this thesis are of potential benefit for NLP applications such as Question Answering, Natural Language Generation, Ontology Learning, and Sentiment Analysis/Opinion Mining. In particular, I investigate the use of the “Superlative Relation Extractor” implemented in this project in the area of Sentiment Analysis/Opinion Mining, and claim that a superlative analysis of the sort presented in this thesis, when applied to product evaluations and recommendations, can provide just the kind of information that Opinion Mining aims to identify.
APA, Harvard, Vancouver, ISO and other styles
3

Hachey, Benjamin. "Towards generic relation extraction." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3978.

Full text of the source
Abstract:
A vast amount of usable electronic data is in the form of unstructured text. The relation extraction task aims to identify useful information in text (e.g., PersonW works for OrganisationX, GeneY encodes ProteinZ) and recode it in a format such as a relational database that can be more effectively used for querying and automated reasoning. However, adapting conventional relation extraction systems to new domains or tasks requires significant effort from annotators and developers. Furthermore, previous adaptation approaches based on bootstrapping start from example instances of the target relations, thus requiring that the correct relation type schema be known in advance. Generic relation extraction (GRE) addresses the adaptation problem by applying generic techniques that achieve comparable accuracy when transferred, without modification of model parameters, across domains and tasks. Previous work on GRE has relied extensively on various lexical and shallow syntactic indicators. I present new state-of-the-art models for GRE that incorporate governor-dependency information. I also introduce a dimensionality reduction step into the GRE relation characterisation sub-task, which serves to capture latent semantic information and leads to significant improvements over an unreduced model. Comparison of dimensionality reduction techniques suggests that latent Dirichlet allocation (LDA) – a probabilistic generative approach – successfully incorporates a larger and more interdependent feature set than a model based on singular value decomposition (SVD) and performs as well as or better than SVD on all experimental settings. Finally, I will introduce multi-document summarisation as an extrinsic test bed for GRE and present results which demonstrate that the relative performance of GRE models is consistent across tasks and that the GRE-based representation leads to significant improvements over a standard baseline from the literature.
Taken together, the experimental results 1) show that GRE can be improved using dependency parsing and dimensionality reduction, 2) demonstrate the utility of GRE for the content selection step of extractive summarisation and 3) validate the GRE claim of modification-free adaptation for the first time with respect to both domain and task. This thesis also introduces data sets derived from publicly available corpora for the purpose of rigorous intrinsic evaluation in the news and biomedical domains.
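The SVD-based relation characterisation contrasted in this abstract can be illustrated with a minimal sketch: each relation mention becomes a feature-count vector, the matrix of mentions is projected onto a low-rank space, and similarity is measured there. The matrix below is an invented toy example, not data from the thesis:

```python
import numpy as np

# Toy relation characterisation sketch: rows are relation mentions, columns
# are lexical/syntactic feature counts. All values are invented.
X = np.array([
    [2, 1, 0, 0, 1],  # mention of an "employment"-like relation
    [1, 2, 0, 1, 0],  # another "employment"-like mention
    [0, 0, 2, 1, 2],  # mention of a "location"-like relation
    [0, 1, 1, 2, 1],  # another "location"-like mention
], dtype=float)

def svd_reduce(X, k):
    """Project each mention onto the top-k left singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

Z = svd_reduce(X, 2)

# Mentions of the same latent relation end up closer in the reduced space.
same = cosine(Z[0], Z[1])
diff = cosine(Z[0], Z[2])
```

In this toy setting the two "employment"-like rows become nearly parallel after reduction, while the cross-relation similarity stays low, which is the latent-semantic effect the thesis exploits.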
APA, Harvard, Vancouver, ISO and other styles
4

NUNES, THIAGO RIBEIRO. "BUILDING RELATION EXTRACTORS THROUGH DISTANT SUPERVISION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=21588@1.

Full text of the source
Abstract:
A well-known drawback in building machine-learning semantic relation detectors for natural language is the availability of a large number of qualified training instances for the target relations. This work presents an automatic approach to building multilingual semantic relation detectors through distant supervision, combining the two largest resources of structured and unstructured content available on the Web: DBpedia and Wikipedia. We map the DBpedia ontology back to Wikipedia to extract more than 100,000 training instances for more than 90 DBpedia relations, for English and Portuguese, without human intervention. First, we mine Wikipedia articles to find candidate instances of relations described in the DBpedia ontology. Second, we preprocess and normalise the data, filtering out irrelevant instances. Finally, we use the normalised data to train SVM detectors. Experiments on the English and Portuguese baselines show that the lexical and syntactic features extracted from Wikipedia texts, combined with the semantic features extracted from DBpedia, can significantly improve the performance of relation detectors. For English, the SVM detector was trained on a corpus of 90 DBpedia relations and 42,471 training instances, achieving an F-measure of 81.08 per cent on a test set of 28,773 instances. The Portuguese detector was trained on 50 DBpedia relations with 200 examples per relation, achieving an F-measure of 81.91 per cent on a test set of 18,333 instances. A Relation Extraction (RE) process has many distinct steps, usually beginning with text preprocessing and finishing with the training and evaluation of relation detectors. Therefore, this work presents not only an RE approach but also the architecture of a framework that supports the implementation of, and experimentation with, an RE process.
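The distant-supervision heuristic this abstract describes (mapping KB triples back onto text by labelling any sentence that mentions both arguments of a triple as a positive example of that relation) can be sketched in a few lines. The triples, relation names and sentences below are invented for illustration, not taken from the thesis corpus:

```python
# Minimal distant-supervision labelling sketch. A sentence containing both
# arguments of a KB triple is heuristically labelled as expressing that
# relation, without any human annotation.
KB = {
    ("Rio de Janeiro", "Brazil"): "locatedIn",
    ("Machado de Assis", "Rio de Janeiro"): "bornIn",
}

sentences = [
    "Rio de Janeiro is a coastal city in Brazil .",
    "Machado de Assis was born in Rio de Janeiro .",
    "Brazil hosted the 2016 Olympic Games .",
]

def distant_label(sentences, kb):
    examples = []
    for sent in sentences:
        for (subj, obj), rel in kb.items():
            if subj in sent and obj in sent:
                examples.append((subj, rel, obj, sent))
    return examples

train = distant_label(sentences, KB)
```

The resulting (subject, relation, object, sentence) tuples would then be turned into feature vectors for a classifier such as the SVM detectors the abstract mentions; the heuristic is noisy, which is why the thesis adds filtering and normalisation steps.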
APA, Harvard, Vancouver, ISO and other styles
5

Minard, Anne-Lyse. "Extraction de relations en domaine de spécialité." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00777749.

Full text of the source
Abstract:
The amount of information available in the biomedical domain keeps growing. For this information to be readily usable by domain experts, it must be extracted and structured. Obtaining structured data requires detecting the relations that hold between entities in texts. Our research focused on the extraction of complex relations representing experimental results, and on the detection and categorisation of binary relations between biomedical entities. We were interested in the experimental results reported in scientific articles. We call an experimental result a quantitative result obtained from an experiment, linked to the information that describes that experiment. These results matter to biology experts, for example for modelling. In the field of renal physiology, a database was created to centralise these experimental results, but it is populated manually, which is time-consuming. We propose a solution for automatically extracting from scientific articles the knowledge relevant to the database, that is, experimental results, which we represent as an n-ary relation. The method proceeds in two steps: automatic extraction from the documents, followed by presentation of the extracted results to the expert for validation or correction through an interface. We also proposed a machine-learning method for extracting and classifying binary relations in a specialised domain. We studied the characteristics and the varied ways in which these relations are expressed, and how to account for these characteristics in a learning-based system.
We investigated the use of the syntactic structure of the sentence and of guided sentence simplification for the relation extraction task. In particular, we developed a machine-learning-based simplification method that uses several classifiers in cascade.
APA, Harvard, Vancouver, ISO and other styles
6

Augenstein, Isabelle. "Web relation extraction with distant supervision." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/13247/.

Full text of the source
Abstract:
Being able to find relevant information about prominent entities quickly is the main reason to use a search engine. However, with large quantities of information on the World Wide Web, real-time search over billions of Web pages can waste resources and the end user’s time. One solution is to store the answers to frequently asked general knowledge queries, such as the albums released by a musical artist, in a more accessible format: a knowledge base. Knowledge bases can be created and maintained automatically by using information extraction methods, particularly methods to extract relations between proper names (named entities). A group of approaches that has become popular in recent years is distantly supervised approaches, as they allow training relation extractors without text-bound annotation, instead using known relations from a knowledge base to heuristically align them with a large textual corpus from an appropriate domain. This thesis focuses on researching distant supervision for the Web domain. A new setting for creating training and testing data for distant supervision from the Web with entity-specific search queries is introduced and the resulting corpus is published. Methods to recognise noisy training examples as well as methods to combine extractions based on statistics derived from the background knowledge base are researched. Using co-reference resolution methods to extract relations from sentences which do not contain a direct mention of the subject of the relation is also investigated. One bottleneck for distant supervision for Web data is identified to be named entity recognition and classification (NERC), since relation extraction methods rely on it for identifying relation arguments. Typically, existing pre-trained tools are used, which fail in diverse genres with non-standard language, such as the Web genre.
The thesis explores what can cause NERC methods to fail in diverse genres and quantifies different reasons for NERC failure. Finally, a novel method for NERC for relation extraction is proposed based on the idea of jointly training the named entity classifier and the relation extractor with imitation learning to reduce the reliance on external NERC tools. This thesis improves the state of the art in distant supervision for knowledge base population, and sheds light on and proposes solutions for issues arising for information extraction for not traditionally studied domains.
APA, Harvard, Vancouver, ISO and other styles
7

Jean-Louis, Ludovic. "Approches supervisées et faiblement supervisées pour l’extraction d’événements et le peuplement de bases de connaissances." Thesis, Paris 11, 2011. http://www.theses.fr/2011PA112288/document.

Full text of the source
Abstract:
The major part of the information available on the web is provided in textual, i.e. unstructured, form. In a context such as technology watch, it is useful to present the information extracted from a text in a structured form, reporting only the pieces of information that are relevant to the considered field of interest. Such processing cannot be performed manually at large scale, given the large amount of data available. The automated processing of this task falls within the Information Extraction (IE) domain. The purpose of IE is to identify, within documents, pieces of information related to facts (or events) in order to store this information in predefined data structures. These structures, called templates, aggregate fact properties - often represented by named entities - concerning an event or an area of interest. In this context, the research performed in this thesis addresses two problems: identifying information related to a specific event, when the information is scattered across a text and several events of the same type are mentioned in the text; and reducing the dependence on annotated corpora for the implementation of an Information Extraction system. Concerning the first problem, we propose an original approach that relies on two steps. The first step operates an event-based text segmentation, which identifies within a document the text segments on which the IE process shall focus to look for the entities associated with a given event. The second step focuses on template filling and aims at selecting, within the segments identified as relevant by the event-based segmentation, the entities that should be used as fillers, using a graph-based method. This method is based on a local extraction of relations between entities, which are merged into a relation graph.
A disambiguation step is then performed on the graph to identify the best candidates to fill the information template. The second problem is treated in the context of knowledge base (KB) population, using a large collection of texts (several million documents) from which the information is extracted. This extraction also concerns a large number of relation types (more than 40), which makes manual annotation of the collection too expensive. We propose, in this context, a distant supervision approach in order to use learning techniques for this extraction without the need for a fully annotated corpus. This distant supervision approach uses a set of relations from an existing KB to perform an unsupervised annotation of a collection, from which we learn a model for relation extraction. This approach has been evaluated at large scale on the data from the TAC-KBP 2010 evaluation campaign.
APA, Harvard, Vancouver, ISO and other styles
8

Afzal, Naveed. "Unsupervised relation extraction for e-learning applications." Thesis, University of Wolverhampton, 2011. http://hdl.handle.net/2436/299064.

Full text of the source
Abstract:
In this modern era many educational institutes and business organisations are adopting the e-Learning approach as it provides an effective method for educating and testing their students and staff. The continuous development in the area of information technology and increasing use of the internet has resulted in a huge global market and rapid growth for e-Learning. Multiple Choice Tests (MCTs) are a popular form of assessment and are quite frequently used by many e-Learning applications as they are well adapted to assessing factual, conceptual and procedural information. In this thesis, we present an alternative to the lengthy and time-consuming activity of developing MCTs by proposing a Natural Language Processing (NLP) based approach that relies on semantic relations extracted using Information Extraction to automatically generate MCTs. Information Extraction (IE) is an NLP field used to recognise the most important entities present in a text, and the relations between those concepts, regardless of their surface realisations. In IE, text is processed at a semantic level that allows the partial representation of the meaning of a sentence to be produced. IE has two major subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). In this work, we present two unsupervised RE approaches (surface-based and dependency-based). The aim of both approaches is to identify the most important semantic relations in a document without assigning explicit labels to them in order to ensure broad coverage, unrestricted to predefined types of relations. In the surface-based approach, we examined different surface pattern types, each implementing different assumptions about the linguistic expression of semantic relations between named entities while in the dependency-based approach we explored how dependency relations based on dependency trees can be helpful in extracting relations between named entities. 
Our findings indicate that the presented approaches are capable of achieving high precision rates. Our experiments make use of traditional, manually compiled corpora along with similar corpora automatically collected from the Web. We found that an automatically collected web corpus is still unable to ensure the same level of topic relevance as attained in manually compiled traditional corpora. Comparison between the surface-based and the dependency-based approaches revealed that the dependency-based approach performs better. Our research enabled us to automatically generate questions about the important concepts present in a domain by relying on unsupervised relation extraction approaches, as extracted semantic relations allow us to identify key information in a sentence. The extracted patterns (semantic relations) are then automatically transformed into questions. In the surface-based approach, questions are automatically generated from sentences matched by the extracted surface-based semantic patterns, which rely on a certain set of rules. Conversely, in the dependency-based approach, questions are automatically generated by traversing the dependency tree of an extracted sentence matched by the dependency-based semantic patterns. The MCQ systems produced from these surface-based and dependency-based semantic patterns were extrinsically evaluated by two domain experts in terms of question and distractor readability, usefulness of semantic relations, relevance, acceptability of questions and distractors, and overall MCQ usability. The evaluation results revealed that the MCQ system based on dependency-based semantic relations performed better than the surface-based one. A major outcome of this work is an integrated system for MCQ generation that has been evaluated by potential end users.
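The surface-based pattern idea described in this abstract (taking the token sequence between two tagged named entities as an unlabelled semantic relation pattern) can be sketched as follows. The sentences and the `<e>` entity markup are invented for illustration; the thesis's actual pattern types are richer:

```python
import re

# Surface-based unsupervised relation pattern sketch: extract the word
# sequence between two tagged named entities as an unlabelled pattern.
tagged = [
    "<e>Marie Curie</e> was awarded the <e>Nobel Prize</e> in 1903 .",
    "<e>Einstein</e> was awarded the <e>Nobel Prize</e> in 1921 .",
]

def between_pattern(sentence):
    """Return the token span between the first two <e>...</e> mentions."""
    m = re.search(r"<e>.*?</e>(.*?)<e>.*?</e>", sentence)
    return m.group(1).strip() if m else None

patterns = [between_pattern(s) for s in tagged]
```

Both sentences yield the same pattern, so in a frequency-based setting this pattern would be ranked as an important relation in the document without ever being given an explicit label; the dependency-based variant would instead follow the dependency path between the two entities.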
APA, Harvard, Vancouver, ISO and other styles
9

Loper, Edward (Edward Daniel) 1977. "Applying semantic relation extraction to information retrieval." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86521.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
10

Imani, Mahsa. "Evaluating open relation extraction over conversational texts." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/45978.

Full text of the source
Abstract:
In this thesis, the performance of Open IE systems on conversational data is studied for the first time. Because of the lack of test datasets in this domain, a method for creating a test dataset covering a wide range of conversational data is proposed. Conversational text is more complex and challenging for relation extraction because of its cryptic content and ungrammatical, colloquial language. Consequently, text simplification is used as a remedy to empower Open IE tools for relation extraction. Experimental results show that text simplification helps OLLIE, a state-of-the-art relation extraction system, find new relations, extract more accurate relations, and assign higher confidence scores to correct relations and lower confidence scores to incorrect ones for most datasets. The results also show that some conversational modalities, such as emails and blogs, are easier for the relation extraction task, while product reviews are the most difficult modality.
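A toy illustration of text simplification as a preprocessing step for relation extraction, in the spirit of this abstract: conjoined clauses are split into shorter sentences before being handed to an Open IE extractor. The splitting rule and the example sentence are simplistic assumptions for illustration, not the thesis's actual simplification method:

```python
import re

# Toy clause-splitting simplification: break a sentence at a clause-level
# "and"/"but" that is followed by a pronoun, producing shorter sentences
# that are easier for a relation extractor to handle.
def simplify(sentence):
    parts = re.split(r",?\s+(?:and|but)\s+(?=(?:he|she|it|they|I|we)\b)",
                     sentence)
    return [p.strip().rstrip(".") + "." for p in parts]

clauses = simplify("Anna wrote the report, and she sent it to the team.")
```

Each resulting clause contains a single predicate, so a triple extractor can pull one clean (subject, relation, object) tuple from each, instead of mis-parsing the conjoined original.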
APA, Harvard, Vancouver, ISO and other styles
11

Dhyani, Dushyanta. "Boosting Supervised Neural Relation Extraction with Distant Supervision." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524095334803486.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
12

Bai, Fan. "Structured Minimally Supervised Learning for Neural Relation Extraction." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu159666392917093.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
13

Zhang, Shaomin. "Thematic knowledge extraction." Thesis, Nottingham Trent University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.272437.

Full text of the source
APA, Harvard, Vancouver, ISO and other styles
14

Granada, Roger Leitzke. "Evaluation of methods for taxonomic relation extraction from text." Pontifícia Universidade Católica do Rio Grande do Sul, 2015. http://tede2.pucrs.br/tede2/handle/tede/7108.

Full text of the source
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Modern information systems are changing the idea of “data processing” to the idea of “concept processing”, meaning that instead of processing words, such systems process semantic concepts which carry meaning and share contexts with other concepts. Ontology is commonly used as a structure that captures the knowledge about a certain area by providing concepts and relations between them. Traditionally, concept hierarchies have been built manually by knowledge engineers or domain experts. However, the manual construction of a concept hierarchy suffers from several limitations such as its coverage and the enormous costs of extension and maintenance. Furthermore, keeping a hand-crafted concept hierarchy up to date with the evolution of domain knowledge is an overwhelming task, making it necessary to build concept hierarchies automatically. The (semi-)automatic support in ontology development is usually referred to as ontology learning. Ontology learning from texts is usually divided into steps, going from concept identification, through hierarchical and non-hierarchical relation detection, to, more rarely, axiom extraction. It is reasonable to say that among these steps the current frontier is in the establishment of concept hierarchies, since this is the backbone of ontologies and, therefore, a good concept hierarchy is already a valuable resource for many ontology applications. A concept hierarchy is represented in a tree-structured form with specialization/generalization relations between concepts, in which lower-level concepts are more specific while higher-level concepts are more general. The automatic construction of concept hierarchies from texts is a complex task, and since the 1980s a large number of works have proposed approaches to better extract relations between concepts. These different proposals have never been contrasted against each other on the same set of data and across different languages.
Such a comparison is important to see whether they are complementary or incremental, and whether they present different tendencies towards recall and precision, i.e., some can be very precise but with very low recall while others can achieve better recall but low precision. Another aspect concerns the variation of results for different languages. This thesis evaluates these different methods on the basis of hierarchy metrics such as density and depth, and evaluation metrics such as recall and precision. The evaluation is performed over the same corpora, which consist of English and Portuguese parallel and comparable texts. Both automatic and manual evaluations are presented. The output of seven methods is evaluated automatically and the output of four methods is evaluated manually. The results shed light on the comprehensive set of methods that constitute the state of the art according to the literature in the area.
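The hierarchy metrics (depth, density) and evaluation metrics (precision, recall) named in this abstract can be illustrated with a minimal sketch over a toy concept hierarchy; all names and numbers below are illustrative assumptions, not data from the thesis:

```python
# Toy concept hierarchy: child -> parent (is-a) relations.
hierarchy = {
    "dog": "mammal", "cat": "mammal",
    "mammal": "animal", "bird": "animal",
}

def depth(concept, parents):
    """Number of edges from the concept up to the root."""
    d = 0
    while concept in parents:
        concept = parents[concept]
        d += 1
    return d

max_depth = max(depth(c, hierarchy) for c in hierarchy)  # deepest leaf: 2
concepts = set(hierarchy) | set(hierarchy.values())
density = len(hierarchy) / len(concepts)  # edges per concept: 4/5

# Precision/recall of an extracted relation set against a gold standard.
gold = {("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")}
extracted = {("dog", "mammal"), ("bird", "animal"), ("dog", "animal")}
tp = len(gold & extracted)
precision = tp / len(extracted)
recall = tp / len(gold)
```

Depth and density characterize the shape of the learned hierarchy, while precision and recall compare the extracted relations to a reference.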
Стилі APA, Harvard, Vancouver, ISO та ін.
15

Chauhan, Geeticka. "REflex: Flexible framework for Relation Extraction in multiple domains." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122694.

Повний текст джерела
Анотація:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 81-89).
Relation Extraction (RE) refers to the problem of extracting semantic relationships between concepts in a given sentence, and is an important component of Natural Language Understanding (NLU). It has been widely studied in both the general-purpose and the medical domains, and researchers have explored the effectiveness of different neural network architectures. However, systematic comparison of methods for RE is difficult because many experiments in the field are not described precisely enough to be completely reproducible, and many papers fail to report ablation studies that would highlight the relative contributions of their various combined techniques. As a result, there is a lack of consensus on techniques that will generalize to novel tasks, datasets and contexts. This thesis introduces a unifying framework for RE known as REflex, applied on 3 highly used datasets (from the general, biomedical and clinical domains) and extensible to new datasets. REflex allows exploration of the effect of different modeling techniques, pre-processing, training methodologies and evaluation metrics on a dataset of choice. This work performs such a systematic exploration on the 3 datasets and reveals interesting insights from pre-processing and training methodologies that often go unreported in the literature. Other insights from this exploration help in providing recommendations for future research in RE. REflex has experimental as well as design goals. The experimental goals are to identify sources of variability in results for the 3 datasets and to provide the field with a strong baseline model to compare against for future improvements. The design goals are to identify best practices for relation extraction and to serve as a guide for approaching new datasets.
by Geeticka Chauhan.
S.M.
S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Стилі APA, Harvard, Vancouver, ISO та ін.
16

Dahlbom, Norgren Nils. "Relation Classification Between the Extracted Entities of Swedish Verdicts." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-206829.

Повний текст джерела
Анотація:
This master thesis investigated how well a multiclass support vector machine approach classifies a fixed number of interpersonal relations between extracted person entities from Swedish verdicts. With the help of manually tagged extracted pairs of person entities, called relations, a multiclass support vector machine was trained and its classification performance tested. Different features and parameters were tested to optimize the method, and in the final experiment a micro precision and recall of 91.75% were found. For macro precision and recall, the results were 73.29% and 69.29% respectively. This resulted in a macro F-score of 71.23% and a micro F-score of 91.75%. The results showed that the method worked for a few of the relation classes, but more balanced data would have been needed to answer the research question to its full extent.
Detta examensarbete utforskade hur bra en multiklass stödvektormaskin är på att klassificera sociala relationer mellan extraherade personentiteter ur svenska domar. Med hjälp av manuellt taggade par av personentiteter, kallade relationer, har en multiklass stödvektormaskin tränats och testats på att klassificera dessa relationer. Olika attribut och parametrar har testats för att optimera metoden, och för det slutgiltiga experimentet har ett resultat på 91.75% för både mikro precision och återkallning beräknats. För makro precision och återkallning har ett resultat på 73.29% respektive 69.29% beräknats. Detta resulterade i ett makro F-värde på 71.23% och ett mikro F-värde på 91.75%. Resultaten visade att metoden fungerade för några av relationsklasserna men mer balanserat data skulle ha behövts för att forskningsfrågan skulle kunna besvaras helt.
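The gap between the micro scores (91.75%) and macro scores (~71%) reported above is a consequence of class imbalance; a minimal pure-Python sketch of the two averaging schemes (the toy relation labels are illustrative, not the thesis data):

```python
from collections import Counter

def micro_macro_f1(y_true, y_pred):
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    # Micro: pool the counts over all classes before computing P/R/F.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = TP / (TP + FP)
    micro_r = TP / (TP + FN)
    micro_f = 2 * micro_p * micro_r / (micro_p + micro_r)
    # Macro: compute per-class F1, then average with equal class weight.
    per_class = []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class.append(2 * p * r / (p + r) if p + r else 0.0)
    macro_f = sum(per_class) / len(labels)
    return micro_f, macro_f

# A frequent class dominates the micro score; macro exposes the rare class.
y_true = ["sibling"] * 8 + ["parent"] * 2
y_pred = ["sibling"] * 8 + ["sibling", "parent"]
micro, macro = micro_macro_f1(y_true, y_pred)
```

With these toy labels the micro F1 is 0.9 while the macro F1 drops to about 0.80, mirroring the kind of micro/macro spread the abstract reports.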
Стилі APA, Harvard, Vancouver, ISO та ін.
17

Wang, Wei. "Unsupervised Information Extraction From Text - Extraction and Clustering of Relations between Entities." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00998390.

Повний текст джерела
Анотація:
Unsupervised information extraction in open domain has gained more and more importance recently by loosening the constraints on the strict definition of the extracted information and allowing the design of more open information extraction systems. In this new domain of unsupervised information extraction, this thesis focuses on the tasks of extraction and clustering of relations between entities at a large scale. The objective of relation extraction is to discover unknown relations from texts. A relation prototype is first defined, with which candidate relation instances are initially extracted with a minimal criterion. To guarantee the validity of the extracted relation instances, a two-step filtering procedure is applied: the first step uses filtering heuristics to efficiently remove a large amount of false relations, and the second step uses statistical models to refine the relation candidate selection. The objective of relation clustering is to organize extracted relation instances into clusters so that their relation types can be characterized by the formed clusters and a synthetic view can be offered to end-users. A multi-level clustering procedure is designed, which allows the massive data and diverse linguistic phenomena to be taken into account at the same time. First, the basic clustering groups relation instances with similar linguistic expressions, using only simple similarity measures on a bag-of-words representation of relation instances, to form highly homogeneous basic clusters. Second, the semantic clustering aims at grouping basic clusters whose relation instances share the same semantic meaning, dealing more particularly with phenomena such as synonymy or more complex paraphrases. Different similarity measures, based either on resources such as WordNet or on a distributional thesaurus, are analyzed at the level of words, relation instances and basic clusters.
Moreover, a topic-based relation clustering is proposed to take thematic information into account so that more precise semantic clusters can be formed. Finally, the thesis also tackles the problem of clustering evaluation in the context of unsupervised information extraction, using both internal and external measures. For the evaluations with external measures, an interactive and efficient way of building a reference of relation clusters is proposed. The application of this method on a newspaper corpus results in a large reference, based on which different clustering methods are evaluated.
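The first, "basic" clustering step described above, which groups relation instances by the similarity of their linguistic expressions under a bag-of-words representation, can be sketched as a greedy single-pass clustering; the threshold and example instances are illustrative assumptions, not the thesis's actual parameters:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def basic_clusters(instances, threshold=0.5):
    """Greedy single-pass clustering: attach each relation instance to the
    first cluster whose first member is similar enough, else open a new one."""
    clusters = []  # list of lists of (text, bag) pairs
    for text in instances:
        bag = Counter(text.lower().split())
        for cluster in clusters:
            if cosine(bag, cluster[0][1]) >= threshold:
                cluster.append((text, bag))
                break
        else:
            clusters.append([(text, bag)])
    return [[t for t, _ in c] for c in clusters]

instances = [
    "acquired the company",
    "acquired the firm company",
    "was born in the city",
]
clusters = basic_clusters(instances)
print(clusters)
# [['acquired the company', 'acquired the firm company'], ['was born in the city']]
```

The subsequent semantic clustering step would then merge basic clusters whose instances mean the same thing despite different wording, which simple surface similarity cannot capture.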
Стилі APA, Harvard, Vancouver, ISO та ін.
18

Savelev, Sergey U. "Extracts of salvia species : relation to potential cognitive therapy." Thesis, University of Newcastle Upon Tyne, 2003. http://hdl.handle.net/10443/608.

Повний текст джерела
Анотація:
BACKGROUND: Dementia is a neurodegenerative disease of the brain associated with cognitive and memory impairments. Despite recognition of several types of dementia, the Alzheimer type is the most studied and understood. The cholinergic theory of Alzheimer's disease led to the development of licensed drugs based on the inhibition of the enzyme acetylcholinesterase. Extracts of Salvia (sage) species have been reported to have cholinergic activities relevant to the treatment of Alzheimer's disease. AIMS: Lack of information on a chemical fingerprint of the extracts responsible for inhibition of the enzymes butyrylcholinesterase and acetylcholinesterase prompted this in vitro investigation of sage species for anti-cholinesterase activity. Cholinergic receptor binding activity, inhibition of ß-secretase, and a pro-inflammatory cytokine suppressive activity of extracts of sage species were also studied as relevant treatment targets. METHODS: The extracts were obtained by methods of supercritical fluid extraction using 1,1,1,2-tetrafluoroethane (Phytosol A) and steam distillation. Dose-dependent inhibition of human cholinesterases by the extracts and constituents was determined using the method of Ellman, while inhibition of ß-secretase was measured via a fluorometric method. The nicotinic acetylcholine receptor binding activity was measured as the amount of [3H]-nicotine displaced from human acetylcholine receptors, whereas the muscarinic activity was assessed using the displacement of [3H]-scopolamine. Determination of interleukin 8 inhibitory activity by the extracts was performed via a quantitative sandwich enzyme immunoassay using a commercially available kit. RESULTS: Inhibition of butyrylcholinesterase by the Phytosol extracts of S. apiana, S. fruticosa and S. officinalis var. purpurea was non-competitive. In contrast, inhibition of acetylcholinesterase by S. officinalis var. purpurea oil was competitive. S. corrugata extract was the most potent inhibitor of acetylcholinesterase with an IC50 value of 0.009±0.004 mg ml-1, while S. officinalis var. purpurea oil was the most active inhibitor of butyrylcholinesterase with an IC50 value of 0.015±0.004 mg ml-1. A time-dependent increase in inhibition of butyrylcholinesterase by steam distilled oils of S. fruticosa and S. officinalis var. purpurea was also evident: IC50 values decreased from 0.15±0.007 and 0.14±0.007 mg ml-1 with 5 minutes to 0.035±0.016 and 0.06±0.018 mg ml-1 with 90 minutes incubation time respectively. Phytosol A extracts were more potent than steam distilled oils with respect to anti-cholinesterase activity. Minor synergy in inhibition of bovine acetylcholinesterase was apparent in 1,8-cineole/α-pinene and 1,8-cineole/caryophyllene oxide combinations, whereas a combination of camphor and 1,8-cineole was antagonistic. Oil of S. apiana displaced [3H]-nicotine from human nicotinic acetylcholine receptors and [3H]-scopolamine from muscarinic acetylcholine receptors in a dose-dependent manner, with IC50 values of 0.02 mg ml-1 and 0.1 mg ml-1 respectively. This oil also showed a modest suppression of interleukin 8 secretion from goblet cells. None of the tested oils and constituents had anti-ß-secretase activity. CONCLUSION: These findings demonstrate that the cholinergic activity of the extracts results from a complex interaction between their constituents. Thus, inhibition of acetylcholinesterase is mainly due to the activity of the main constituents with some degree of synergy, whereas anti-butyrylcholinesterase activity is due to major synergistic interactions, and identification of a chemical fingerprint responsible for the overall activity is therefore challenging. A synergistic combination of extracts or their standardised fractions with multiple activities may be a candidate for clinical trials in Alzheimer's disease.
Стилі APA, Harvard, Vancouver, ISO та ін.
19

Conrath, Juliette. "Unsupervised extraction of semantic relations using discourse information." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30202/document.

Повний текст джерела
Анотація:
La compréhension du langage naturel repose souvent sur des raisonnements de sens commun, pour lesquels la connaissance de relations sémantiques, en particulier entre prédicats verbaux, peut être nécessaire. Cette thèse porte sur la problématique de l'utilisation d'une méthode distributionnelle pour extraire automatiquement les informations sémantiques nécessaires à ces inférences de sens commun. Des associations typiques entre des paires de prédicats et un ensemble de relations sémantiques (causales, temporelles, de similarité, d'opposition, partie/tout) sont extraites de grands corpus, par l'exploitation de la présence de connecteurs du discours signalant typiquement ces relations. Afin d'apprécier ces associations, nous proposons plusieurs mesures de signifiance inspirées de la littérature ainsi qu'une mesure novatrice conçue spécifiquement pour évaluer la force du lien entre les deux prédicats et la relation. La pertinence de ces mesures est évaluée par le calcul de leur corrélation avec des jugements humains, obtenus par l'annotation d'un échantillon de paires de verbes en contexte discursif. L'application de cette méthodologie sur des corpus de langue française et anglaise permet la construction d'une ressource disponible librement, Lecsie (Linked Events Collection for Semantic Information Extraction). Celle-ci est constituée de triplets: des paires de prédicats associés à une relation; à chaque triplet correspondent des scores de signifiance obtenus par nos mesures. Cette ressource permet de dériver des représentations vectorielles de paires de prédicats qui peuvent être utilisées comme traits lexico-sémantiques pour la construction de modèles pour des applications externes. Nous évaluons le potentiel de ces représentations pour plusieurs applications. Concernant l'analyse du discours, les tâches de la prédiction d'attachement entre unités du discours, ainsi que la prédiction des relations discursives spécifiques les reliant, sont explorées.
En utilisant uniquement les traits provenant de notre ressource, nous obtenons des améliorations significatives pour les deux tâches, par rapport à plusieurs bases de référence, notamment des modèles utilisant d'autres types de représentations lexico-sémantiques. Nous proposons également de définir des ensembles optimaux de connecteurs mieux adaptés à des applications sur de grands corpus, en opérant une réduction de dimension dans l'espace des connecteurs, au lieu d'utiliser des groupes de connecteurs composés manuellement et correspondant à des relations prédéfinies. Une autre application prometteuse explorée dans cette thèse concerne les relations entre cadres sémantiques (semantic frames, e.g. FrameNet): la ressource peut être utilisée pour enrichir cette structure par des relations potentielles entre frames verbaux à partir des associations entre leurs verbes. Ces applications diverses démontrent les contributions prometteuses amenées par notre approche permettant l'extraction non supervisée de relations sémantiques
Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the semantic information necessary for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora, by exploiting the presence of discourse connectives which typically signal these semantic relations. In order to appraise these associations, we provide several significance measures inspired by the literature as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in context. The application of this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation; each triple is assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, the tasks of predicting attachment of discourse units, as well as predicting the specific discourse relation linking them, are investigated. Using only features from our resource, we obtain significant improvements for both tasks in comparison to several baselines, including ones using other representations of the pairs of predicates.
We also propose to define optimal sets of connectives better suited to large-corpus applications by performing a dimension reduction in the space of connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications demonstrate the promising contributions of our approach, namely allowing the unsupervised extraction of typed semantic relations.
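A significance measure in the family the thesis draws on, pointwise mutual information between a predicate pair and a connective-signalled relation, can be sketched over toy counts (the counts are invented, and the thesis's own novel measure is not reproduced here):

```python
import math

# Toy co-occurrence counts: how often a (predicate1, predicate2) pair
# appears with a connective-signalled relation, over N extracted contexts.
N = 10_000
count_pair = {("fall", "break"): 40}                   # pair frequency
count_rel = {"causal": 500}                            # relation frequency
count_pair_rel = {(("fall", "break"), "causal"): 30}   # joint frequency

def pmi(pair, rel):
    """log2 of observed joint probability over the independence baseline."""
    p_joint = count_pair_rel[(pair, rel)] / N
    p_pair = count_pair[pair] / N
    p_rel = count_rel[rel] / N
    return math.log2(p_joint / (p_pair * p_rel))

score = pmi(("fall", "break"), "causal")
# score > 0: the pair co-occurs with the causal relation more than chance.
```

A triple in the resource would then carry this score (and the other measures) so downstream applications can rank candidate relations.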
Стилі APA, Harvard, Vancouver, ISO та ін.
20

Al, Qady Mohammed Abdelrahman. "Concept relation extraction using natural language processing the CRISP technique /." [Ames, Iowa : Iowa State University], 2008.

Знайти повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
21

Xavier, Clarissa Castellã. "Learning non-verbal relations under open information extraction paradigm." Pontifícia Universidade Católica do Rio Grande do Sul, 2014. http://tede2.pucrs.br/tede2/handle/tede/5275.

Повний текст джерела
Анотація:
Made available in DSpace on 2015-04-14T14:50:19Z (GMT). No. of bitstreams: 1 466321.pdf: 1994049 bytes, checksum: fbbeef81814a876679c25f4e015925f5 (MD5) Previous issue date: 2014-03-12
O paradigma Open Information Extraction - Open IE (Extração Aberta de Informações) de extração de relações trabalha com a identificação de relações não definidas previamente, buscando superar as limitações impostas pelos métodos tradicionais de Extração de Informações como a dependência de domínio e a difícil escalabilidade. Visando estender o paradigma Open IE para que sejam extraídas relações não expressas por verbos a partir de textos em inglês, apresentamos CompIE, um componente que aprende relações expressas em compostos nominais (CNs), como (oil, extracted from, olive) - (óleo, extraído da, oliva) - do composto nominal olive oil - óleo de oliva, ou em pares do tipo adjetivo-substantivo (ASs), como (moon, that is, gorgeous) - (lua, que é, linda) - do AS gorgeous moon (linda lua). A entrada do CompIE é um arquivo texto, e sua saída é um conjunto de triplas descrevendo relações binárias. Sua arquitetura é composta por duas tarefas principais: Extrator de CNs e ASs (1) e Interpretador de CNs e ASs (2). A primeira tarefa gera uma lista de CNs e ASs a partir do corpus de entrada. A segunda tarefa realiza a interpretação dos CNs e ASs gerando as triplas que descrevem as relações extraídas do corpus. Para estudar a viabilidade da solução apresentada, realizamos uma avaliação baseada em hipóteses. Um protótipo foi construído com o intuito de validar cada uma das hipóteses. Os resultados obtidos mostram que nossa solução alcança 89% de Precisão e demonstram que o CompIE atinge sua meta de estender o paradigma Open IE extraindo relações expressas dentro dos CNs e ASs.
Open Information Extraction (Open IE) is a relation extraction paradigm in which the target relationships cannot be specified in advance, and it aims to overcome the limitations imposed by traditional IE methods, such as domain-dependence and scalability. In order to extend Open IE to extract relationships that are not expressed by verbs from texts in English, we introduce CompIE, a component that learns relations expressed in noun compounds (NCs), such as (oil, extracted from, olive) from olive oil, or in adjective-noun pairs (ANs), such as (moon, that is, gorgeous) from gorgeous moon. CompIE input is a text file, and the output is a set of triples describing binary relationships. The architecture comprises two main tasks: NCs and ANs Extraction (1) and NCs and ANs Interpretation (2). The first task generates a list of NCs and ANs from the input corpus. The second task performs the interpretation of NCs and ANs and generates the tuples that describe the relations extracted from the corpus. In order to study CompIE's feasibility, we perform an evaluation based on hypotheses. In order to implement the strategies to validate each hypothesis we have built a prototype. The results show that our solution achieves 89% Precision and demonstrate that CompIE reaches its goal of extending the Open IE paradigm by extracting relationships within NCs and ANs.
Стилі APA, Harvard, Vancouver, ISO та ін.
22

Xavier, Clarissa Castellã. "Learning non-verbal relations under open information extraction paradigm." Pontifícia Universidade Católica do Rio Grande do Sul, 2014. http://hdl.handle.net/10923/7073.

Повний текст джерела
Анотація:
Made available in DSpace on 2015-03-17T02:01:01Z (GMT). No. of bitstreams: 1 000466321-Texto+Completo-0.pdf: 1994049 bytes, checksum: fbbeef81814a876679c25f4e015925f5 (MD5) Previous issue date: 2014
Open Information Extraction (Open IE) is a relation extraction paradigm in which the target relationships cannot be specified in advance, and it aims to overcome the limitations imposed by traditional IE methods, such as domain-dependence and scalability. In order to extend Open IE to extract relationships that are not expressed by verbs from texts in English, we introduce CompIE, a component that learns relations expressed in noun compounds (NCs), such as (oil, extracted from, olive) from olive oil, or in adjective-noun pairs (ANs), such as (moon, that is, gorgeous) from gorgeous moon. CompIE input is a text file, and the output is a set of triples describing binary relationships. The architecture comprises two main tasks: NCs and ANs Extraction (1) and NCs and ANs Interpretation (2). The first task generates a list of NCs and ANs from the input corpus. The second task performs the interpretation of NCs and ANs and generates the tuples that describe the relations extracted from the corpus. In order to study CompIE’s feasibility, we perform an evaluation based on hypotheses. In order to implement the strategies to validate each hypothesis we have built a prototype. The results show that our solution achieves 89% Precision and demonstrate that CompIE reaches its goal of extending the Open IE paradigm by extracting relationships within NCs and ANs.
O paradigma Open Information Extraction - Open IE (Extração Aberta de Informações) de extração de relações trabalha com a identificação de relações não definidas previamente, buscando superar as limitações impostas pelos métodos tradicionais de Extração de Informações como a dependência de domínio e a difícil escalabilidade. Visando estender o paradigma Open IE para que sejam extraídas relações não expressas por verbos a partir de textos em inglês, apresentamos CompIE, um componente que aprende relações expressas em compostos nominais (CNs), como (oil, extracted from, olive) - (óleo, extraído da, oliva) - do composto nominal olive oil - óleo de oliva, ou em pares do tipo adjetivo-substantivo (ASs), como (moon, that is, gorgeous) - (lua, que é, linda) - do AS gorgeous moon (linda lua). A entrada do CompIE é um arquivo texto, e sua saída é um conjunto de triplas descrevendo relações binárias. Sua arquitetura é composta por duas tarefas principais: Extrator de CNs e ASs (1) e Interpretador de CNs e ASs (2). A primeira tarefa gera uma lista de CNs e ASs a partir do corpus de entrada. A segunda tarefa realiza a interpretação dos CNs e ASs gerando as triplas que descrevem as relações extraídas do corpus. Para estudar a viabilidade da solução apresentada, realizamos uma avaliação baseada em hipóteses. Um protótipo foi construído com o intuito de validar cada uma das hipóteses. Os resultados obtidos mostram que nossa solução alcança 89% de Precisão e demonstram que o CompIE atinge sua meta de estender o paradigma Open IE extraindo relações expressas dentro dos CNs e ASs.
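The AN-pair interpretation CompIE performs, turning gorgeous moon into (moon, that is, gorgeous), can be sketched as a simple pattern over POS-tagged tokens; the pattern below is an illustrative simplification, not the component's actual implementation, and the tags are assumed to come from an upstream tagger:

```python
# Tokens arrive pre-tagged as (word, POS) pairs, e.g. from any POS tagger.
ADJ, NOUN = "JJ", "NN"

def an_triples(tagged_tokens):
    """Emit (noun, 'that is', adjective) triples for adjective-noun pairs."""
    triples = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 == ADJ and t2 == NOUN:
            triples.append((w2, "that is", w1))
    return triples

tagged = [("the", "DT"), ("gorgeous", "JJ"), ("moon", "NN"),
          ("rose", "VBD"), ("a", "DT"), ("bright", "JJ"), ("star", "NN")]
triples = an_triples(tagged)
print(triples)
# [('moon', 'that is', 'gorgeous'), ('star', 'that is', 'bright')]
```

Interpreting noun compounds such as olive oil into (oil, extracted from, olive) requires choosing the implicit predicate, which is the harder part of the component and is not captured by this surface pattern.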
Стилі APA, Harvard, Vancouver, ISO та ін.
23

ASSIS, PEDRO HENRIQUE RIBEIRO DE. "DISTANT SUPERVISION FOR RELATION EXTRACTION USING ONTOLOGY CLASS HIERARCHY-BASED FEATURES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24296@1.

Повний текст джерела
Анотація:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Extração de relacionamentos é uma etapa chave para o problema de identificação de uma estrutura em um texto em formato de linguagem natural. Em geral, estruturas são compostas por entidades e relacionamentos entre elas. As propostas de solução com maior sucesso aplicam aprendizado de máquina supervisionado a corpus anotados à mão para a criação de classificadores de alta precisão. Embora alcancem boa robustez, corpus criados à mão não são escaláveis por serem uma alternativa de grande custo. Neste trabalho, nós aplicamos um paradigma alternativo para a criação de um número considerável de exemplos de instâncias para classificação. Tal método é chamado de supervisão à distância. Em conjunto com essa alternativa, usamos ontologias da Web semântica para propor e usar novas características para treinar classificadores. Elas são baseadas na estrutura e semântica descrita por ontologias onde recursos da Web semântica são definidos. O uso de tais características tiveram grande impacto na precisão e recall dos nossos classificadores finais. Neste trabalho, aplicamos nossa teoria em um corpus extraído da Wikipedia. Alcançamos uma alta precisão e recall para um número considerável de relacionamentos.
Relation extraction is a key step in the problem of rendering a structure from text in natural language format. In general, structures are composed of entities and relationships among them. The most successful approaches to relation extraction apply supervised machine learning on hand-labeled corpora to create highly accurate classifiers. Although good robustness is achieved, hand-labeled corpora are not scalable due to the expensive cost of their creation. In this work we apply an alternative paradigm for creating a considerable number of examples of instances for classification. Such a method is called distant supervision. Along with this alternative approach we adopt Semantic Web ontologies to propose and use new features for training classifiers. Those features are based on the structure and semantics described by ontologies where Semantic Web resources are defined. The use of such features has a great impact on the precision and recall of our final classifiers. In this work, we apply our theory on a corpus extracted from Wikipedia. We achieve high precision and recall for a considerable number of relations.
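The distant-supervision step described above, labeling any sentence that mentions both entities of a known knowledge-base pair with that pair's relation, can be sketched as follows (the KB facts and sentences are toy examples, and the ontology class hierarchy-based features the thesis adds on top are not shown):

```python
# Knowledge base of known facts: (entity1, entity2) -> relation.
kb = {
    ("Rio de Janeiro", "Brazil"): "locatedIn",
    ("Machado de Assis", "Rio de Janeiro"): "bornIn",
}

sentences = [
    "Rio de Janeiro is a coastal city in Brazil .",
    "Machado de Assis was born in Rio de Janeiro .",
    "Rio de Janeiro hosted the Olympics .",
]

def distant_labels(sentences, kb):
    """Label any sentence mentioning both entities of a KB pair with
    that pair's relation (the distant-supervision assumption)."""
    examples = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples

train = distant_labels(sentences, kb)  # training examples, no hand labeling
```

The resulting examples are noisy (a sentence may mention both entities without expressing the relation), which is why feature quality, such as the ontology class hierarchy features proposed in the thesis, matters for the final classifier.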
Стилі APA, Harvard, Vancouver, ISO та ін.
24

Achouri, Abdelghani. "Extraction de relations d'associations maximales dans les textes : représentation graphique." Thèse, Université du Québec à Trois-Rivières, 2012. http://depot-e.uqtr.ca/6132/1/030374207.pdf.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
25

Califf, Mary Elaine. "Relational learning techniques for natural language information extraction /." Digital version accessible at:, 1998. http://wwwlib.umi.com/cr/utexas/main.

Повний текст джерела
Стилі APA, Harvard, Vancouver, ISO та ін.
26

Lord, Dale. "Relational Database for Visual Data Management." International Foundation for Telemetering, 2005. http://hdl.handle.net/10150/604893.

Повний текст джерела
Анотація:
ITC/USA 2005 Conference Proceedings / The Forty-First Annual International Telemetering Conference and Technical Exhibition / October 24-27, 2005 / Riviera Hotel & Convention Center, Las Vegas, Nevada
Often it is necessary to retrieve segments of video with certain characteristics, or features, from a large archive of footage. This paper discusses how image processing algorithms can be used to automatically create a relational database, which indexes the video archive. This feature extraction can be performed either upon acquisition or in post-processing. The database can then be queried to quickly locate and recover video segments with certain specified key features.
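A minimal sketch of such a feature-indexing relational database, using SQLite from Python; the schema and feature names are illustrative assumptions, not the paper's actual design:

```python
import sqlite3

# In-memory index of video segments and the features extracted from them.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE segment (id INTEGER PRIMARY KEY, tape TEXT,
                      start_s REAL, end_s REAL);
CREATE TABLE feature (segment_id INTEGER REFERENCES segment(id),
                      name TEXT, value REAL);
""")
con.executemany("INSERT INTO segment VALUES (?, ?, ?, ?)",
                [(1, "tape-A", 0.0, 12.5), (2, "tape-A", 12.5, 30.0)])
con.executemany("INSERT INTO feature VALUES (?, ?, ?)",
                [(1, "motion", 0.9), (1, "brightness", 0.4),
                 (2, "motion", 0.1)])

# Query: locate segments with high motion content.
rows = con.execute("""
    SELECT s.tape, s.start_s, s.end_s
    FROM segment s JOIN feature f ON f.segment_id = s.id
    WHERE f.name = 'motion' AND f.value > 0.5
""").fetchall()
print(rows)  # [('tape-A', 0.0, 12.5)]
```

Whether features are inserted at acquisition time or in post-processing, the same query interface locates matching segments.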
Стилі APA, Harvard, Vancouver, ISO та ін.
27

Karoui, Lobna. "Extraction contextuelle d'ontologie par fouille de données." Paris 11, 2008. http://www.theses.fr/2008PA112220.

Повний текст джерела
Abstract:
The aim of this thesis is to automate as much as possible the process of building an ontology from web pages, in particular by studying the impact that data mining can have on such a task. To build the ontology, we exploited the HTML structure of the document under study in order to properly define the context to be used; this context is organised as a hierarchy of contexts. We then defined a hierarchical clustering algorithm dedicated to the extraction of ontological concepts, called 'ECO'; it is based on the K-means algorithm and guided by our contextual structure. This algorithm generates a hierarchy of term classes (concepts). By introducing an incremental mechanism and recursively splitting the classes, the ECO algorithm refines the context of each word class and improves the conceptual quality of the final clusters and, consequently, of the extracted concepts. The semantic interpretation of term classes by experts or ontology designers is a difficult task. To facilitate it, we proposed a methodology for evaluating concepts based on the richness of web documents, semantic interpretation, knowledge elicitation, and the notion of "progressive contextualisation". Our methodology defines three revealing criteria: the "degree of credibility", the "degree of cohesion", and the "degree of eligibility". It has been applied to evaluate term classes (internal relations) but not the relations between classes (not the concept hierarchy). Our objective was also to extract relations of different types from different analyses of the texts and from the relations existing in the concept hierarchy. To this end, our approach combines a verb-centred method with lexical, syntactic, and statistical analyses. We use these relations to evaluate and enrich the concept hierarchy.
APA, Harvard, Vancouver, ISO, and other styles
28

Ryan, Russell J. (Russell John Wyatt). "Groundtruth budgeting : a novel approach to semi-supervised relation extraction in medical language." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66456.

Full text of the source
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 67-69).
We address the problem of weakly-supervised relation extraction in hospital discharge summaries. Sentences with pre-identified concept types (for example: medication, test, problem, symptom) are labeled with the relationship between the concepts. We present a novel technique for weakly-supervised bootstrapping of a classifier for this task: Groundtruth Budgeting. For highly overlapping, self-similar datasets such as the 2010 i2b2/VA challenge corpus, the performance of classifiers on the minority classes is often poor. To address this, we set aside a random portion of the groundtruth at the beginning of bootstrapping, to be gradually added back as the classifier is bootstrapped. The classifier chooses which groundtruth samples to add by measuring the confidence of its predictions on them and selecting the samples for which its predictions are least confident. By adding samples in this fashion, the classifier is able to increase its coverage of the decision space while not adding too many majority-class examples. We evaluate this approach on the 2010 i2b2/VA challenge corpus of 477 patient discharge summaries and show that, with a training corpus of 349 discharge summaries, budgeting 10% of the corpus achieves results equivalent to a bootstrapping classifier starting with the entire corpus. We compare our results to those of other papers published in the proceedings of the 2010 Fourth i2b2/VA Shared-Task and Workshop.
by Russell J. Ryan.
M.Eng.
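The least-confidence selection at the core of Groundtruth Budgeting can be sketched in a few lines. This is an illustrative sketch only, not the thesis's implementation: the names `least_confident` and `budgeted_bootstrap`, and the injected `fit`/`predict_proba` callables, are assumptions made for the example.

```python
import random

def least_confident(probs, k):
    """Return indices of the k samples whose top predicted class
    probability is lowest (the least confident predictions)."""
    confidence = [max(p) for p in probs]
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return ranked[:k]

def budgeted_bootstrap(train, budget, rounds, fit, predict_proba, k):
    """Sketch of groundtruth budgeting: hold out `budget` labeled samples,
    then feed them back gradually, always adding those samples the
    current model is least sure about."""
    random.seed(0)
    pool = train[:]
    random.shuffle(pool)
    held_out = pool[:budget]   # budgeted groundtruth, released gradually
    labeled = pool[budget:]    # initial training set
    for _ in range(rounds):
        if not held_out:
            break
        model = fit(labeled)
        probs = [predict_proba(model, x) for x, _ in held_out]
        picks = least_confident(probs, k)
        for i in sorted(picks, reverse=True):
            labeled.append(held_out.pop(i))
    return labeled
```

In use, `fit` would train the relation classifier on the current labeled set and `predict_proba` would return its class distribution for one sample; any classifier exposing those two operations fits this loop.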
APA, Harvard, Vancouver, ISO, and other styles
29

Marshman, Elizabeth. "The cause relation in biopharmaceutical corpora: English and French patterns for knowledge extraction." Thesis, University of Ottawa (Canada), 2002. http://hdl.handle.net/10393/6385.

Full text of the source
Abstract:
One of the most important aspects of a terminologist's work is extracting conceptual information about terms from texts. Because this task is so time-consuming, researchers are trying to develop tools which will extract conceptual information semi-automatically. Many of these tools are based on the use of linguistic indicators called knowledge patterns. This thesis aims to identify some knowledge patterns in English and French which indicate the conceptual relation of cause and effect. This relation, though not as widely studied as those of generic to specific or part to whole, is critical in many subject fields including medicine and pharmaceuticals. For this reason, our research focuses on biopharmaceutical texts. Our methodology involved building representative corpora in English and French, and then identifying possible knowledge patterns. The precision of the identified patterns was calculated in order to predict their possible effectiveness for semi-automatic knowledge extraction. We discuss some of the issues observed in the process of identifying these patterns, and those which might affect the subsequent implementation of the patterns for semi-automatic knowledge extraction. A brief interlinguistic comparison of the English and French patterns identified is also included. Our research shows that the subject field of biopharmaceuticals contains many potentially productive knowledge patterns for the cause relation. However, there are also many issues which must be taken into account when identifying patterns and developing knowledge extraction tools.
APA, Harvard, Vancouver, ISO, and other styles
30

Xavier, Clarissa [Verfasser]. "Learning Non-Verbal Relations Under Open Information Extraction Paradigm / Clarissa Xavier." Munich : GRIN Verlag, 2015. http://d-nb.info/1097578720/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
31

Bourgeois, Thomas C. "English Relative Clause Extraction: A Syntactic and Semantic Approach." University of Arizona Linguistics Circle, 1989. http://hdl.handle.net/10150/226574.

Full text of the source
Abstract:
Within this paper we analyze the formation of and extraction from a specific type of noun phrase, namely that consisting of the definite article followed by a common noun modified by a relative clause, where the common noun can be the subject or the object of the modifying clause. Representative examples of this construction appear in Figure 1: (1) (i). Sal knows the man Sid likes. (ii). Sal knows the man who bought the carrot. The framework we assume here makes use of a system of functional syntactical and (corresponding) semantical types assigned to each item in the string. These types act upon each other in functor-argument fashion according to a small set of combinatory rules for building syntactic and semantic structure, adopted here without proof but not without comment. To emphasize the direct correspondence of the syntax/semantics relationship, we describe combinatory rules in terms of how they apply on both levels. For maximum clarity, data appear in the form of triplets consisting of the phonological unit (the word), the syntactic category, and the semantic representation. We present an example below: (2) 'bought'; (NP\S)/NP; λoλs.B(o)(s)
APA, Harvard, Vancouver, ISO, and other styles
32

Vempala, Alakananda. "Extracting Temporally-Anchored Spatial Knowledge." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505146/.

Full text of the source
Abstract:
In my dissertation, I elaborate on the work that I have done to extract temporally-anchored spatial knowledge from text, including both intra- and inter-sentential knowledge. I also detail multiple approaches to infer the spatial timeline of a person from biographies and social media. I present and analyze two strategies to annotate information regarding whether a given entity is or is not located at some location, and for how long with respect to an event. Specifically, I leverage semantic roles or syntactic dependencies to generate potential spatial knowledge and then crowdsource annotations to validate the potential knowledge. The resulting annotations indicate how long entities are or are not located somewhere, and temporally anchor this spatial information. I present an in-depth corpus analysis and experiments comparing the spatial knowledge generated by manipulating roles or dependencies. In my work, I also explore research methodologies that go beyond single sentences and extract spatio-temporal information from text. Spatial timelines refer to a chronological order of locations where a target person is or is not located. I present a corpus and experiments to extract spatial timelines from Wikipedia biographies. I present my work on determining locations and the order in which they are actually visited by a person from their travel experiences. Specifically, I extract spatio-temporal graphs that capture the order (edges) of locations (nodes) visited by a person. Further, I detail my experiments that leverage both text and images to extract the spatial timeline of a person from Twitter.
APA, Harvard, Vancouver, ISO, and other styles
33

Morsi, Youcef Ihab. "Analyse linguistique et extraction automatique de relations sémantiques des textes en arabe." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCC019.

Full text of the source
Abstract:
This thesis focuses on the development of a tool for the automatic processing of Modern Standard Arabic, at the morphological and semantic levels, with the final objective of information extraction on technological innovations. As far as the morphological analysis is concerned, our tool includes several successive processing stages that allow occurrences in texts to be labeled and disambiguated: a morphological layer (Gibran 1.0), which relies on Arabic patterns as distinctive features; a contextual layer (Gibran 2.0), which uses contextual rules; and a third layer (Gibran 3.0), which uses a machine learning model. Our methodology is evaluated using the annotated corpus Arabic-PADT UD treebank. The evaluations obtain an F-measure of 0.92 and 0.90 for the morphological analyses. These experiments demonstrate, among other things, the possibility of improving such a resource through linguistic analyses. This approach allowed us to develop a prototype of information extraction on technological innovations for the Arabic language. It is based on morphological analysis and syntactico-semantic patterns. This thesis is part of a PhD-entrepreneur programme.
APA, Harvard, Vancouver, ISO, and other styles
34

Darud, Véronique. "Relations synthèse-structure-propriétés de polyuréthannes linéaires susceptibles d’être utilises comme liant de particules magnétiques." Lyon, INSA, 1988. http://www.theses.fr/1988ISAL0073.

Full text of the source
Abstract:
This study aims to develop a thermoplastic polyurethane binder, of the polyether type, used in the manufacture of magnetic tape. Its role is to allow the coating of the particles onto a polymer substrate via a solvent route and to provide the assembly with cohesion and abrasion resistance. The simultaneous use, in varying proportions, of two kinds of chain extenders, di-β,β'-hydroxyethyl ether hydroquinone (HQEE) and neopentyl glycol (NPG), made it possible to achieve various structure-property compromises likely to help solve the problems posed by the use of commercial polyurethanes, which were characterised beforehand. A suitable heat treatment, or a chemical reaction with a fluorinated compound, makes it possible to modify the structure and thus to adjust certain properties of our synthesised products (Tg value, solubility, etc.). Upstream of this work, the demonstration of the influence of the polycondensation conditions on the properties of these microheterophase materials led us to define a precise synthesis protocol in order to optimise the final properties of the product. The formulations considered were also the basis of in-depth studies of polyurethanes incorporating HQEE and NPG separately; heterophase and single-phase respectively, they correspond to two very different conceptions of these materials.
APA, Harvard, Vancouver, ISO, and other styles
35

Ruiz, Fabo Pablo. "Concept-based and relation-based corpus navigation : applications of natural language processing in digital humanities." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE053/document.

Full text of the source
Abstract:
Social sciences and Humanities research is often based on large textual corpora that it would be unfeasible to read in detail. Natural Language Processing (NLP) can identify important concepts and actors mentioned in a corpus, as well as the relations between them. Such information can provide an overview of the corpus useful for domain experts, and help identify corpus areas relevant for a given research question. To automatically annotate corpora relevant for Digital Humanities (DH), the NLP technologies we applied are, first, Entity Linking, to identify corpus actors and concepts. Second, the relations between actors and concepts were determined based on an NLP pipeline which provides semantic role labeling and syntactic dependencies, among other information. Part I outlines the state of the art, paying attention to how the technologies have been applied in DH. Generic NLP tools were used. As the efficacy of NLP methods depends on the corpus, some technological development was undertaken, described in Part II, in order to better adapt to the corpora in our case studies. Part II also shows an intrinsic evaluation of the technology developed, with satisfactory results. The technologies were applied to three very different corpora, as described in Part III. First, the manuscripts of Jeremy Bentham, an 18th-19th century corpus in political philosophy. Second, the PoliInformatics corpus, with heterogeneous materials about the American financial crisis of 2007-2008. Finally, the Earth Negotiations Bulletin (ENB), which covers international climate summits since 1995, where treaties like the Kyoto Protocol or the Paris Agreements were negotiated. For each corpus, navigation interfaces were developed. These user interfaces (UIs) combine networks, full-text search and structured search based on NLP annotations. As an example, in the ENB corpus interface, which covers climate policy negotiations, searches can be performed based on relational information identified in the corpus: the negotiation actors having discussed a given issue using verbs indicating support or opposition can be searched, as well as all statements where a given actor has expressed support or opposition. Relation information is employed, beyond simple co-occurrence between corpus terms. The UIs were evaluated qualitatively with domain experts, to assess their potential usefulness for research in the experts' domains. First, we paid attention to whether the corpus representations we created correspond to experts' knowledge of the corpus, as an indication of the sanity of the outputs we produced. Second, we tried to determine whether experts could gain new insight on the corpus by using the applications, e.g. if they found evidence unknown to them or new research ideas. Examples of insight gain were attested with the ENB interface; this constitutes a good validation of the work carried out in the thesis. Overall, the applications' strengths and weaknesses were pointed out, outlining possible improvements as future work.
APA, Harvard, Vancouver, ISO, and other styles
36

Ratkovic, Zorana. "Predicative Analysis for Information Extraction : application to the biology domain." Thesis, Paris 3, 2014. http://www.theses.fr/2014PA030110.

Full text of the source
Abstract:
The abundance of biomedical information expressed in natural language has resulted in the need for methods to process this information automatically. In the field of Natural Language Processing (NLP), Information Extraction (IE) focuses on the extraction of relevant information from unstructured data in natural language. Many IE methods today focus on Machine Learning (ML) approaches that rely on deep linguistic processing in order to capture the complex information contained in biomedical texts. In particular, syntactic analysis and parsing have played an important role in IE, by helping capture how the words in a sentence are related. This thesis examines how dependency parsing can be used to facilitate IE. It focuses on a task-based approach to dependency parsing evaluation and parser selection, including a detailed error analysis. In order to achieve a high quality of syntax-based IE, different stages of linguistic processing are addressed, including both pre-processing steps (such as tokenization) and the use of complementary linguistic processing (such as semantics and coreference analysis). This thesis also explores how the different levels of linguistic processing can be represented for use within an ML-based IE algorithm, and how the interface between the two is of great importance. Finally, biomedical data is very heterogeneous, encompassing different subdomains and genres. This thesis explores how subdomain adaptation can be achieved by using already existing subdomain knowledge and resources. The methods and approaches described are explored using two different biomedical corpora, demonstrating how the IE results are used in real-life tasks.
APA, Harvard, Vancouver, ISO, and other styles
37

Gonzàlez, Pellicer Edgar. "Unsupervised learning of relation detection patterns." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/83906.

Full text of the source
Abstract:
Information extraction is the natural language processing area whose goal is to obtain structured data from the relevant information contained in textual fragments. Information extraction requires a significant amount of linguistic knowledge. The specificity of such knowledge is a drawback for the portability of the systems, as a change of language, domain or style demands a costly human effort. Machine learning techniques have been applied for decades so as to overcome this portability bottleneck, progressively reducing the amount of human supervision involved. However, as the availability of large document collections increases, completely unsupervised approaches become necessary in order to mine the knowledge contained in them. The proposal of this thesis is to incorporate clustering techniques into pattern learning for information extraction, in order to further reduce the elements of supervision involved in the process. In particular, the work focuses on the problem of relation detection. The achievement of this ultimate goal has required, first, considering the different strategies in which this combination could be carried out; second, developing or adapting clustering algorithms suitable to our needs; and third, devising pattern learning procedures which incorporate clustering information. By the end of this thesis, we had been able to develop and implement an approach for learning relation detection patterns which, using clustering techniques and minimal human supervision, is competitive with and even outperforms other comparable approaches in the state of the art.
APA, Harvard, Vancouver, ISO, and other styles
38

Chu, Timothy Sui-Tim. "Genealogy Extraction and Tree Generation from Free Form Text." DigitalCommons@CalPoly, 2017. https://digitalcommons.calpoly.edu/theses/1796.

Full text of the source
Abstract:
Genealogical records play a crucial role in helping people to discover their lineage and to understand where they come from. They provide a way for people to celebrate their heritage and to possibly reconnect with family they had never considered. However, genealogical records are hard to come by for ordinary people, since their information is not always well established in known databases. There often is free form text that describes a person's life, but this must be read manually in order to extract the relevant genealogical information. In addition, multiple texts may have to be read in order to create an extensive tree. This thesis proposes a novel three-part system which can automatically interpret free form text to extract relationships and produce a family tree compliant with GEDCOM formatting. The first subsystem builds an extendable database of genealogical records that are systematically extracted from free form text. This corpus provides the tagged data for the second subsystem, which trains a Naïve Bayes classifier to predict relationships from free form text by examining the types of relationships for pairs of entities and their associated feature vectors. The last subsystem accumulates extracted relationships into family trees. When a multiclass Naïve Bayes classifier is used, the proposed system achieves an accuracy of 54%. When binary Naïve Bayes classifiers are used, the proposed system achieves accuracies of 69% for the child-to-parent relationship classifier, 75% for the spousal relationship classifier, and 73% for the sibling relationship classifier.
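A Naïve Bayes relationship classifier of the kind described above can be sketched as follows. This is a minimal multinomial Naïve Bayes over bag-of-words features; the class name, the feature extraction (words around the entity pair) and the relation labels are illustrative assumptions, not the thesis's exact design.

```python
from collections import Counter, defaultdict
import math

class NaiveBayesRelationClassifier:
    """Minimal multinomial Naive Bayes with Laplace smoothing,
    predicting a relation label for an entity pair from a list of
    word features (an assumed, simplified feature set)."""

    def fit(self, samples):
        # samples: list of (feature_list, relation_label)
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in samples:
            self.word_counts[label].update(feats)
            self.vocab.update(feats)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, feats):
        best, best_lp = None, -math.inf
        for label, count in self.class_counts.items():
            lp = math.log(count / self.total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in feats:
                # add-one (Laplace) smoothed likelihood per feature
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

Training one such model per relation type (child-to-parent, spousal, sibling) with positive/negative labels would give the binary variant the abstract compares against the multiclass one.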
APA, Harvard, Vancouver, ISO, and other styles
39

Tomczak, Jakub. "Algorithms for knowledge discovery using relation identification methods." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2563.

Full text of the source
Abstract:
This work presents a coherent survey of problems connected with relational knowledge representation and of methods for achieving it. The proposed approach is demonstrated on three applications: an economic case, a biomedical case and a benchmark dataset. All crucial definitions are formulated and three main methods for the relation identification problem are described. Moreover, different identification methods are presented for specific relational models and observation types.
Double Diploma Programme, Polish supervisor: Prof. Jerzy Świątek, Wrocław University of Technology
APA, Harvard, Vancouver, ISO, and other styles
40

Hakenberg, Jörg. "Mining relations from the biomedical literature." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2010. http://dx.doi.org/10.18452/16073.

Full text of the source
Abstract:
Text mining deals with the automated annotation of texts and the extraction of facts from textual data for subsequent analysis. Such texts range from short articles and abstracts to large documents, for instance web pages and scientific articles, but also include textual descriptions in otherwise structured databases. This thesis focuses on two key problems in biomedical text mining: relationship extraction from biomedical abstracts, in particular protein-protein interactions, and a pre-requisite step, named entity recognition, again focusing on proteins. This thesis presents the goals, challenges, and typical approaches for each of the main building blocks in biomedical text mining. We present our own approaches for named entity recognition of proteins and relationship extraction of protein-protein interactions. For the former, we describe two methods, one set up as a classification task, the other based on dictionary matching. For relationship extraction, we develop a methodology to automatically annotate large amounts of unlabeled data for relations, and make use of such annotations in a pattern matching strategy. This strategy first extracts similarities between sentences that describe relations, storing them as consensus patterns. We develop a sentence alignment approach that introduces multi-layer alignment, making use of multiple annotations per word. For the task of extracting protein-protein interactions, empirical results show that our methodology performs comparably to existing approaches that require a large amount of human intervention, either for annotation of data or creation of models.
APA, Harvard, Vancouver, ISO, and other styles
41

Pareti, Silvia. "Attribution : a computational approach." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14170.

Full text of the source
Abstract:
Our society is overwhelmed with an ever-growing amount of information. Effective management of this information requires novel ways to filter and select the most relevant pieces of information. Some of this information can be associated with the source or sources expressing it. Sources and their relation to what they express affect information and whether we perceive it as relevant, biased or truthful. In news texts in particular, it is common practice to report third-party statements and opinions. Recognizing relations of attribution is therefore a necessary step toward detecting statements and opinions of specific sources and selecting and evaluating information on the basis of its source. The automatic identification of Attribution Relations has applications in numerous research areas. Quotation and opinion extraction, discourse and factuality have all partly addressed the annotation and identification of Attribution Relations. However, disjoint efforts have provided a partial and partly inaccurate picture of attribution. Moreover, these research efforts have generated small or incomplete resources, thus limiting the applicability of machine learning approaches. Existing approaches to extract Attribution Relations have focused on rule-based models, which are limited both in coverage and precision. This thesis presents a computational approach to attribution that recasts attribution extraction as the identification of the attributed text, its source and the lexical cue linking them in a relation. Drawing on preliminary data-driven investigation, I present a comprehensive lexicalised approach to attribution and further refine and test a previously defined annotation scheme. The scheme has been used to create a corpus annotated with Attribution Relations, with the goal of contributing a large and complete resource that can lay the foundations for future attribution studies.
Based on this resource, I developed a system for the automatic extraction of attribution relations that surpasses traditional syntactic pattern-based approaches. The system is a pipeline of classification and sequence labelling models that identify and link each of the components of an attribution relation. The results show concrete opportunities for attribution-based applications.
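The pipeline described in this abstract identifies a cue, a source, and attributed content and links them. The thesis uses trained classification and sequence labelling models; the sketch below replaces them with a toy rule (the first reporting verb splits the sentence), so the cue lexicon and the example sentence are invented for illustration.

```python
# Hypothetical cue lexicon; the real system learns cues from the corpus.
CUES = {"said", "claimed", "announced", "argued", "denied"}

def extract_attribution(sentence):
    """Toy stand-in for the cue/source/content pipeline: locate a reporting
    cue, take the words before it as the source and the words after it as
    the attributed content."""
    tokens = sentence.rstrip(".").split()
    for i, tok in enumerate(tokens):
        if tok.lower() in CUES:
            content = tokens[i + 1:]
            if content and content[0].lower() == "that":
                content = content[1:]  # drop the complementizer
            return {"source": " ".join(tokens[:i]),
                    "cue": tok,
                    "content": " ".join(content)}
    return None  # no attribution relation detected

print(extract_attribution("The minister said that the reform would pass"))
# {'source': 'The minister', 'cue': 'said', 'content': 'the reform would pass'}
```

Each of the three slots would, in the real system, be filled by a dedicated model rather than by surface position.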
APA, Harvard, Vancouver, ISO, and other styles
42

Alatrista-Salas, Hugo. "Extraction de relations spatio-temporelles à partir des données environnementales et de la santé." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00997539.

Full text of the source
Abstract:
With the explosion of new technologies (mobile devices, sensors, etc.), large quantities of data localized in space and time are now available. The associated databases can be described as spatio-temporal databases, since each record carries spatial information (e.g., a city, a neighbourhood, a river) and temporal information (e.g., the date of an event). This mass of often heterogeneous and complex data raises new needs that knowledge extraction methods must be able to meet (e.g., tracking phenomena in time and space). Many phenomena with complex dynamics are thus associated with spatio-temporal data. For example, the dynamics of an infectious disease can be described by the interactions between humans and the associated transmission vector, as well as by certain spatio-temporal mechanisms involved in its evolution. Changing one component of this system can trigger variations in the interactions between components and, ultimately, alter the overall behaviour of the system. To meet these new challenges, new processes and methods must be developed to make the best use of all the available data. This is the objective of spatio-temporal data mining, the set of techniques and methods for obtaining useful knowledge from large volumes of spatio-temporal data. This thesis falls within the general framework of spatio-temporal data mining and sequential pattern mining. More precisely, two generic pattern mining methods are proposed. The first extracts sequential patterns that include spatial features. In the second, we propose a new type of pattern called "spatio-sequential patterns".
This type of pattern makes it possible to study the evolution of a set of events describing a zone and its close surroundings. Both approaches were tested on two datasets associated with spatio-temporal phenomena: river pollution in France and epidemiological monitoring of dengue fever in New Caledonia. In addition, two quality measures and a pattern visualization prototype were also proposed to support experts in selecting patterns of interest.
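The sequential pattern mining underlying this work can be sketched in a few lines: a pattern is an ordered list of itemsets, and its support is the fraction of sequences (here, one per zone) that contain it in order. The toy event database below is invented; real spatio-sequential patterns additionally encode the spatial neighbourhood of each zone, which this sketch omits.

```python
def is_subsequence(pattern, sequence):
    """True if the itemsets of `pattern` occur, in order, within `sequence`."""
    it = iter(sequence)
    # Each pattern itemset must be a subset of some later itemset in the sequence;
    # iterating over the shared iterator enforces the ordering constraint.
    return all(any(p <= s for s in it) for p in pattern)

def support(pattern, db):
    """Fraction of sequences in the database containing the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in db) / len(db)

# Toy database: each sequence is one zone's successive weekly observations.
db = [
    [{"rain"}, {"rain", "pollution"}, {"algae"}],
    [{"rain"}, {"algae"}],
    [{"pollution"}, {"algae"}],
]
print(support([{"rain"}, {"algae"}], db))  # 0.666... (2 of 3 zones)
```

A mining algorithm would enumerate candidate patterns and keep those whose support exceeds a user-defined threshold.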
APA, Harvard, Vancouver, ISO, and other styles
43

Ferraro, Gabriela. "Towards deep content extraction from specialized discourse : the case of verbal relations in patent claims." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/84174.

Full text of the source
Abstract:
This thesis addresses the development of Natural Language Processing techniques for the extraction and generalization of compositional and functional relations from specialized written texts, in particular from patent claims. One of the most demanding tasks tackled in the thesis is, according to the state of the art, the semantic generalization of linguistic denominations of relations between object components and processes described in the texts. These denominations are usually verbal expressions or nominalizations that are too concrete to be used as standard labels in knowledge representation, as, for example, "A leads to B" and "C provokes D", where "leads to" and "provokes" both express, in abstract terms, a cause, such that "A CAUSE B" and "C CAUSE D" would be more appropriate in both cases. A semantic generalization of the relations allows us to achieve a higher degree of abstraction of the relationships between objects and processes described in the claims, and reduces their number to a limited set oriented towards the relations commonly used in the generic field of knowledge representation.
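The generalization step described in this abstract can be pictured as mapping surface cues to abstract relation labels. The tiny lexicon below is invented for illustration; the thesis derives such generalizations from linguistic analysis rather than from a hand-written table.

```python
# Hypothetical lexicon mapping concrete verbal expressions to the
# abstract relation they denote.
GENERALIZATION = {
    "leads to": "CAUSE",
    "provokes": "CAUSE",
    "results in": "CAUSE",
    "is composed of": "PART-OF",
    "comprises": "PART-OF",
}

def generalize(arg1, cue, arg2):
    """Replace a concrete cue with its abstract label when one is known."""
    relation = GENERALIZATION.get(cue.lower())
    return (arg1, relation, arg2) if relation else (arg1, cue, arg2)

print(generalize("A", "leads to", "B"))  # ('A', 'CAUSE', 'B')
print(generalize("C", "provokes", "D"))  # ('C', 'CAUSE', 'D')
```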
APA, Harvard, Vancouver, ISO, and other styles
44

Lefeuvre, Luce. "Analyse des marqueurs de relations conceptuelles en corpus spécialisé : recensement, évaluation et caractérisation en fonction du domaine et du genre textuel." Thesis, Toulouse 2, 2017. http://www.theses.fr/2017TOU20051.

Full text of the source
Abstract:
The usefulness of markers of conceptual relations for building terminological resources has frequently been emphasized. Such markers are used to detect "Term1 – Marker – Term2" triples in corpora, which are then interpreted as "Term1 – Relation – Term2" triples, allowing knowledge to be represented in relational form. The transition from one triple to the other nevertheless raises the question of the stability of this link, independently of any corpus. In this thesis, we study how the behaviour of candidate relation markers varies with the domain and the text genre. To this end, we compiled a list of French markers for the hypernymy, meronymy, and causal relations, and analysed each occurrence of these candidate markers in a corpus covering two domains (volcanology and breast cancer) and two text genres (scientific and popularized). A systematic description of the contexts containing a candidate marker allowed us to measure the precision of each candidate marker, that is, its capacity to indicate the expected relation. The analyses ultimately demonstrate the relevance of integrating these parameters into the linguistic description of candidate relation markers.
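The precision measure described in this abstract is simply the share of a candidate marker's occurrences that actually express the expected relation. A minimal sketch, with an invented set of annotated contexts for an invented marker:

```python
from collections import Counter

def marker_precision(occurrences, expected_relation):
    """Precision of a candidate marker: the share of its annotated
    occurrences that express the expected conceptual relation."""
    counts = Counter(rel for _, rel in occurrences)
    total = sum(counts.values())
    return counts[expected_relation] / total if total else 0.0

# Toy annotated contexts for a hypothetical causal marker:
# (context id, relation actually expressed in that context)
occ = [("ctx1", "cause"), ("ctx2", "cause"), ("ctx3", "other"), ("ctx4", "cause")]
print(marker_precision(occ, "cause"))  # 0.75
```

Computing this per domain and per genre is what reveals the variation the thesis studies.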
APA, Harvard, Vancouver, ISO, and other styles
45

Anderson, Emily. "States of extraction : impacts of taxation on statebuilding in Angola and Mozambique, 1975-2013." Thesis, London School of Economics and Political Science (University of London), 2014. http://etheses.lse.ac.uk/3071/.

Full text of the source
Abstract:
This PhD investigates the impacts of taxation on state capacity and accountability through comparative case studies of Angola and Mozambique between 1975 and 2013. Extremes of violence and economic dependency dominate the postcolonial histories of Angola and Mozambique. These cases provide an ideal setting for comparative analysis of how civil war and single resource dependence influence the links between taxation and statebuilding. The thesis demonstrates, in contrast to bellicist notions, that civil war did not strengthen the tax systems or create stronger states. Rather, transitions from the colonial capitalist regimes to socialism and then towards market capitalism, as well as the availability of autonomous income sources, were the central drivers of change in extractive processes. The research establishes taxation as both a critical explanation for development trajectories and a reflection of state capacity and accountability. Existing research on taxation and statebuilding in contemporary developing countries tends to treat tax as a catalyst for democracy, but I find that it provides political regimes with an equally powerful tool to expand power through neopatrimonial networks and consolidate control over the state. Analysis of the case studies concludes that, driven by extraverted elite accumulation strategies, vast oil resources in Angola and large-scale foreign aid in Mozambique worked similarly to disconnect state finances from society and undermine the potential links between revenue collection and redistribution, thereby reducing the possibility of enhanced state capacity or accountability.
APA, Harvard, Vancouver, ISO, and other styles
46

Akbik, Alan [Verfasser], Volker [Akademischer Betreuer] Markl, Hans [Gutachter] Uszkoreit, and Chris [Gutachter] Biemann. "Exploratory relation extraction in large multilingual data / Alan Akbik ; Gutachter: Hans Uszkoreit, Chris Biemann ; Betreuer: Volker Markl." Berlin : Technische Universität Berlin, 2016. http://d-nb.info/1156177308/34.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
47

Jean-Louis, Ludovic. "Approches supervisées et faiblement supervisées pour l'extraction d'événements et le peuplement de bases de connaissances." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00686811.

Full text of the source
Abstract:
Most of the information freely available on the Web is textual, that is, unstructured. In a context such as information monitoring, it is very useful to be able to present the information found in texts in a structured form, focusing on the information judged relevant to the domain of interest. However, when this information must be processed systematically, manual methods are not feasible given the large volume of data involved. Information extraction aims to automate such tasks by identifying, in texts, the information concerning facts (or events) in order to store it in predefined data structures. These structures, called templates (or forms), aggregate the characteristic information of an event or a domain of interest, represented as named entities (place names, etc.). In this context, this thesis addresses two main problems: identifying the information related to an event when this information is scattered across a text containing several occurrences of events of the same type; and reducing the dependence on annotated corpora when building an information extraction system. For the first problem, we propose an original two-step approach. The first step is an event-based segmentation that identifies, using temporal information, the zones of a document referring to the same type of event; this segmentation thus determines the zones on which the extraction process must focus. The second step selects, within the segments identified as relevant, the entities associated with the events.
To do so, it combines the extraction of relations between entities at a local level with a global fusion process that produces an entity graph. A disambiguation process is finally applied to this graph to identify the entity filling a given role with respect to an event when several candidates are possible. The second problem is addressed in the context of knowledge base population from large document collections (several million documents), considering a large number (around forty) of types of binary relations between named entities. Given the effort that annotating a corpus for a single relation type represents, and the number of relation types considered, the objective here is to avoid such annotation as far as possible while retaining a learning-based approach. This objective is achieved through a so-called distant supervision approach, which starts from example relations in a knowledge base and performs an unsupervised annotation of corpora according to these relations, in order to build a set of annotated relations for training a model. This approach was evaluated at large scale on the data of the TAC-KBP 2010 campaign.
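The distant supervision scheme described above can be sketched in a few lines: any sentence containing both entities of a known knowledge base pair is labelled as a (possibly noisy) positive example for that pair's relation. The tiny knowledge base and sentences below are invented for illustration.

```python
# Toy knowledge base of known relation instances.
KB = {
    ("Paris", "France"): "capital_of",
    ("Berlin", "Germany"): "capital_of",
}

def distant_annotate(sentences):
    """Label sentences mentioning both entities of a KB pair as
    (possibly noisy) training examples for that relation."""
    examples = []
    for sent in sentences:
        for (e1, e2), rel in KB.items():
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples

sents = ["Paris is the capital of France.", "Paris has many museums."]
print(distant_annotate(sents))
# [('Paris is the capital of France.', 'Paris', 'France', 'capital_of')]
```

The resulting silver-standard examples then feed a supervised relation classifier, trading annotation effort for label noise.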
APA, Harvard, Vancouver, ISO, and other styles
48

Berrahou, Soumia Lilia. "Extraction d'arguments de relations n-aires dans les textes guidée par une RTO de domaine." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS019/document.

Full text of the source
Abstract:
Today, a huge amount of data is made available to the research community through several web-based libraries. Enhancing data collected from scientific documents is a major challenge for analysing and reusing domain knowledge efficiently. To be enhanced, data need to be extracted from documents and structured in a common representation using a controlled vocabulary, as in ontologies. Our research deals with the knowledge engineering of experimental data extracted from scientific articles, in order to reuse them in decision support systems. Experimental data can be represented by n-ary relations which link a studied object (e.g., food packaging, a transformation process) with its features (e.g., oxygen permeability of a packaging, biomass grinding) and capitalized in an Ontological and Terminological Resource (OTR). An OTR associates an ontology with a terminological and/or linguistic part in order to establish a clear distinction between the term and the notion it denotes (the concept). Our work focuses on n-ary relation extraction from scientific documents in order to populate a domain OTR with new instances. Our contributions are based on Natural Language Processing (NLP) together with data mining approaches guided by the domain OTR. More precisely, we first focus on the extraction of units of measure, which are known to be difficult to identify because of their typographic variations. We rely on automatic text classification, using supervised learning methods, to reduce the search space of unit variants, and then propose a new similarity measure that identifies them, taking their syntactic properties into account. Secondly, we adapt and combine data mining methods (sequential pattern and rule mining) and syntactic analysis in order to tackle the challenging task of identifying and extracting n-ary relation instances buried in unstructured texts.
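Matching typographic variants of units of measure, as described above, can be illustrated with a simple normalize-then-compare sketch: strip common typographic variation, then score the remaining strings with a normalized edit distance. The normalization rules and the similarity formula here are invented stand-ins for the thesis's measure, which also uses syntactic properties.

```python
def normalize(unit):
    """Collapse common typographic variation in unit strings."""
    return (unit.lower().replace(" ", "").replace(".", "").replace("^", ""))

def edit_distance(a, b):
    """Levenshtein distance, single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def unit_similarity(u1, u2):
    """1.0 for identical normalized forms, decreasing with edit distance."""
    a, b = normalize(u1), normalize(u2)
    return 1 - edit_distance(a, b) / max(len(a), len(b), 1)

print(unit_similarity("mL.min-1", "ml min^-1"))  # 1.0 after normalization
print(unit_similarity("g/L", "g/mL"))           # 0.75
```

Candidates scoring above a threshold would be mapped to the same unit concept in the OTR.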
APA, Harvard, Vancouver, ISO, and other styles
49

Singh, Dory. "Extraction des relations de causalité dans les textes économiques par la méthode de l’exploration contextuelle." Thesis, Paris 4, 2017. http://www.theses.fr/2017PA040155.

Full text of the source
Abstract:
This thesis describes a process for extracting causal information from economic texts which, unlike econometrics, relies essentially on linguistic resources. Econometrics approaches causality through mathematical and statistical models that are nowadays subject to controversy, so our approach intends to complement or support econometric models. The task is to automatically annotate textual segments according to the Contextual Exploration (CE) method. CE is a linguistic and computational strategy aimed at extracting knowledge according to a point of view. This contribution therefore adopts the discursive point of view on causality, in which the categories are structured in a semantic map; these categories allow abductive rules to be elaborated and implemented in the EXCOM2 and SEMANTAS systems.
APA, Harvard, Vancouver, ISO, and other styles
50

Byrne, Kate. "Populating the Semantic Web : combining text and relational databases as RDF graphs." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3781.

Full text of the source
Abstract:
The Semantic Web promises a way of linking distributed information at a granular level by interconnecting compact data items instead of complete HTML pages. New data is gradually being added to the Semantic Web but there is a need to incorporate existing knowledge. This thesis explores ways to convert a coherent body of information from various structured and unstructured formats into the necessary graph form. The transformation work crosses several currently active disciplines, and there are further research questions that can be addressed once the graph has been built. Hybrid databases, such as the cultural heritage one used here, consist of structured relational tables associated with free text documents. Access to the data is hampered by complex schemas, confusing terminology and difficulties in searching the text effectively. This thesis describes how hybrid data can be unified by assembly into a graph. A major component task is the conversion of relational database content to RDF. This is an active research field, to which this work contributes by examining weaknesses in some existing methods and proposing alternatives. The next significant element of the work is an attempt to extract structure automatically from English text using natural language processing methods. The first claim made is that the semantic content of the text documents can be adequately captured as a set of binary relations forming a directed graph. It is shown that the data can then be grounded using existing domain thesauri, by building an upper ontology structure from these. A schema for cultural heritage data is proposed, intended to be generic for that domain and as compact as possible. Another hypothesis is that use of a graph will assist retrieval. The structure is uniform and very simple, and the graph can be queried even if the predicates (or edge labels) are unknown. 
Additional benefits of the graph structure are examined, such as using path length between nodes as a measure of relatedness (unavailable in a relational database where there is no equivalent concept of locality), and building information summaries by grouping the attributes of nodes that share predicates. These claims are tested by comparing queries across the original and the new data structures. The graph must be able to answer correctly queries that the original database dealt with, and should also demonstrate valid answers to queries that could not previously be answered or where the results were incomplete.
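Two of the claims above, that hybrid data can be assembled into a triple graph and that path length between nodes then serves as a relatedness measure, can be sketched together. The toy triple store below is invented; relatedness is computed as breadth-first shortest-path length over the graph, treated as undirected.

```python
from collections import deque

# Toy triple store, as might be produced from relational rows:
# (subject, predicate, object)
TRIPLES = [
    ("castle_1", "locatedIn", "edinburgh"),
    ("edinburgh", "partOf", "scotland"),
    ("chapel_2", "locatedIn", "edinburgh"),
]

def neighbours(node):
    """Treat the graph as undirected for relatedness purposes."""
    for s, _, o in TRIPLES:
        if s == node:
            yield o
        if o == node:
            yield s

def path_length(start, goal):
    """BFS shortest-path length between two nodes; None if unreachable."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == goal:
            return d
        for n in neighbours(node):
            if n not in seen:
                seen.add(n)
                frontier.append((n, d + 1))
    return None

print(path_length("castle_1", "chapel_2"))  # 2 (via edinburgh)
```

Note that the query never mentions a predicate: as the abstract observes, the graph can be traversed even when edge labels are unknown, which has no counterpart in a relational schema.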
APA, Harvard, Vancouver, ISO, and other styles