Dissertations on the topic "Relation extractor"
Format your source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 50 dissertations for research on the topic "Relation extractor".
Next to each work in the reference list you will find an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic citation for the selected work in the style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the publication as a .pdf file and read its abstract online, if these are available in the record's metadata.
Browse dissertations across a wide range of disciplines and compile an accurate bibliography.
Filonenko, O. V., Olena Petrivna Chernykh, and Oleksandr Mykolaiovych Shein. "Filtering Internet Spam Using Natural Language Processing" [in Ukrainian]. Thesis, National Technical University "Kharkiv Polytechnic Institute", 2017. http://repository.kpi.kharkov.ua/handle/KhPI-Press/43684.
Scheible, Silke. "Computational treatment of superlatives." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/4153.
Hachey, Benjamin. "Towards generic relation extraction." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3978.
NUNES, THIAGO RIBEIRO. "BUILDING RELATION EXTRACTORS THROUGH DISTANT SUPERVISION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2012. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=21588@1.
Повний текст джерелаCONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
A well-known drawback in building machine-learning semantic relation detectors for natural language is the availability of a large number of qualified training instances for the target relations. This work presents an automatic approach to building multilingual semantic relation detectors through distant supervision, combining the two largest resources of structured and unstructured content available on the Web: DBpedia and Wikipedia. We map the DBpedia ontology back to Wikipedia to extract more than 100,000 training instances for more than 90 DBpedia relations in English and Portuguese without human intervention. First, we mine Wikipedia articles to find candidate instances for relations described in the DBpedia ontology. Second, we preprocess and normalize the data, filtering out irrelevant instances. Finally, we use the normalized data to construct SVM detectors. The experiments performed on the English and Portuguese baselines show that the lexical and syntactic features extracted from Wikipedia texts, combined with the semantic features extracted from DBpedia, can significantly improve the performance of relation detectors. For English, the SVM detector was trained on a corpus of 90 DBpedia relations and 42,471 training instances, achieving an F-measure of 81.08 per cent on a test set of 28,773 instances. The Portuguese detector was trained on 50 DBpedia relations with 200 examples per relation, reaching an F-measure of 81.91 per cent on a test set of 18,333 instances. A Relation Extraction (RE) process has many distinct steps, usually beginning with text pre-processing and finishing with the training and evaluation of relation detectors. Therefore, this work presents not only an RE approach but also the architecture of a framework that supports the implementation of, and experimentation with, an RE process.
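The core distant-supervision labeling step described in this abstract can be sketched roughly as follows. This is an illustrative simplification, not the thesis's implementation: entity matching here is plain substring search, whereas the actual pipeline filters and normalizes candidates before training.

```python
# Toy distant supervision: a sentence mentioning both arguments of a known
# KB triple becomes a (noisy) positive training example for that relation.
from typing import List, Tuple

def label_sentences(
    kb: List[Tuple[str, str, str]],   # (subject, relation, object) triples
    sentences: List[str],
) -> List[Tuple[str, str]]:
    """Pair each sentence with every relation whose arguments it mentions."""
    examples = []
    for sent in sentences:
        for subj, rel, obj in kb:
            if subj in sent and obj in sent:
                examples.append((sent, rel))
    return examples

kb = [("Paris", "capitalOf", "France")]
sents = ["Paris is the capital of France.", "Berlin is a large city."]
print(label_sentences(kb, sents))
# [('Paris is the capital of France.', 'capitalOf')]
```

The resulting weakly labeled pairs would then be turned into feature vectors for an SVM, as the abstract outlines.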
Minard, Anne-Lyse. "Extraction de relations en domaine de spécialité." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00777749.
Augenstein, Isabelle. "Web relation extraction with distant supervision." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/13247/.
Повний текст джерелаJean-Louis, Ludovic. "Approches supervisées et faiblement supervisées pour l’extraction d’événements et le peuplement de bases de connaissances." Thesis, Paris 11, 2011. http://www.theses.fr/2011PA112288/document.
The major part of the information available on the web is provided in textual, i.e. unstructured, form. In a context such as technology watch, it is useful to present the information extracted from a text in a structured form, reporting only the pieces of information that are relevant to the considered field of interest. Such processing cannot be performed manually at large scale, given the large amount of data available; its automation falls within the Information Extraction (IE) domain. The purpose of IE is to identify, within documents, pieces of information related to facts (or events) in order to store this information in predefined data structures. These structures, called templates, aggregate fact properties - often represented by named entities - concerning an event or an area of interest. In this context, the research performed in this thesis addresses two problems: identifying information related to a specific event when the information is scattered across a text and several events of the same type are mentioned in it; and reducing the dependency on annotated corpora for the implementation of an IE system. Concerning the first problem, we propose an original approach that relies on two steps. The first step performs an event-based text segmentation, which identifies within a document the text segments on which the IE process should focus when looking for the entities associated with a given event. The second step focuses on template filling and aims at selecting, within the segments identified as relevant, the entities that should be used as fillers, using a graph-based method. This method relies on a local extraction of relations between entities, which are then merged into a relation graph.
A disambiguation step is then performed on the graph to identify the best candidates to fill the information template. The second problem is treated in the context of knowledge base (KB) population, using a large collection of texts (several million) from which the information is extracted. This extraction also concerns a large number of relation types (more than 40), which makes manual annotation of the collection too expensive. We propose, in this context, a distant supervision approach that makes it possible to apply learning techniques to this extraction without a fully annotated corpus. This approach uses a set of relations from an existing KB to perform an unsupervised annotation of a collection, from which we learn a model for relation extraction. It has been evaluated at large scale on the data from the TAC-KBP 2010 evaluation campaign.
Afzal, Naveed. "Unsupervised relation extraction for e-learning applications." Thesis, University of Wolverhampton, 2011. http://hdl.handle.net/2436/299064.
Loper, Edward (Edward Daniel) 1977. "Applying semantic relation extraction to information retrieval." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86521.
Imani, Mahsa. "Evaluating open relation extraction over conversational texts." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/45978.
Повний текст джерелаDhyani, Dushyanta Dhyani. "Boosting Supervised Neural Relation Extraction with Distant Supervision." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524095334803486.
Повний текст джерелаBai, Fan. "Structured Minimally Supervised Learning for Neural Relation Extraction." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu159666392917093.
Повний текст джерелаZhang, Shaomin. "Thematic knowledge extraction." Thesis, Nottingham Trent University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.272437.
Повний текст джерелаGranada, Roger Leitzke. "Evaluation of methods for taxonomic relation extraction from text." Pontif?cia Universidade Cat?lica do Rio Grande do Sul, 2015. http://tede2.pucrs.br/tede2/handle/tede/7108.
Повний текст джерелаMade available in DSpace on 2016-12-26T16:34:57Z (GMT). No. of bitstreams: 1 TES_ROGER_LEITZKE_GRANADA_COMPLETO.pdf: 2483840 bytes, checksum: 8f81d3f0496d8fa8d3a1b013dfdf932b (MD5) Previous issue date: 2015-09-28
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Modern information systems are changing the idea of "data processing" to the idea of "concept processing": instead of processing words, such systems process semantic concepts, which carry meaning and share contexts with other concepts. An ontology is commonly used as a structure that captures the knowledge about a certain area by providing concepts and the relations between them. Traditionally, concept hierarchies have been built manually by knowledge engineers or domain experts. However, manual construction suffers from several limitations, such as limited coverage and the enormous costs of extension and maintenance. Furthermore, keeping a hand-crafted concept hierarchy up to date with the evolution of domain knowledge is an overwhelming task, making it necessary to build concept hierarchies automatically. The (semi-)automatic support in ontology development is usually referred to as ontology learning. Ontology learning from texts is usually divided into steps, going from concept identification, through hierarchical and non-hierarchical relation detection and, seldom, axiom extraction. It is reasonable to say that among these steps the current frontier is the establishment of concept hierarchies, since this is the backbone of ontologies and, therefore, a good concept hierarchy is already a valuable resource for many ontology applications. A concept hierarchy is represented as a tree structure with specialization/generalization relations between concepts, in which lower-level concepts are more specific and higher-level concepts more general. The automatic construction of concept hierarchies from texts is a complex task, and since the 1980s a large number of works have proposed approaches to better extract relations between concepts. These proposals have never been contrasted against each other on the same set of data and across different languages. Such a comparison is important to see whether the methods are complementary or incremental, and whether they present different tendencies towards recall and precision, i.e., some can be very precise but with very low recall, while others achieve better recall but low precision. Another aspect concerns the variation of results across languages. This thesis evaluates these methods on the basis of hierarchy metrics such as density and depth, and evaluation metrics such as Recall and Precision. The evaluation is performed over the same corpora, consisting of English and Portuguese parallel and comparable texts. Both automatic and manual evaluations are presented: the output of seven methods is evaluated automatically and the output of four methods manually. The results shed light on the comprehensive set of methods that constitute the state of the art according to the literature in the area.
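The Precision and Recall used to compare such taxonomic-relation extractors against a gold standard can be computed as in the following sketch, where each taxonomic relation is represented as a hypothetical (hyponym, hypernym) pair; this illustrates the metric, not the thesis's evaluation code.

```python
# Precision = correct extracted relations / all extracted relations.
# Recall    = correct extracted relations / all gold relations.
def precision_recall(extracted: set, gold: set) -> tuple:
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {("dog", "animal"), ("cat", "animal"), ("oak", "tree")}
extracted = {("dog", "animal"), ("oak", "plant")}
p, r = precision_recall(extracted, gold)
print(p)  # 0.5  (1 of 2 extracted pairs is correct)
```

A method with high precision and low recall would score well on `p` and poorly on `r`, which is exactly the trade-off the comparison above is designed to expose.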
Chauhan, Geeticka. "REflex: Flexible framework for Relation Extraction in multiple domains." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122694.
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 81-89).
Relation Extraction (RE) refers to the problem of extracting semantic relationships between concepts in a given sentence, and is an important component of Natural Language Understanding (NLU). It has been widely studied in both the general-purpose and medical domains, and researchers have explored the effectiveness of different neural network architectures. However, systematic comparison of methods for RE is difficult because many experiments in the field are not described precisely enough to be completely reproducible, and many papers fail to report ablation studies that would highlight the relative contributions of their various combined techniques. As a result, there is a lack of consensus on techniques that will generalize to novel tasks, datasets and contexts. This thesis introduces a unifying framework for RE known as REflex, applied to 3 widely used datasets (from the general, biomedical and clinical domains) and extensible to new datasets. REflex allows exploration of the effect of different modeling techniques, pre-processing choices, training methodologies and evaluation metrics on a dataset of choice. This work performs such a systematic exploration on the 3 datasets and reveals interesting insights from pre-processing and training methodologies that often go unreported in the literature. Other insights from this exploration provide recommendations for future research in RE. REflex has experimental as well as design goals. The experimental goals are to identify sources of variability in results for the 3 datasets and to provide the field with a strong baseline model to compare against for future improvements. The design goals are to identify best practices for relation extraction and to serve as a guide for approaching new datasets.
by Geeticka Chauhan.
S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Dahlbom, Norgren Nils. "Relation Classification Between the Extracted Entities of Swedish Verdicts." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-206829.
This thesis investigated how well a multiclass support vector machine classifies social relations between person entities extracted from Swedish court verdicts. Using manually tagged pairs of person entities, called relations, a multiclass support vector machine was trained and tested on classifying these relations. Different features and parameters were tested to optimize the method, and in the final experiment a result of 91.75% was obtained for both micro precision and micro recall. For macro precision and recall, results of 73.29% and 69.29% respectively were obtained. This yielded a macro F-score of 71.23% and a micro F-score of 91.75%. The results showed that the method worked for some of the relation classes, but more balanced data would have been needed to answer the research question fully.
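The micro- and macro-averaged scores reported in this abstract can be reproduced with a generic computation like the sketch below (not the thesis's code; the toy labels are invented). In single-label multiclass classification, micro precision equals micro recall, which is why one figure (91.75%) covers both.

```python
# Micro averaging pools all true/false positives and negatives across classes;
# macro averaging computes per-class precision/recall and averages them.
from collections import Counter

def micro_macro_scores(gold, pred):
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    def pr(t, false_pos, false_neg):
        prec = t / (t + false_pos) if t + false_pos else 0.0
        rec = t / (t + false_neg) if t + false_neg else 0.0
        return prec, rec

    def f1(prec, rec):
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    micro_p, micro_r = pr(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    per_label = [pr(tp[l], fp[l], fn[l]) for l in labels]
    macro_p = sum(p for p, _ in per_label) / len(labels)
    macro_r = sum(r for _, r in per_label) / len(labels)
    return {"micro": (micro_p, micro_r, f1(micro_p, micro_r)),
            "macro": (macro_p, macro_r, f1(macro_p, macro_r))}

scores = micro_macro_scores(gold=["a", "a", "b", "b"], pred=["a", "b", "b", "b"])
print(scores["micro"])  # (0.75, 0.75, 0.75)
```

With imbalanced classes, a large micro/macro gap like the one above (91.75% vs. 71.23%) indicates weak performance on the minority classes.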
Wang, Wei. "Unsupervised Information Extraction From Text - Extraction and Clustering of Relations between Entities." Phd thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00998390.
Savelev, Sergey U. "Extracts of salvia species: relation to potential cognitive therapy." Thesis, University of Newcastle Upon Tyne, 2003. http://hdl.handle.net/10443/608.
Повний текст джерелаConrath, Juliette. "Unsupervised extraction of semantic relations using discourse information." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30202/document.
Natural language understanding often relies on common-sense reasoning, for which knowledge about semantic relations, especially between verbal predicates, may be required. This thesis addresses the challenge of using a distributional method to automatically extract the semantic information necessary for common-sense inference. Typical associations between pairs of predicates and a targeted set of semantic relations (causal, temporal, similarity, opposition, part/whole) are extracted from large corpora by exploiting the presence of discourse connectives which typically signal these relations. In order to appraise these associations, we provide several significance measures inspired by the literature, as well as a novel measure specifically designed to evaluate the strength of the link between the two predicates and the relation. The relevance of these measures is evaluated by computing their correlations with human judgments, based on a sample of verb pairs annotated in context. The application of this methodology to French and English corpora leads to the construction of a freely available resource, Lecsie (Linked Events Collection for Semantic Information Extraction), which consists of triples: pairs of event predicates associated with a relation, each assigned significance scores based on our measures. From this resource, vector-based representations of pairs of predicates can be induced and used as lexical semantic features to build models for external applications. We assess the potential of these representations for several applications. Regarding discourse analysis, we investigate the tasks of predicting the attachment of discourse units and predicting the specific discourse relation linking them. Using only features from our resource, we obtain significant improvements for both tasks in comparison to several baselines, including ones using other representations of the predicate pairs.
We also propose to define optimal sets of connectives better suited to large-corpus applications by performing a dimension reduction in the space of connectives, instead of using manually composed groups of connectives corresponding to predefined relations. Another promising application pursued in this thesis concerns relations between semantic frames (e.g. FrameNet): the resource can be used to enrich this sparse structure by providing candidate relations between verbal frames, based on associations between their verbs. These diverse applications aim to demonstrate the promising contributions of our approach, namely allowing the unsupervised extraction of typed semantic relations.
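The connective-based association mining described above can be illustrated with a small sketch: count how often a predicate pair co-occurs with a relation signaled by a connective, then score the association with a significance measure such as pointwise mutual information (PMI). The observations are toy values, and PMI stands in for the thesis's full set of measures.

```python
# Score (predicate pair, relation) associations with PMI over toy counts.
import math
from collections import Counter

# (predicate_1, predicate_2, relation signaled by the connective) observations
observations = [
    ("push", "fall", "causal"), ("push", "fall", "causal"),
    ("push", "fall", "temporal"), ("eat", "sleep", "temporal"),
]

pair_rel = Counter(observations)
pair = Counter((p1, p2) for p1, p2, _ in observations)
rel = Counter(r for _, _, r in observations)
total = len(observations)

def pmi(p1, p2, r):
    """log2 of how much more often the pair occurs with r than chance predicts."""
    joint = pair_rel[(p1, p2, r)] / total
    expected = (pair[(p1, p2)] / total) * (rel[r] / total)
    return math.log2(joint / expected)

print(round(pmi("push", "fall", "causal"), 3))  # positive: association above chance
```

A positive score indicates the pair and relation co-occur more often than independence would predict, which is the kind of signal stored as significance scores in the Lecsie triples.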
Al Qady, Mohammed Abdelrahman. "Concept relation extraction using natural language processing: the CRISP technique." [Ames, Iowa: Iowa State University], 2008.
Xavier, Clarissa Castellã. "Learning non-verbal relations under open information extraction paradigm." Pontifícia Universidade Católica do Rio Grande do Sul, 2014. http://tede2.pucrs.br/tede2/handle/tede/5275.
Xavier, Clarissa Castellã. "Learning non-verbal relations under open information extraction paradigm." Pontifícia Universidade Católica do Rio Grande do Sul, 2014. http://hdl.handle.net/10923/7073.
The Open Information Extraction (Open IE) relation extraction paradigm targets relationships that cannot be specified in advance, aiming to overcome the limitations imposed by traditional IE methods, such as domain dependence and poor scalability. In order to extend Open IE to extract relationships that are not expressed by verbs from texts in English, we introduce CompIE, a component that learns relations expressed in noun compounds (NCs), such as (oil, extracted from, olive) from olive oil, or in adjective-noun pairs (ANs), such as (moon, that is, gorgeous) from gorgeous moon. CompIE's input is a text file, and its output is a set of triples describing binary relationships. The architecture comprises two main tasks: NC and AN Extraction (1) and NC and AN Interpretation (2). The first task generates a list of NCs and ANs from the input corpus. The second task performs the interpretation of the NCs and ANs, generating the triples that describe the relations extracted from the corpus. In order to study CompIE's feasibility, we perform a hypothesis-based evaluation, building a prototype to implement the strategies that validate each hypothesis. The results show that our solution achieves 89% Precision and demonstrate that CompIE reaches its goal of extending the Open IE paradigm by extracting relationships within NCs and ANs.
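The AN half of this pipeline can be illustrated with a toy sketch: from POS-tagged tokens, emit (noun, "that is", adjective) triples in the spirit of the example (moon, that is, gorgeous). CompIE's actual tagging and interpretation rules are more involved; this only mirrors the output format.

```python
# Emit a triple for each adjective immediately followed by a noun.
def an_triples(tagged):
    """tagged: list of (token, pos) pairs, with pos in {"ADJ", "NOUN", ...}."""
    triples = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            triples.append((w2, "that is", w1))
    return triples

print(an_triples([("gorgeous", "ADJ"), ("moon", "NOUN")]))
# [('moon', 'that is', 'gorgeous')]
```

Interpreting noun compounds such as olive oil requires choosing a relation phrase (extracted from) rather than the fixed "that is", which is where the interpretation task does the real work.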
ASSIS, PEDRO HENRIQUE RIBEIRO DE. "DISTANT SUPERVISION FOR RELATION EXTRACTION USING ONTOLOGY CLASS HIERARCHY-BASED FEATURES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=24296@1.
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Relation extraction is a key step in the problem of deriving a structure from natural language text. In general, structures are composed of entities and relationships among them. The most successful approaches to relation extraction apply supervised machine learning on hand-labeled corpora to create highly accurate classifiers. Although good robustness is achieved, hand-labeled corpora are not scalable due to the expensive cost of their creation. In this work we apply an alternative paradigm, called distant supervision, for creating a considerable number of training instances for classification. Along with this approach, we adopt Semantic Web ontologies to propose and use new features for training classifiers, based on the structure and semantics described by the ontologies in which Semantic Web resources are defined. The use of such features has a great impact on the precision and recall of our final classifiers. We apply our approach to a corpus extracted from Wikipedia and achieve high precision and recall for a considerable number of relations.
Achouri, Abdelghani. "Extraction de relations d'associations maximales dans les textes : représentation graphique." Thèse, Université du Québec à Trois-Rivières, 2012. http://depot-e.uqtr.ca/6132/1/030374207.pdf.
Повний текст джерела
Califf, Mary Elaine. "Relational learning techniques for natural language information extraction /." Digital version accessible at:, 1998. http://wwwlib.umi.com/cr/utexas/main.
Повний текст джерела
Lord, Dale. "Relational Database for Visual Data Management." International Foundation for Telemetering, 2005. http://hdl.handle.net/10150/604893.
Повний текст джерела
Often it is necessary to retrieve segments of video with certain characteristics, or features, from a large archive of footage. This paper discusses how image processing algorithms can be used to automatically create a relational database that indexes the video archive. This feature extraction can be performed either upon acquisition or in post-processing. The database can then be queried to quickly locate and recover video segments with certain specified key features.
Karoui, Lobna. "Extraction contextuelle d'ontologie par fouille de données." Paris 11, 2008. http://www.theses.fr/2008PA112220.
Повний текст джерела
Ryan, Russell J. (Russell John Wyatt). "Groundtruth budgeting : a novel approach to semi-supervised relation extraction in medical language." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66456.
Повний текст джерела
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 67-69).
We address the problem of weakly-supervised relation extraction in hospital discharge summaries. Sentences with pre-identified concept types (for example: medication, test, problem, symptom) are labeled with the relationship between the concepts. We present a novel technique for weakly-supervised bootstrapping of a classifier for this task: groundtruth budgeting. On highly overlapping, self-similar datasets such as the 2010 i2b2/VA challenge corpus, the performance of classifiers on the minority classes is often poor. To address this, we set aside a random portion of the groundtruth at the beginning of bootstrapping, which is gradually added as the classifier is bootstrapped. The classifier chooses groundtruth samples to add by measuring the confidence of its predictions on them and selecting the samples for which its predictions are least confident. By adding samples in this fashion, the classifier increases its coverage of the decision space without adding too many majority-class examples. We evaluate this approach on the 2010 i2b2/VA challenge corpus of 477 patient discharge summaries and show that, with a training corpus of 349 discharge summaries, budgeting 10% of the corpus achieves results equivalent to a bootstrapping classifier that starts with the entire corpus. We compare our results to those of other papers published in the proceedings of the 2010 Fourth i2b2/VA Shared-Task and Workshop.
by Russell J. Ryan.
M.Eng.
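The budgeting step described in the abstract above (gradually adding held-out groundtruth samples on which the classifier is least confident) is essentially least-confidence selection. A minimal sketch, with hypothetical class-probability scores standing in for a trained classifier:

```python
def select_least_confident(reserve, predict_proba, k):
    """Pick the k reserve samples whose top predicted class probability is lowest."""
    scored = [(max(predict_proba(x)), x) for x in reserve]
    scored.sort(key=lambda pair: pair[0])  # least confident first
    return [x for _, x in scored[:k]]

# Hypothetical confidences over three relation classes (not real i2b2 data).
fake_proba = {
    "sample_a": [0.90, 0.05, 0.05],  # confident prediction
    "sample_b": [0.40, 0.35, 0.25],  # very uncertain
    "sample_c": [0.60, 0.30, 0.10],
}
print(select_least_confident(list(fake_proba), fake_proba.get, k=1))
# -> ['sample_b']
```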
Marshman, Elizabeth. "The cause relation in biopharmaceutical corpora: English and French patterns for knowledge extraction." Thesis, University of Ottawa (Canada), 2002. http://hdl.handle.net/10393/6385.
Повний текст джерела
Xavier, Clarissa [Verfasser]. "Learning Non-Verbal Relations Under Open Information Extraction Paradigm / Clarissa Xavier." Munich : GRIN Verlag, 2015. http://d-nb.info/1097578720/34.
Повний текст джерела
Bourgeois, Thomas C. "English Relative Clause Extraction: A Syntactic and Semantic Approach." University of Arizona Linguistics Circle, 1989. http://hdl.handle.net/10150/226574.
Повний текст джерела
Vempala, Alakananda. "Extracting Temporally-Anchored Spatial Knowledge." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1505146/.
Повний текст джерела
Morsi, Youcef Ihab. "Analyse linguistique et extraction automatique de relations sémantiques des textes en arabe." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCC019.
Повний текст джерела
This thesis focuses on the development of a tool for the automatic processing of Modern Standard Arabic, at the morphological and semantic levels, with the final objective of information extraction on technological innovations. As far as the morphological analysis is concerned, our tool includes several successive processing stages that allow occurrences in texts to be labeled and disambiguated: a morphological layer (Gibran 1.0), which relies on Arabic patterns as distinctive features; a contextual layer (Gibran 2.0), which uses contextual rules; and a third layer (Gibran 3.0), which uses a machine-learning model. Our methodology is evaluated using the annotated corpus Arabic-PADT UD treebank. The evaluations obtain F-measures of 0.92 and 0.90 for the morphological analyses. These experiments demonstrate the possibility of improving such a corpus through linguistic analyses. This approach allowed us to develop a prototype of information extraction on technological innovations for the Arabic language, based on morphological analysis and syntactico-semantic patterns. This thesis is part of a PhD-entrepreneur programme.
Darud, Véronique. "Relations synthèse-structure-propriétés de polyuréthannes linéaires susceptibles d’être utilises comme liant de particules magnétiques." Lyon, INSA, 1988. http://www.theses.fr/1988ISAL0073.
Повний текст джерела
Ruiz, Fabo Pablo. "Concept-based and relation-based corpus navigation : applications of natural language processing in digital humanities." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE053/document.
Повний текст джерела
Social sciences and humanities research is often based on large textual corpora that would be unfeasible to read in detail. Natural Language Processing (NLP) can identify important concepts and actors mentioned in a corpus, as well as the relations between them. Such information can provide an overview of the corpus useful for domain experts, and help identify corpus areas relevant to a given research question. To automatically annotate corpora relevant for Digital Humanities (DH), the NLP technologies we applied are, first, entity linking, to identify corpus actors and concepts; second, the relations between actors and concepts were determined based on an NLP pipeline which provides semantic role labeling and syntactic dependencies, among other information. Part I outlines the state of the art, paying attention to how these technologies have been applied in DH. Generic NLP tools were used. As the efficacy of NLP methods depends on the corpus, some technological development was undertaken, described in Part II, in order to better adapt to the corpora in our case studies. Part II also shows an intrinsic evaluation of the technology developed, with satisfactory results. The technologies were applied to three very different corpora, as described in Part III. First, the manuscripts of Jeremy Bentham, an 18th-19th century corpus in political philosophy. Second, the PoliInformatics corpus, with heterogeneous materials about the American financial crisis of 2007-2008. Finally, the Earth Negotiations Bulletin (ENB), which covers international climate summits since 1995, where treaties like the Kyoto Protocol or the Paris Agreements are negotiated. For each corpus, navigation interfaces were developed. These user interfaces (UIs) combine networks, full-text search and structured search based on NLP annotations.
As an example, in the ENB corpus interface, which covers climate policy negotiations, searches can be performed based on relational information identified in the corpus: the negotiation actors having discussed a given issue using verbs indicating support or opposition can be searched, as well as all statements where a given actor has expressed support or opposition. Relation information is thus employed beyond simple co-occurrence between corpus terms. The UIs were evaluated qualitatively with domain experts, to assess their potential usefulness for research in the experts' domains. First, we paid attention to whether the corpus representations we created correspond to the experts' knowledge of the corpus, as an indication of the sanity of the outputs we produced. Second, we tried to determine whether experts could gain new insight into the corpus by using the applications, e.g. whether they found evidence previously unknown to them or new research ideas. Examples of insight gain were attested with the ENB interface; this constitutes a good validation of the work carried out in the thesis. Overall, the applications' strengths and weaknesses were pointed out, outlining possible improvements as future work.
Ratkovic, Zorana. "Predicative Analysis for Information Extraction : application to the biology domain." Thesis, Paris 3, 2014. http://www.theses.fr/2014PA030110.
Повний текст джерела
The abundance of biomedical information expressed in natural language has resulted in the need for methods to process this information automatically. In the field of Natural Language Processing (NLP), Information Extraction (IE) focuses on the extraction of relevant information from unstructured data in natural language. Many IE methods today focus on Machine Learning (ML) approaches that rely on deep linguistic processing in order to capture the complex information contained in biomedical texts. In particular, syntactic analysis and parsing have played an important role in IE, by helping capture how words in a sentence are related. This thesis examines how dependency parsing can be used to facilitate IE. It focuses on a task-based approach to dependency parsing evaluation and parser selection, including a detailed error analysis. In order to achieve a high quality of syntax-based IE, different stages of linguistic processing are addressed, including both pre-processing steps (such as tokenization) and the use of complementary linguistic processing (such as semantics and coreference analysis). This thesis also explores how the different levels of linguistic processing can be represented for use within an ML-based IE algorithm, and how the interface between these two is of great importance. Finally, biomedical data is very heterogeneous, encompassing different subdomains and genres. This thesis explores how subdomain adaptation can be achieved by using already existing subdomain knowledge and resources. The methods and approaches described are explored using two different biomedical corpora, demonstrating how the IE results are used in real-life tasks.
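One reason dependency parsing helps IE, as the abstract above notes, is that the dependency path between two entity mentions is a strong relation feature. A self-contained sketch over a toy parse (the sentence and edge list are invented for illustration):

```python
from collections import deque

def shortest_dep_path(edges, start, goal):
    """BFS over an undirected dependency graph given as (head, dependent) pairs."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy parse of "Aspirin inhibits COX enzymes".
edges = [("inhibits", "Aspirin"), ("inhibits", "enzymes"), ("enzymes", "COX")]
print(shortest_dep_path(edges, "Aspirin", "COX"))
# -> ['Aspirin', 'inhibits', 'enzymes', 'COX']
```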
Gonzàlez, Pellicer Edgar. "Unsupervised learning of relation detection patterns." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/83906.
Повний текст джерела
Information extraction is the natural language processing area whose goal is to obtain structured data from the relevant information contained in textual fragments. Information extraction requires a significant amount of linguistic knowledge. The specificity of such knowledge supposes a drawback on the portability of the systems, as a change of language, domain or style demands a costly human effort. Machine learning techniques have been applied for decades so as to overcome this portability bottleneck, progressively reducing the amount of involved human supervision. However, as the availability of large document collections increases, completely unsupervised approaches become necessary in order to mine the knowledge contained in them. The proposal of this thesis is to incorporate clustering techniques into pattern learning for information extraction, in order to further reduce the elements of supervision involved in the process. In particular, the work focuses on the problem of relation detection. The achievement of this ultimate goal has required, first, considering the different strategies in which this combination could be carried out; second, developing or adapting clustering algorithms suitable to our needs; and third, devising pattern learning procedures which incorporate clustering information. By the end of this thesis, we had been able to develop and implement an approach for learning relation detection patterns which, using clustering techniques and minimal human supervision, is competitive with and even outperforms other comparable approaches in the state of the art.
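The combination of clustering and pattern learning that this abstract proposes can be illustrated with a deliberately simple greedy grouping of lexical patterns by token overlap; the thesis's actual algorithms are more elaborate, and the patterns below are invented:

```python
def jaccard(a, b):
    """Jaccard similarity between two token sequences, compared as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def cluster_patterns(patterns, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose
    representative is similar enough, otherwise start a new cluster."""
    clusters = []
    for pattern in patterns:
        for cluster in clusters:
            if jaccard(pattern, cluster[0]) >= threshold:
                cluster.append(pattern)
                break
        else:
            clusters.append([pattern])
    return clusters

patterns = [
    ("X", "acquired", "Y"),
    ("X", "acquired", "Z"),
    ("X", "born", "in", "Y"),
]
print(cluster_patterns(patterns))
# -> [[('X', 'acquired', 'Y'), ('X', 'acquired', 'Z')], [('X', 'born', 'in', 'Y')]]
```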
Chu, Timothy Sui-Tim. "Genealogy Extraction and Tree Generation from Free Form Text." DigitalCommons@CalPoly, 2017. https://digitalcommons.calpoly.edu/theses/1796.
Повний текст джерела
Tomczak, Jakub. "Algorithms for knowledge discovery using relation identification methods." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2563.
Повний текст джерела
Double Diploma Programme; Polish supervisor: Prof. Jerzy Świątek, Wrocław University of Technology.
Hakenberg, Jörg. "Mining relations from the biomedical literature." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2010. http://dx.doi.org/10.18452/16073.
Повний текст джерела
Text mining deals with the automated annotation of texts and the extraction of facts from textual data for subsequent analysis. Such texts range from short articles and abstracts to large documents, for instance web pages and scientific articles, but also include textual descriptions in otherwise structured databases. This thesis focuses on two key problems in biomedical text mining: relationship extraction from biomedical abstracts (in particular, protein-protein interactions) and a prerequisite step, named entity recognition (again focusing on proteins). This thesis presents goals, challenges, and typical approaches for each of the main building blocks in biomedical text mining. We present our own approaches for named entity recognition of proteins and relationship extraction of protein-protein interactions. For the former, we describe two methods, one set up as a classification task, the other based on dictionary matching. For relationship extraction, we develop a methodology to automatically annotate large amounts of unlabeled data for relations, and make use of such annotations in a pattern-matching strategy. This strategy first extracts similarities between sentences that describe relations, storing them as consensus patterns. We develop a sentence alignment approach that introduces multi-layer alignment, making use of multiple annotations per word. For the task of extracting protein-protein interactions, empirical results show that our methodology performs comparably to existing approaches that require a large amount of human intervention, either for annotation of data or creation of models.
Pareti, Silvia. "Attribution : a computational approach." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14170.
Повний текст джерела
Alatrista-Salas, Hugo. "Extraction de relations spatio-temporelles à partir des données environnementales et de la santé." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2013. http://tel.archives-ouvertes.fr/tel-00997539.
Повний текст джерела
Ferraro, Gabriela. "Towards deep content extraction from specialized discourse : the case of verbal relations in patent claims." Doctoral thesis, Universitat Pompeu Fabra, 2012. http://hdl.handle.net/10803/84174.
Повний текст джерела
This thesis focuses on the development of Natural Language Processing technologies for the extraction and generalization of relations found in specialized texts, specifically in patent claims. One of the most challenging tasks of our work, from the point of view of the state of the art, is the generalization of the linguistic denominations of the relations. These denominations, usually verbs, are too specific to be used as relation labels in the context of knowledge representation; for example, “A leads to B” and “B is the result of A” are better represented as “A causes B”. Generalizing relations makes it possible to reduce their number to a limited set, oriented towards the types of relations used in the field of knowledge representation.
Lefeuvre, Luce. "Analyse des marqueurs de relations conceptuelles en corpus spécialisé : recensement, évaluation et caractérisation en fonction du domaine et du genre textuel." Thesis, Toulouse 2, 2017. http://www.theses.fr/2017TOU20051.
Повний текст джерела
The use of markers of conceptual relations for building terminological resources has been frequently emphasized. Those markers are used in corpora to detect “Term1 – marker – Term2” triples, which are then interpreted as “Term1 – conceptual relation – Term2” triples, allowing knowledge to be represented as a relational system model. The transition from one triple to the other raises the question of the stability of this link across corpora. In this thesis, we study the variation of candidate markers of relations, taking into account the domain and the text genre. To this end, we identified the French markers for the hyperonymy, meronymy and causal relations, and systematically analyzed their functioning within corpora varying according to the domain (breast cancer vs. volcanology) and the text genre (popular science vs. specialized texts). For each context containing a candidate marker, we evaluated its capacity to really indicate the targeted relation. Our research attests to the relevance of taking the domain and the text genre into account when describing the functioning of conceptual relation markers.
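The “Term1 – marker – Term2” detection described above can be sketched with a one-marker pattern matcher; the single English hyperonymy marker and the single-token term restriction are simplifications for illustration (real marker inventories are larger and handle multiword terms):

```python
import re

# One illustrative hyperonymy marker; terms are restricted to single tokens.
HYPERONYM_MARKER = re.compile(r"(\w+) such as (\w+)")

def extract_triples(sentence):
    """Turn 'Term1 - marker - Term2' matches into (Term2, relation, Term1) triples."""
    return [(hypo, "hyperonym", hyper)
            for hyper, hypo in HYPERONYM_MARKER.findall(sentence)]

print(extract_triples("Eruptive phenomena such as lava flows were observed."))
# -> [('lava', 'hyperonym', 'phenomena')]
```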
Anderson, Emily. "States of extraction : impacts of taxation on statebuilding in Angola and Mozambique, 1975-2013." Thesis, London School of Economics and Political Science (University of London), 2014. http://etheses.lse.ac.uk/3071/.
Повний текст джерела
Akbik, Alan [Verfasser], Volker [Akademischer Betreuer] Markl, Hans [Gutachter] Uszkoreit, and Chris [Gutachter] Biemann. "Exploratory relation extraction in large multilingual data / Alan Akbik ; Gutachter: Hans Uszkoreit, Chris Biemann ; Betreuer: Volker Markl." Berlin : Technische Universität Berlin, 2016. http://d-nb.info/1156177308/34.
Повний текст джерела
Jean-Louis, Ludovic. "Approches supervisées et faiblement supervisées pour l'extraction d'événements et le peuplement de bases de connaissances." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00686811.
Повний текст джерела
Berrahou, Soumia Lilia. "Extraction d'arguments de relations n-aires dans les textes guidée par une RTO de domaine." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS019/document.
Повний текст джерела
Today, a huge amount of data is made available to the research community through several web-based libraries. Enhancing data collected from scientific documents is a major challenge in order to analyze and reuse domain knowledge efficiently. To be enhanced, data need to be extracted from documents and structured in a common representation using a controlled vocabulary, as in ontologies. Our research deals with knowledge engineering issues concerning experimental data extracted from scientific articles, in order to reuse them in decision support systems. Experimental data can be represented by n-ary relations which link a studied object (e.g. food packaging, transformation process) with its features (e.g. oxygen permeability in packaging, biomass grinding) and capitalized in an Ontological and Terminological Resource (OTR). An OTR associates an ontology with a terminological and/or a linguistic part in order to establish a clear distinction between the term and the notion it denotes (the concept). Our work focuses on n-ary relation extraction from scientific documents in order to populate a domain OTR with new instances. Our contributions are based on Natural Language Processing (NLP) together with data mining approaches guided by the domain OTR. More precisely, we first focus on the extraction of units of measure, which are known to be difficult to identify because of their typographic variations. We propose to rely on automatic classification of texts, using supervised learning methods, to reduce the search space of unit variants, and then we propose a new similarity measure that identifies them, taking their syntactic properties into account. Secondly, we propose to adapt and combine data mining methods (sequential pattern and rule mining) and syntactic analysis in order to overcome the challenging process of identifying and extracting n-ary relation instances drowned in unstructured texts.
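The thesis's similarity measure for unit variants is not reproduced here; as a generic stand-in, a normalized edit-distance score already captures the typographic-variation problem the abstract describes (e.g. "g/m2" vs. "g.m-2"):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def unit_similarity(u, v):
    """Similarity in [0, 1]; 1.0 means the unit strings are identical."""
    if not u and not v:
        return 1.0
    return 1.0 - edit_distance(u, v) / max(len(u), len(v))

print(round(unit_similarity("g/m2", "g.m-2"), 2))
# -> 0.6
```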
Singh, Dory. "Extraction des relations de causalité dans les textes économiques par la méthode de l’exploration contextuelle." Thesis, Paris 4, 2017. http://www.theses.fr/2017PA040155.
Повний текст джерела
The thesis describes a process for the extraction of causal information which, contrary to econometrics, is essentially based on linguistic knowledge. Econometrics exploits mathematical or statistical models, which are now a subject of controversy. Our approach thus intends to complete or support econometric models. It consists in automatically annotating textual segments according to the Contextual Exploration (CE) method. CE is a linguistic and computational strategy aimed at extracting knowledge according to points of view. Therefore, this contribution adopts the discursive point of view of causality, where the categories are structured in a semantic map. These categories allow the elaboration of abductive rules implemented in the systems EXCOM2 and SEMANTAS.
Byrne, Kate. "Populating the Semantic Web : combining text and relational databases as RDF graphs." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3781.
Повний текст джерела