Dissertations / Theses on the topic 'Graph extraction'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Graph extraction.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Dandala, Bharath. "Graph-Based Keyphrase Extraction Using Wikipedia." Thesis, University of North Texas, 2010. https://digital.library.unt.edu/ark:/67531/metadc67939/.
Qian, Yujie. "A graph-based framework for information extraction." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122765.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 43-45).
Most modern Information Extraction (IE) systems are implemented as sequential taggers and only model local dependencies. Non-local and non-sequential context is, however, a valuable source of information for improving predictions. In this thesis, we introduce a graph-based framework (GraphIE) that operates over a graph representing a broad set of dependencies between textual units (i.e. words or sentences). The algorithm propagates information between connected nodes through graph convolutions, generating a richer representation that can be exploited to improve word-level predictions. Evaluation on three different tasks -- namely textual, social media and visual information extraction -- shows that GraphIE consistently outperforms the state-of-the-art sequence tagging model by a significant margin.
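The propagation step this abstract describes can be illustrated with a minimal graph convolution in plain NumPy. The adjacency matrix, features and weight matrix below are toy values invented for illustration, not from the thesis.

```python
import numpy as np

def graph_convolution(adjacency, features, weights):
    """One graph-convolution step: each node averages its neighbours'
    features (plus its own, via a self-loop) and applies a linear map."""
    a = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    a_norm = a / a.sum(axis=1, keepdims=True)    # row-normalised averaging
    return np.tanh(a_norm @ features @ weights)  # propagate, then transform

# Toy graph of 3 textual units: units 0 and 1 are connected, unit 2 is isolated.
adj = np.array([[0., 1., 0.],
                [1., 0., 0.],
                [0., 0., 0.]])
feats = np.array([[1., 0.],
                  [0., 1.],
                  [0., 0.]])
out = graph_convolution(adj, feats, np.eye(2))
# After one step, units 0 and 1 share information; the isolated unit stays zero.
```

Stacking several such steps lets information flow between units that are not directly connected, which is what produces the richer representation the abstract mentions.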
by Yujie Qian.
S.M.
S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Huang, Zan, Wingyan Chung, and Hsinchun Chen. "A Graph Model for E-Commerce Recommender Systems." Wiley Periodicals, Inc, 2004. http://hdl.handle.net/10150/105683.
Information overload on the Web has created enormous challenges for customers selecting products for online purchases and for online businesses attempting to identify customers' preferences efficiently. Various recommender systems employing different data representations and recommendation methods are currently used to address these challenges. In this research, we developed a graph model that provides a generic data representation and can support different recommendation methods. To demonstrate its usefulness and flexibility, we developed three recommendation methods: direct retrieval, association mining, and high-degree association retrieval. We used a data set from an online bookstore as our research test-bed. Evaluation results showed that combining product content information and historical customer transaction information achieved more accurate predictions and relevant recommendations than using collaborative information alone. However, comparisons among the different methods showed that high-degree association retrieval did not perform significantly better than the association mining method or the direct retrieval method in our test-bed.
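The association-retrieval idea can be sketched as path counting in the customer-product bipartite graph. The transaction data and the simple 2-hop scoring rule below are illustrative assumptions, not the authors' exact methods.

```python
from collections import Counter

def recommend(transactions, customer):
    """Score products by counting 2-hop paths in the customer-product
    bipartite graph: products bought by customers who share purchases
    with the target customer rank highest."""
    bought = transactions[customer]
    scores = Counter()
    for other, items in transactions.items():
        if other == customer:
            continue
        overlap = len(bought & items)        # shared purchases = 2-hop links
        for item in items - bought:
            scores[item] += overlap
    return [item for item, score in scores.most_common() if score > 0]

tx = {"alice": {"book_a", "book_b"},
      "bob":   {"book_a", "book_c"},
      "carol": {"book_b", "book_c", "book_d"}}
print(recommend(tx, "alice"))  # book_c is reachable via both bob and carol
```

Higher-degree association retrieval would extend this by following longer paths in the same graph, mixing product-content edges with transaction edges.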
Haugeard, Jean-Emmanuel. "Extraction et reconnaissance de primitives dans les façades de Paris à l'aide d'appariement de graphes." Thesis, Cergy-Pontoise, 2010. http://www.theses.fr/2010CERG0497.
This last decade, modeling of 3D cities became one of the challenges of multimedia search and an important focus in object recognition. In this thesis we are interested in locating various primitives, especially windows, in the facades of Paris. First, we present an analysis of facade and window properties. Then we propose an algorithm able to extract window candidates automatically. In the second part, we discuss the extraction and recognition of primitives using graph matching of contours. Indeed, an image of contours is readable by the human eye, which uses perceptual grouping and distinguishes between the entities present in the scene. It is this mechanism that we have tried to replicate. The image is represented as an adjacency graph of contour segments, valued by orientation and proximity information of the edge segments. For inexact graph matching, we propose several variants of a new similarity based on sets of paths, able to group several contours and robust to scale changes. The similarity between paths takes into account the similarity of the sets of contour segments and the similarity of the regions defined by these paths. The selection of images from a database containing a particular object is done using a KNN or SVM classifier.
Nguyen, Quan M. Eng (Quan T. ). Massachusetts Institute of Technology. "Parallel and scalable neural image segmentation for connectome graph extraction." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/100644.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Title as it appears in MIT Commencement Exercises program, June 5, 2015: Connectomics project : performance engineering neural image segmentation. Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 77-79).
Segmentation of images, the process of grouping together pixels of the same object, is one of the major challenges in connectome extraction. Since connectomics data consist of large quantities of digital information generated by the electron microscope, a highly scalable system that performs segmentation is a necessity. To date, state-of-the-art segmentation libraries such as GALA and NeuroProof lack the parallel capability to be run on multicore machines in a distributed setting in order to achieve the desired scalability. Employing many performance engineering techniques, I parallelize a pipeline that uses the existing segmentation algorithms as building blocks to perform segmentation on EM grayscale images. For an input image stack of dimensions 1024 x 1024 x 100, the parallel segmentation program achieves a speedup of 5.3 counting I/O and 9.4 not counting I/O running on an 18-core machine. The program has become I/O bound, which makes it a better fit for a distributed computing framework. The contribution of this thesis includes parallel algorithms for constructing a regional adjacency graph from labeled pixels and for agglomerating an over-segmentation to obtain the final segmentation. The agglomeration process in particular is challenging to parallelize because most graph-based segmentation libraries entail very complex dependencies, which has led many people to believe that the process is inherently sequential. However, I found a way to get good speedup by sacrificing some segmentation quality: it turns out that one can trade off a negligible amount of quality for a large gain in parallelism.
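The region adjacency graph construction mentioned in the abstract can be sketched as below. This is a sequential toy version on a made-up label image; the point is that each row and column scan is independent, which is what makes the step parallelisable.

```python
import numpy as np

def region_adjacency_graph(labels):
    """Build a region adjacency graph from a 2-D label image: two regions
    are adjacent when any of their pixels touch horizontally or vertically.
    Each row and column scan is independent of the others, so the work
    can be split across cores and the edge sets merged afterwards."""
    edges = set()
    for grid in (labels, labels.T):            # horizontal, then vertical pairs
        for row in grid:
            for u, v in zip(row[:-1], row[1:]):
                if u != v:
                    edges.add((int(min(u, v)), int(max(u, v))))
    return edges

seg = np.array([[1, 1, 2],
                [1, 3, 2],
                [3, 3, 2]])
print(sorted(region_adjacency_graph(seg)))  # [(1, 2), (1, 3), (2, 3)]
```

Agglomeration would then repeatedly merge the most similar adjacent regions, which is the dependency-heavy step the thesis parallelises at some cost in quality.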
by Quan Nguyen.
M. Eng.
Florescu, Corina Andreea. "SurfKE: A Graph-Based Feature Learning Framework for Keyphrase Extraction." Thesis, University of North Texas, 2019. https://digital.library.unt.edu/ark:/67531/metadc1538730/.
Shah, Faaiz Hussain. "Gradual Pattern Extraction from Property Graphs." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS025/document.
Graph databases (NoSQL-oriented graph databases) provide the ability to manage highly connected data and complex database queries, along with native graph storage and processing. A property graph in a NoSQL graph engine is a labeled directed graph composed of nodes connected through relationships, with a set of attributes or properties in the form of (key:value) pairs. It makes it easy to represent data and knowledge that are in the form of graphs. Practical applications of graph database systems can be seen in social networks, recommendation systems, fraud detection, and data journalism, as in the case of the Panama Papers. We often face the issue of missing data in such systems. In particular, these semi-structured NoSQL databases lead to a situation where some attributes (properties) are filled in while others are not available, either because they exist but are missing (for instance, the age of a person that is unknown) or because they are not applicable for a particular case (for instance, the year of military service for a girl in countries where it is mandatory only for boys). Therefore, some keys can be provided for some nodes and not for others. In such a scenario, when we want to extract knowledge from these new-generation database systems, the problem of missing data arises and must be addressed. Some approaches have been proposed to replace missing values so as to be able to apply data mining techniques. However, we argue that such approaches are not relevant, as they risk introducing biases or errors. In our work, we focus on the extraction of gradual patterns from property graphs, providing end-users with tools for mining correlations in the data when missing values exist.
Our approach requires us first to define gradual patterns in the context of NoSQL property graphs and then to extend existing algorithms to treat missing values, because anti-monotonicity of the support can no longer be exploited in a simple manner. Thus, we introduce a novel approach for mining gradual patterns in the presence of missing values and test it on real and synthetic data. Further to this work, we present our approach for mining such graphs in order to extract frequent gradual patterns of the form "the more/less A1, ..., the more/less An", where the Ai are pieces of information from the graph, be they from the nodes or from the relationships. In order to retrieve more valuable patterns, we consider fuzzy gradual patterns of the form "the more/less A1 is F1, ..., the more/less An is Fn", where the Ai are attributes retrieved from the graph nodes or relationships and the Fi are fuzzy descriptions. For this purpose, we introduce the definitions of these concepts, the corresponding method for extracting the patterns, and the experiments that we have led on synthetic graphs using a graph generator. We show the results in terms of time utilization, memory consumption and the number of patterns generated.
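One simple, pair-based reading of gradual-pattern support, with missing values skipped rather than imputed, can be sketched as follows. Both the support definition and the data are illustrative assumptions, not the thesis's exact formulation.

```python
from itertools import combinations

def gradual_support(rows, pattern):
    """Fraction of comparable object pairs that respect every
    (attribute, direction) in the pattern, e.g. [("age", "+"), ("salary", "+")]
    for "the more age, the more salary". Pairs where any attribute is
    missing (None) are skipped instead of being imputed."""
    ok = total = 0
    for a, b in combinations(rows, 2):
        vals = [(a.get(attr), b.get(attr), d) for attr, d in pattern]
        if any(x is None or y is None for x, y, _ in vals):
            continue                        # missing value: pair not comparable
        total += 1
        fwd = all((x < y) if d == "+" else (x > y) for x, y, d in vals)
        bwd = all((x > y) if d == "+" else (x < y) for x, y, d in vals)
        ok += fwd or bwd
    return ok / total if total else 0.0

people = [{"age": 25, "salary": 30}, {"age": 30, "salary": 40},
          {"age": 40, "salary": 35}, {"age": 50, "salary": None}]
support = gradual_support(people, [("age", "+"), ("salary", "+")])
# 2 of the 3 comparable pairs respect "the more age, the more salary"
```

Skipping incomparable pairs is what breaks simple anti-monotonicity: the comparable-pair set changes from one pattern to another, which is why the thesis has to adapt the mining algorithms.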
Sánchez, Yagüe Mónica. "Information extraction and validation of CDFG in NoGap." Thesis, Linköpings universitet, Datorteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93905.
Lilliehöök, Hampus. "Extraction of word senses from bilingual resources using graph-based semantic mirroring." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-91880.
In this work we extract semantic information that exists implicitly in bilingual data. We collect input data by repeating the semantic mirroring procedure. The data is represented as vectors in a large vector space. We then build a resource of synonym clusters by applying the K-means algorithm to the vectors. We inspect the result by hand with the help of dictionaries, and against WordNet, and discuss possibilities and applications of the method.
Hamid, Fahmida. "Evaluation Techniques and Graph-Based Algorithms for Automatic Summarization and Keyphrase Extraction." Thesis, University of North Texas, 2016. https://digital.library.unt.edu/ark:/67531/metadc862796/.
Gamal, Eldin Ahmed. "Point process and graph cut applied to 2D and 3D object extraction." Nice, 2011. http://www.theses.fr/2011NICE4107.
The topic of this thesis is to develop a novel approach for 3D object detection from a 2D image, taking into consideration occlusions and perspective effects. This work is embedded in a marked point process framework, proved to be efficient for solving many challenging problems dealing with high-resolution images. The work accomplished during the thesis can be presented in two parts. In the first part, we propose a novel probabilistic approach to handle occlusions and perspective effects. The proposed method is based on 3D scene simulation on the GPU using OpenGL. It is an object-based method embedded in a marked point process framework. We apply it to the size estimation of a penguin colony, where we model the colony as an unknown number of 3D objects. The main idea of the proposed approach is to sample candidate configurations consisting of 3D objects lying on the real plane. A Gibbs energy is defined on the configuration space, which takes into account both prior and data information. The proposed configurations are projected onto the image plane, and the configurations are modified until convergence. To evaluate a proposed configuration, we measure the similarity between the projected image of the proposed configuration and the real image, defining a data term and a prior term that penalizes overlapping objects. We introduced modifications to the optimization algorithm to take into account new dependencies that exist in our 3D model. In the second part, we propose a new optimization method which we call "Multiple Births and Cut" (MBC). It combines the recently developed Multiple Births and Deaths (MBD) optimization algorithm with Graph Cut. Both the MBD and MBC optimization methods are applied to the optimization of a marked point process.
Comparing the MBC with the MBD algorithm, we show that the main advantages of our newly proposed algorithm are the reduced number of parameters, the speed of convergence and the quality of the obtained results. We validated our algorithm on the problem of counting flamingos in a colony.
Ajamlou, Kevin, and Max Sonebäck. "Multimodal Convolutional Graph Neural Networks for Information Extraction from Visually Rich Documents." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445457.
Lee, Mark de Jersey. "A graph theoretic approach to region and edge extraction in image signal processing." Thesis, Imperial College London, 1987. http://hdl.handle.net/10044/1/47037.
Wu, Christopher James. "SKEWER: Sentiment Knowledge Extraction With Entity Recognition." DigitalCommons@CalPoly, 2016. https://digitalcommons.calpoly.edu/theses/1615.
Lalithsena, Sarasi. "Domain-specific Knowledge Extraction from the Web of Data." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1527202092744638.
Rahman, Md Rashedur. "Knowledge Base Population based on Entity Graph Analysis." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS092/document.
Knowledge Base Population (KBP) is an important and challenging task, especially when it has to be done automatically. The objective of the KBP task is to build a collection of facts about the world. A Knowledge Base (KB) contains different entities, relationships among them and various properties of the entities. Relation extraction (RE) between a pair of entity mentions in text plays a vital role in the KBP task. RE is also challenging, especially for open-domain relations. Generally, relations are extracted based on lexical and syntactic information at the sentence level. However, global information about known entities has not yet been explored for the RE task. We propose to extract a graph of entities from the overall corpus and to compute features on this graph that are able to capture evidence of relationships holding between pairs of entities. In order to evaluate the relevance of the proposed features, we tested them on a relation validation task, which examines the correctness of relations extracted by different RE systems. Experimental results show that the proposed features lead to outperforming the state-of-the-art system.
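A minimal sketch of the kind of entity-graph feature the abstract describes: evidence for a candidate relation drawn from the pair's neighbourhood in a corpus-wide entity graph. The edge list and the particular feature set are invented for illustration, not taken from the thesis.

```python
def pair_features(edges, e1, e2):
    """Toy graph features for an entity pair in an entity co-occurrence graph:
    direct link, number of common neighbours, and Jaccard overlap of the
    neighbourhoods -- simple stand-ins for evidence that a relation holds."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n1, n2 = adj.get(e1, set()), adj.get(e2, set())
    union = n1 | n2
    return {"direct_link": e2 in n1,
            "common_neighbours": len(n1 & n2),
            "jaccard": len(n1 & n2) / len(union) if union else 0.0}

edges = [("Obama", "USA"), ("Obama", "Washington"), ("Obama", "Hawaii"),
         ("USA", "Washington"), ("USA", "Hawaii")]
print(pair_features(edges, "Obama", "USA"))
# {'direct_link': True, 'common_neighbours': 2, 'jaccard': 0.5}
```

Features like these, computed once over the whole corpus graph, can then be fed to a classifier that validates or rejects relations proposed by sentence-level RE systems.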
Jen, Chun-Heng. "Exploring Construction of a Company Domain-Specific Knowledge Graph from Financial Texts Using Hybrid Information Extraction." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291107.
Companies do not exist as isolated organizations; they are embedded in structural relationships with each other. Mapping a given company's relationships with other companies in terms of competitors, subsidiaries, suppliers and customers is key to understanding that company's main risk factors and opportunities. The conventional way of staying up to date with this key knowledge was to read financial news and reports, relying on highly skilled manual labor such as financial analysts. With the development of Natural Language Processing (NLP) and graph databases, however, it is now possible to systematically extract and store structured information from unstructured data sources. The current approach to extracting information effectively uses supervised machine learning models, which require a large amount of labeled training data. The data labeling process is usually time-consuming, and labeled data is hard to obtain in a domain-specific area. This project explores an approach to constructing a company domain-specific Knowledge Graph (KG) containing company-related entities and relations from SEC 10-K filings, by combining a pre-trained general-purpose NLP model with rule-based patterns for Named Entity Recognition (NER) and Relation Extraction (RE). This approach eliminates the time-consuming data labeling task of the statistical approach; evaluated on ten SEC 10-K filings, the model achieves an overall recall of 53.6%, precision of 75.7% and F1 score of 62.8%. The result shows that it is possible to extract company information with the hybrid methods, which do not require a large amount of labeled training data. However, the project requires a time-consuming process of finding lexical patterns in sentences in order to extract company-related entities and relations.
Afzal, Mansoor. "Graph-Based Visualization of Ontology-Based Competence Profiles for Research Collaboration." Thesis, Tekniska Högskolan, Högskolan i Jönköping, JTH. Forskningsmiljö Informationsteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-20123.
Ozturk, Gizem. "A Hybrid Video Recommendation System Based On A Graph Based Algorithm." Master's thesis, METU, 2010. http://etd.lib.metu.edu.tr/upload/12612624/index.pdf.
Jönsson, Mattias, and Lucas Borg. "How to explain graph-based semi-supervised learning for non-mathematicians?" Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20339.
The large amount of available data on the web can be used to improve the predictions made by machine learning algorithms. The problem is that such data is often in a raw format and needs to be manually labeled by a human before it can be used by a machine learning algorithm. Semi-supervised learning (SSL) is a technique where the algorithm uses a few prepared samples to automatically prepare the rest of the data. One approach to SSL is to represent the data in a graph, also called graph-based semi-supervised learning (GSSL), and find similarities between the nodes for automatic labeling. Our goal in this thesis is to simplify the advanced processes and steps needed to implement a GSSL algorithm. We cover basic tasks such as setting up the development environment and more advanced steps such as data preprocessing and feature extraction. The feature extraction techniques covered are bag-of-words (BOW) and term frequency-inverse document frequency (TF-IDF). Lastly, we present how to classify documents using Label Propagation (LP) and Multinomial Naive Bayes (MNB), with a detailed explanation of the inner workings of GSSL. We showcase the classification performance by classifying documents from the 20 Newsgroups dataset using LP and MNB. The results are documented using two different evaluation scores, F1-score and accuracy. A comparison between MNB and the LP algorithm, using two different kernels, KNN and RBF, was made on different amounts of labeled documents. The results from the classification algorithms show that MNB is better at classifying the data than LP.
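The inner loop of graph-based label propagation that this thesis explains can be sketched in a few lines of NumPy. The similarity graph below is a hand-made toy, not the 20 Newsgroups data; in the thesis's setting the edge weights would come from a KNN or RBF kernel over TF-IDF vectors.

```python
import numpy as np

def label_propagation(W, y, n_iter=50):
    """Iterative label propagation on a similarity graph W. `y` holds class
    indices for labeled nodes and -1 for unlabeled ones. Each step, every
    node absorbs the weighted average of its neighbours' label distributions,
    while labeled nodes are clamped back to their known label."""
    n, k = len(y), y.max() + 1
    F = np.zeros((n, k))
    labeled = y >= 0
    F[labeled, y[labeled]] = 1.0
    P = W / W.sum(axis=1, keepdims=True)      # row-normalised transition matrix
    for _ in range(n_iter):
        F = P @ F
        F[labeled] = 0.0                       # re-clamp the known labels
        F[labeled, y[labeled]] = 1.0
    return F.argmax(axis=1)

# Four documents: 0-1 similar, 2-3 similar, weak cross links; 0 and 2 labeled.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
y = np.array([0, -1, 1, -1])
print(label_propagation(W, y))  # [0 0 1 1]
```

The two unlabeled documents inherit the label of the cluster they sit in, which is the "find similarities between the nodes for automatic labeling" step in miniature.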
Singh, Maninder. "Using Machine Learning and Graph Mining Approaches to Improve Software Requirements Quality: An Empirical Investigation." Diss., North Dakota State University, 2019. https://hdl.handle.net/10365/29803.
Severini, Nicola. "Analysis, Development and Experimentation of a Cognitive Discovery Pipeline for the Generation of Insights from Informal Knowledge." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21013/.
Viana, do Espírito Santo Ilísio. "Inspection automatisée d'assemblages mécaniques aéronautiques par vision artificielle : une approche exploitant le modèle CAO." Thesis, Ecole nationale des Mines d'Albi-Carmaux, 2016. http://www.theses.fr/2016EMAC0022/document.
The work presented in this manuscript deals with the automated inspection of aeronautical mechanical parts using computer vision. The goal is to decide whether a mechanical assembly has been assembled correctly, i.e. whether it complies with the specifications. This work was conducted within two industrial projects: on the one hand the CAAMVis project, in which the inspection sensor consists of a dual stereoscopic head (stereovision) carried by a robot; on the other hand the Lynx© project, in which the inspection sensor is a single Pan/Tilt/Zoom camera (monocular vision). These two projects share the common objective of exploiting the CAD model of the assembly (which provides the desired reference state) as much as possible in the inspection task, which is based on the analysis of the 2D images provided by the sensor. The proposed method consists in comparing a 2D image acquired by the sensor (referred to as the "real image") with a synthetic 2D image generated from the CAD model. The real and synthetic images are segmented and then decomposed into a set of 2D primitives. These primitives are then matched by exploiting concepts from graph theory, namely the use of a bipartite graph to guarantee respect of the uniqueness constraint required in such a matching process. The matching result allows us to decide whether the assembly has been assembled correctly or not. The proposed approach was validated on both simulation data and real data acquired within the above-mentioned projects.
Kim, Sungmin. "Community Detection in Directed Networks and its Application to Analysis of Social Networks." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1397571499.
Ben Salah, Imeen. "Extraction d'un graphe de navigabilité à partir d'un nuage de points 3D enrichis." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMR070/document.
Cameras have become increasingly common in vehicles, smartphones, and advanced driver assistance systems. The areas of application of these cameras in the world of intelligent transportation systems are becoming more and more varied: pedestrian detection, line crossing detection, navigation, and so on. Vision-based navigation has reached a certain maturity in recent years through the use of advanced technologies. Vision-based navigation systems have the considerable advantage of being able to directly use the visual information already existing in the environment without having to adapt any element of the infrastructure. In addition, unlike systems using GPS, they can be used outdoors and indoors without any loss of precision, which guarantees the superiority of systems based on computer vision. A major area of research currently focuses on mapping, an essential step for navigation. This step raises a substantial memory-management problem for such systems because of the huge amount of information collected by each sensor. Indeed, the memory space required to accommodate the map of a small city is measured in tens of GB, or even thousands when one wants to cover large spaces. This makes it impossible to integrate such a map into a mobile system such as a smartphone, a camera embedded in a vehicle, or a robot. The challenge is to develop new algorithms that minimize the memory needed to operate a navigation system using only computer vision. It is in this context that our project consists in developing a new system able to summarize a 3D map built from the visual information collected by several sensors. The summary will be a set of spherical views that allow us to keep the same level of visibility in all directions. It also guarantees, at a lower cost, a good level of precision and speed during navigation. The summary map of the environment will contain geometric, photometric and semantic information.
Ngo, Duy Hoa. "Enhancing Ontology Matching by Using Machine Learning, Graph Matching and Information Retrieval Techniques." Thesis, Montpellier 2, 2012. http://www.theses.fr/2012MON20096/document.
In recent years, ontologies have attracted a lot of attention in the Computer Science community, especially in the Semantic Web field. They serve as explicit conceptual knowledge models and provide the semantic vocabularies that make domain knowledge available for exchange and interpretation among information systems. However, due to the decentralized nature of the semantic web, ontologies are highly heterogeneous. This heterogeneity mainly causes the problem of variation in meaning or ambiguity in entity interpretation and, consequently, prevents domain knowledge sharing. Therefore, ontology matching, which discovers correspondences between semantically related entities of ontologies, becomes a crucial task in semantic web applications. Several challenges to the field of ontology matching have been outlined in recent research. Among them, selecting the appropriate similarity measures and tuning the configuration of their combination are known as fundamental issues the community should deal with. In addition, verifying the semantic coherence of the discovered alignment is also known to be a crucial task. Furthermore, the difficulty of the problem grows with the size of the ontologies. To deal with these challenges, in this thesis, we propose a novel matching approach that combines different techniques from the fields of machine learning, graph matching and information retrieval in order to enhance ontology matching quality. Indeed, we make use of information retrieval techniques to design new effective similarity measures for comparing labels and context profiles of entities at the element level. We also apply a graph matching method named similarity propagation at the structure level, which effectively discovers mappings by exploring structural information of entities in the input ontologies. To combine similarity measures at the element level, we transform the ontology matching task into a classification task in machine learning.
Besides, we propose a dynamic weighted sum method to automatically combine the matching results obtained from the element- and structure-level matchers. In order to remove inconsistent mappings, we design a new fast semantic filtering method. Finally, to deal with the large-scale ontology matching task, we propose two candidate selection methods to reduce the computational space. All these contributions have been implemented in a prototype named YAM++. To evaluate our approach, we use various tracks from the OAEI campaign, namely Benchmark, Conference, Multifarm, Anatomy, Library and Large Biomedical Ontologies. The experimental results show that the proposed matching methods work effectively. Moreover, in comparison to other participants in OAEI campaigns, YAM++ proved to be highly competitive and achieved a high ranking position.
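An element-level, IR-style label similarity of the sort such an approach combines can be sketched as token cosine. The tokenisation and weighting here are deliberately simplistic placeholders for the measures actually designed in the thesis.

```python
import math
from collections import Counter

def label_similarity(label_a, label_b):
    """Cosine similarity between two entity labels, treating each label
    as a bag of lowercase tokens weighted by term frequency."""
    ta, tb = Counter(label_a.lower().split()), Counter(label_b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta)
    norm = math.sqrt(sum(v * v for v in ta.values())) * \
           math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0

s = label_similarity("Conference Paper Author", "author of conference paper")
# identical content words, one extra stop word: similarity is high but below 1
```

Scores like this one for many entity pairs would then be combined (in the thesis, by a learned classifier and a weighted sum with structure-level scores) into a final matching decision.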
Bui, Quang Anh. "Vers un système omni-langage de recherche de mots dans des bases de documents écrits homogènes." Thesis, La Rochelle, 2015. http://www.theses.fr/2015LAROS010/document.
The objective of our thesis is to build an omni-language word retrieval system for scanned documents. We place ourselves in the context where the content of the documents is homogeneous and prior knowledge about the documents (the language, the writer, the writing style, etc.) is not available. With this system, the user can freely and intuitively compose his/her query and retrieve words in homogeneous documents of any language, without having to find an occurrence of the word to search for. The key to our proposed system is the invariants: writing pieces that appear frequently in the collection of documents. The invariants can be used in the query-making process, in which the user selects and composes appropriate invariants to build the query. They can also be used as structural descriptors to characterize word images in the retrieval process. We introduce in this thesis our method for automatically extracting invariants from a document collection, our method for evaluating the quality of invariants, and the applications of invariants in the query-making process as well as in the retrieval process.
Christiansen, Cameron Smith. "Data Acquisition from Cemetery Headstones." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3383.
Dumitrescu, Stefan Daniel. "L' extraction d'information des sources de données non structurées et semi-structurées." Toulouse 3, 2011. http://thesesups.ups-tlse.fr/1555/.
Thesis objective: in the context of recently developed large-scale knowledge sources (general ontologies), investigate possible new approaches to major areas of Information Extraction (IE) and related fields. The thesis surveys the field of Information Extraction and focuses on the task of entity recognition in natural language texts, a required step for any IE system. Given the availability of large knowledge resources in the form of semantic graphs, an approach that treats the sub-tasks of Word Sense Disambiguation and Named Entity Recognition in a unified manner is possible. The first implemented system using this approach recognizes entities (words, both common and proper nouns) in free text and assigns them ontological classes, effectively disambiguating them. A second implemented system, inspired by the semantic information contained in the ontologies, also attempts a new approach to the classic problem of text classification, showing good results.
Carlassare, Giulio. "Similarità semantica e clustering di concetti della letteratura medica rappresentati con language model e knowledge graph di eventi." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23138/.
Oliveira, Junior Marcos Antonio de. "Especificação e análise de sistemas através de gramática de grafos." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/142128.
The growing size and complexity of current computer systems lead to increasing difficulty in extracting and specifying formal models of such systems, making it an increasingly expensive activity, both in time and in cost. Models are used in various software engineering techniques to assist in processes that range from the development of new software, to rebuilding a system from legacy software, to performing maintenance on software in operation. Therefore, it is necessary that these abstractions are reliable and faithfully represent the actual software. In this sense, the adoption of formal methods for the construction and analysis of models is growing, motivated mainly by the reliability that mathematical formalism adds to models. However, the use of formal methods generally demands a high investment in human and hence financial resources, since using such a formalism requires deep study of its mathematical foundation. Considering the extensive applicability of models in various subfields of computer science and the benefits arising from the use of formal methods for specifying systems, it is interesting to identify existing methods and tools to automate the process of extracting models, in addition to adopting a formalism that can be used by computer professionals working in the software industry. Thus, we encourage the use of the Graph Grammar formalism, a formal method that differs from others because it is intuitive and has a graphical visual representation, making it easy to understand without requiring advanced knowledge of the formalism. First, we propose an approach for extracting Graph Grammar models from source code, obtaining information from executions of annotated Java code. Then, an existing methodology for the extraction and analysis of Graph Grammars from Use Cases is presented, along with an empirical study to validate the methodology.
Finally, we propose possible additional checks to extend the analysis supported by this methodology. In summary, this work aims to extract graph-based models from artifacts created at the two poles of the software development process, before and after implementation, in order to allow future comparisons in the context of software verification.
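Neither the thesis's tool nor its rule format is available here, but the core graph-grammar step it relies on, matching a rule's left-hand side in a host graph and rewriting it, can be sketched with networkx subgraph isomorphism. The node labels and the "inline a call" rule below are purely illustrative:

```python
import networkx as nx
from networkx.algorithms.isomorphism import DiGraphMatcher

def apply_rule(host, lhs, rhs, preserved):
    """Apply one graph-grammar rule: find an image of `lhs` in `host`,
    delete matched elements not in `preserved`, then glue in `rhs`."""
    matcher = DiGraphMatcher(
        host, lhs, node_match=lambda a, b: a["label"] == b["label"])
    for mapping in matcher.subgraph_isomorphisms_iter():
        inv = {v: k for k, v in mapping.items()}   # lhs node -> host node
        for n in lhs.nodes:                        # delete non-preserved image
            if n not in preserved:
                host.remove_node(inv[n])
        # add RHS nodes/edges, reusing host nodes for preserved ones
        name = {n: inv[n] if n in preserved else f"new_{n}" for n in rhs.nodes}
        for n, data in rhs.nodes(data=True):
            if n not in preserved:
                host.add_node(name[n], **data)
        for u, v in rhs.edges:
            host.add_edge(name[u], name[v])
        return True        # rewrite once
    return False           # rule not applicable

# toy rule: a "call" node between two "method" nodes is inlined away
host = nx.DiGraph()
host.add_node("m1", label="method"); host.add_node("c", label="call")
host.add_node("m2", label="method")
host.add_edges_from([("m1", "c"), ("c", "m2")])

lhs = nx.DiGraph()
lhs.add_node(1, label="method"); lhs.add_node(2, label="call")
lhs.add_node(3, label="method")
lhs.add_edges_from([(1, 2), (2, 3)])

rhs = nx.DiGraph()
rhs.add_node(1, label="method"); rhs.add_node(3, label="method")
rhs.add_edge(1, 3)

apply_rule(host, lhs, rhs, preserved={1, 3})
print(sorted(host.nodes))   # the "call" node is gone, m1 -> m2 remains
```

The `preserved` set plays the role of the rule's gluing interface: those nodes survive the rewrite and anchor the right-hand side in the host graph.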
Nguyen, Thanh-Khoa. "Image segmentation and extraction based on pixel communities." Thesis, La Rochelle, 2019. http://www.theses.fr/2019LAROS035.
Full text
Image segmentation has become an indispensable task that is widely employed in several image processing applications, including object detection, object tracking, automatic driver assistance, and traffic control systems. The literature abounds with algorithms for achieving image segmentation tasks. These methods can be divided into a few main groups according to the underlying approach: region-based segmentation, feature-based clustering, graph-based approaches, and artificial neural network-based segmentation. Recently, complex networks have flourished in both theory and applications. Hence, image segmentation techniques based on community detection algorithms have been proposed and have become an interesting discipline in the literature. In this thesis, we propose a novel framework for community detection based image segmentation. The idea of bringing social network analysis into image segmentation has appealed to many authors, but how community detection algorithms can be applied to image segmentation efficiently is a topic that has challenged researchers for decades. The contribution of this thesis is an effort to construct the best possible complex networks for applying community detection, and to propose novel agglomeration methods that merge homogeneous regions into good image segmentation results. Besides, we also propose a content-based image retrieval system using the same features as those obtained by the image segmentation processes. The proposed image search engine retrieves the images most similar to a query image. It relies on the incorporation of our extracted features into a Bag-of-Visual-Words model. This is one representative application showing that image segmentation benefits several image processing and computer vision applications.
Our methods have been tested on several data sets and evaluated with many well-known segmentation evaluation metrics. They produce efficient segmentation results compared to the state of the art.
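As a minimal sketch of the community-detection approach to segmentation described above (not the thesis's own pipeline): build a 4-connected pixel graph weighted by intensity similarity and let a modularity-based algorithm split it into regions. The toy image, Gaussian similarity kernel and parameter values are illustrative assumptions:

```python
import math
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def segment(image, sigma=10.0):
    """Segment a 2-D grayscale image by community detection on a pixel graph."""
    h, w = len(image), len(image[0])
    g = nx.Graph()
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):        # 4-connectivity
                ny, nx_ = y + dy, x + dx
                if ny < h and nx_ < w:
                    diff = image[y][x] - image[ny][nx_]
                    # similar neighbours get weight ~1, dissimilar ones ~0
                    weight = math.exp(-(diff * diff) / (2 * sigma * sigma))
                    g.add_edge((y, x), (ny, nx_), weight=weight)
    return [set(c) for c in greedy_modularity_communities(g, weight="weight")]

# a 4x4 image with a bright square in the upper-left corner
img = [[200, 200, 10, 10],
       [200, 200, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]
parts = segment(img)
print(len(parts))
```

The thesis's contribution lies precisely in how this graph is constructed and how the detected communities are agglomerated; the grid-plus-Gaussian construction here is only the simplest possible instance.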
Althuru, Dharan Kumar Reddy. "Distributed Local Trust Propagation Model and its Cloud-based Implementation." Wright State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=wright1400649603.
Full text
Desai, Urvashi. "Student Interaction Network Analysis on Canvas LMS." Miami University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=miami1588339724934746.
Full text
Quantin, Matthieu. "Proposition de chaînage des connaissances historiques et patrimoniales. Approche multi-échelles et multi-critères de corpus textuels." Thesis, Ecole centrale de Nantes, 2018. http://www.theses.fr/2018ECDN0014.
Full text
The humanities have challenged computer science for 60 years. The 1990s mark a break, announcing qualitative analysis and interpretation of interoperable data, which became «knowledge». Since 2010, a disillusionment has tarnished these prospects while the diversity of the Digital Humanities has increased. Against this complex background, we propose an implemented method producing various «views» of textual corpora in history. These views enable (1) interactive analysis combined with the qualitative knowledge of the historian and (2) digital documentation of heritage on site (e.g. in a museum) for an advanced visitor. Corpus views are weighted multigraphs: documents are vertices linked by edges, and each edge carries semantic, temporal or spatial proximity information. This method aims at co-creating historical knowledge. Facing the utopian goal of modeling qualitative knowledge in history, we designed a pragmatic process: the historian analyses quantitative data about a known corpus, which generates new hypotheses and certainties. Our OLAP-like approach charts paths and customized access to digital heritage documentation for each user. These paths may meet 3D heritage data. Several use cases validate the proposed method and open perspectives for industrial application.
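The corpus views described above, documents as vertices with one edge per semantic, temporal or spatial proximity criterion, can be sketched as a weighted multigraph. The mini-corpus and thresholds below are invented for illustration, not taken from the thesis:

```python
import networkx as nx

# hypothetical mini-corpus: each document carries a date, a place and keywords
docs = {
    "d1": {"year": 1850, "place": "Nantes", "terms": {"shipyard", "steel"}},
    "d2": {"year": 1855, "place": "Nantes", "terms": {"shipyard", "crane"}},
    "d3": {"year": 1920, "place": "Brest", "terms": {"harbour", "steel"}},
}

def corpus_view(docs, max_year_gap=10):
    """Build a weighted multigraph with one edge per proximity criterion."""
    g = nx.MultiGraph()
    g.add_nodes_from(docs)
    ids = sorted(docs)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            shared = docs[a]["terms"] & docs[b]["terms"]
            if shared:                                   # semantic proximity
                g.add_edge(a, b, kind="semantic", weight=len(shared))
            if abs(docs[a]["year"] - docs[b]["year"]) <= max_year_gap:
                g.add_edge(a, b, kind="temporal", weight=1.0)
            if docs[a]["place"] == docs[b]["place"]:     # spatial proximity
                g.add_edge(a, b, kind="spatial", weight=1.0)
    return g

view = corpus_view(docs)
kinds = sorted(d["kind"] for _, _, d in view.edges(data=True))
print(kinds)
```

A multigraph (rather than a simple graph) keeps the three criteria as separate parallel edges, so a "view" can be obtained by filtering on `kind` without rebuilding the graph.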
Brauer, Falk. "Extraktion und Identifikation von Entitäten in Textdaten im Umfeld der Enterprise Search." Phd thesis, Universität Potsdam, 2010. http://opus.kobv.de/ubp/volltexte/2011/5140/.
Full text
Automatic information extraction (IE) from unstructured texts enables new ways to access relevant information and analyze text contents, going beyond existing technologies for keyword-based search in document collections. However, the development of systems for extracting machine-readable data from text still requires the implementation of domain-specific extraction programs. In particular in the field of enterprise search (the retrieval of information in enterprise settings), where a large number of heterogeneous document types exists, it is often necessary to develop ad-hoc program modules and to combine them with generic program components to extract business-relevant entities. This is particularly critical, as potentially a new IE system must be developed from scratch for each individual application. In this work we examine efficient methods to develop and execute IE systems in the context of enterprise search, and effective algorithms to exploit pre-existing structured data in the business context for the extraction and identification of business entities in documents. The basis of this work is a novel platform for composing IE systems through the description of the data flow between generic and application-specific IE modules. The platform particularly supports the development and reuse of generic IE modules and is characterized by greater flexibility than previous methods. A technique developed in this work interprets document processing as a data stream between IE modules and thus enables an extensive parallelization of individual modules. The autonomous execution of each module allows for a significant runtime improvement for individual documents and thus improves response times, e.g. for extraction services. Previous parallelization approaches focused only on improved throughput for large document collections, e.g. by leveraging distributed instances of an IE system.
Information extraction in the context of enterprise search differs from extraction from the World Wide Web in that a variety of structured reference data (corporate databases or terminologies) is usually available, which often describes the relationships among entities. Furthermore, entity names in a business environment usually exhibit special characteristics. On the one hand, relevant entities such as product identifiers follow certain patterns that are not always known beforehand but can be inferred from known sample entities, so that unknown entities can be extracted. On the other hand, many designators have a more descriptive character (concatenations of descriptive words); the corresponding references in texts may differ due to the diversity of potential descriptions, often making the identification of such entities difficult. To address IE applications in the presence of available structured data, we study the inference of effective regular expressions from given sample entities. Various generalization and specialization heuristics are used to identify patterns at different syntactic abstraction levels and thus generate regular expressions that promise both high recall and high precision. Compared to previous rule-learning techniques in the field of information extraction, our technique does not require any annotated document corpus. A method for the identification of entities that are predefined by graph-structured reference data is examined as a third contribution. An algorithm is presented which goes beyond exact string comparison between text and reference data: it allows for effective identification and disambiguation of potentially discovered entities by exploiting approximate matching strategies, and it further leverages relationships among entities for identification and disambiguation. The method presented in this work is superior to previous approaches with regard to precision and recall.
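The regex-inference idea above, generalizing sample entities into character-class patterns at a chosen abstraction level, might look like this minimal sketch. It uses one fixed generalization heuristic (digit/letter classes with collapsed runs), whereas the thesis combines several heuristics at several abstraction levels:

```python
import re

def infer_pattern(samples):
    """Generalize sample entity strings into one regular expression by
    mapping characters to classes and collapsing repeated runs."""
    def classify(ch):
        if ch.isdigit():
            return r"\d"
        if ch.isalpha():
            return "[A-Za-z]"
        return re.escape(ch)

    def shape(s):
        out = []
        for ch in s:
            cls = classify(ch)
            if out and out[-1][0] == cls:
                out[-1][1] += 1          # extend the current run
            else:
                out.append([cls, 1])
        return tuple((cls, n) for cls, n in out)

    shapes = {shape(s) for s in samples}
    if len(shapes) != 1:
        raise ValueError("samples disagree; a weaker generalization is needed")
    body = "".join(cls if n == 1 else f"{cls}{{{n}}}" for cls, n in shapes.pop())
    return f"^{body}$"

# hypothetical product identifiers, not from the thesis
pattern = infer_pattern(["AB-1234", "XY-9876", "QQ-0001"])
print(pattern)
print(bool(re.match(pattern, "ZK-5555")))
```

Weakening the heuristic (e.g. replacing fixed run lengths `{4}` by `+`) trades precision for recall, which is exactly the dial the generalization/specialization heuristics in the abstract are tuning.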
Jguirim, Ines. "Modélisation et génération d'itinéraires contextuels d'activités urbaines dans la ville." Thesis, Brest, 2016. http://www.theses.fr/2016BRES0074/document.
Full text
The city is an urban aggregation offering diverse services to its inhabitants. It constitutes a complex system which depends on several social and economic factors. The configuration of space strongly influences the accessibility of the various features of the city. Spatial analysis of the urban structure is carried out on cities to study the characteristics of the space and to estimate its functional potential. The aim of the thesis is to propose an approach to spatial analysis which takes into account the various structural and semantic aspects of the city. A graph-based model was proposed to represent the multimodal transport network of the city, which guarantees accessibility to the various points of interest. Super-networks were used to integrate the possibility of intermodal transfer into the transport model through interdependence links between the sub-graphs associated with the various means of transportation. The temporal aspect is represented in the model by attributes specifying the temporal constraints characterizing the itinerary at every node and edge, such as exploration time, waiting time and road penalties. The functional aspect is introduced through the concept of activity. We propose a conceptual model of the various contextual elements which can affect the planning and execution of urban activities, such as the spatio-temporal frame and the profile of the user. This model is enriched by knowledge management, which aims to represent information about individual behaviors. The extracted knowledge is represented in a rule management system allowing the contextual planning of activities.
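A super-network of the kind described, one sub-graph per transport mode joined by interdependence links that carry transfer penalties, can be sketched as follows. The stops, travel times and penalties are invented for illustration:

```python
import networkx as nx

# hypothetical two-layer network: walking and metro, joined by transfer edges
g = nx.DiGraph()
walk = [("A", "B", 10), ("B", "C", 12)]     # (stop, stop, minutes)
metro = [("B", "C", 3)]
for u, v, minutes in walk:                  # walking is bidirectional
    g.add_edge(("walk", u), ("walk", v), time=minutes)
    g.add_edge(("walk", v), ("walk", u), time=minutes)
for u, v, minutes in metro:
    g.add_edge(("metro", u), ("metro", v), time=minutes)
# interdependence links: changing mode at a stop costs a waiting penalty
for stop in ("B", "C"):
    g.add_edge(("walk", stop), ("metro", stop), time=4)   # wait for the train
    g.add_edge(("metro", stop), ("walk", stop), time=2)   # exit the station

length, path = nx.single_source_dijkstra(
    g, ("walk", "A"), ("walk", "C"), weight="time")
print(length, path)
```

Because each node is a (mode, stop) pair, an ordinary shortest-path query automatically decides where an intermodal transfer pays off: here, walking A to B then taking the metro (10 + 4 + 3 + 2 = 19 min) beats walking all the way (22 min).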
Balzani, Lorenzo. "Verbalizzazione di eventi biomedici espressi nella letteratura scientifica: generazione controllata di linguaggio naturale da grafi di conoscenza mediante transformer text-to-text." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24286/.
Full text
Döhling, Lars. "Extracting and Aggregating Temporal Events from Texts." Doctoral thesis, Humboldt-Universität zu Berlin, 2017. http://dx.doi.org/10.18452/18454.
Full text
Finding reliable information about given events from large and dynamic text collections, such as the web, is a topic of great interest. For instance, rescue teams and insurance companies are interested in concise facts about damages after disasters, which can be found today in web blogs, online newspaper articles, social media, etc. Knowing these facts helps to determine the required scale of relief operations and supports their coordination. However, finding, extracting, and condensing specific facts is a highly complex undertaking: It requires identifying appropriate textual sources and their temporal alignment, recognizing relevant facts within these texts, and aggregating extracted facts into a condensed answer despite inconsistencies, uncertainty, and changes over time. In this thesis, we present and evaluate techniques and solutions for each of these problems, embedded in a four-step framework. Applied methods are pattern matching, natural language processing, and machine learning. We also report the results for two case studies applying our entire framework: gathering data on earthquakes and floods from web documents. Our results show that it is, under certain circumstances, possible to automatically obtain reliable and timely data from the web.
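The framework above (source alignment, fact recognition, aggregation under inconsistency) can be caricatured in a few lines. The pattern, the report snippets and the vote-plus-recency rule here are illustrative stand-ins, not the thesis's actual methods or data:

```python
import re
from collections import Counter
from datetime import date

# hypothetical report snippets about one disaster, with publication dates
reports = [
    (date(2023, 2, 6), "A magnitude 7.8 earthquake struck the region."),
    (date(2023, 2, 6), "Officials report a magnitude 7.8 quake."),
    (date(2023, 2, 7), "The quake was revised to magnitude 7.9."),
]

PATTERN = re.compile(r"magnitude (\d+\.\d+)")

def extract(reports):
    """Align sources in time, match the fact pattern, collect dated values."""
    return [(when, float(m.group(1)))
            for when, text in reports
            for m in PATTERN.finditer(text)]

def aggregate(facts):
    """Resolve inconsistencies: majority vote, ties broken by recency."""
    votes = Counter(v for _, v in facts)
    top = max(votes.values())
    candidates = {v for v, n in votes.items() if n == top}
    return max((when, v) for when, v in facts if v in candidates)[1]

facts = extract(reports)
print(facts)
print(aggregate(facts))
```

Even this toy shows the core tension the thesis addresses: a pure majority vote favors the older value (7.8), while a pure recency rule would favor the revision (7.9); any real aggregation step has to make that trade-off explicit.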
Soussi, Rania. "Querying and extracting heterogeneous graphs from structured data and unstructured content." Phd thesis, Ecole Centrale Paris, 2012. http://tel.archives-ouvertes.fr/tel-00740663.
Full text
Bahl, Gaétan. "Architectures deep learning pour l'analyse d'images satellite embarquée." Thesis, Université Côte d'Azur, 2022. https://tel.archives-ouvertes.fr/tel-03789667.
Full text
The recent advances in high-resolution Earth observation satellites and the reduction in revisit times introduced by the creation of satellite constellations have led to the daily creation of large amounts of image data (hundreds of terabytes per day). Simultaneously, the popularization of Deep Learning techniques has allowed the development of architectures capable of extracting semantic content from images. While these algorithms usually require powerful hardware, low-power AI inference accelerators have recently been developed and have the potential to be used in the next generations of satellites, thus opening the possibility of onboard analysis of satellite imagery. By extracting the information of interest from satellite images directly onboard, a substantial reduction in bandwidth, storage and memory usage can be achieved. Current and future applications, such as disaster response, precision agriculture and climate monitoring, would benefit from lower processing latency and even real-time alerts. In this thesis, our goal is two-fold: on the one hand, we design efficient Deep Learning architectures that are able to run on low-power edge devices, such as satellites or drones, while retaining sufficient accuracy. On the other hand, we design our algorithms while keeping in mind the importance of having a compact output that can be efficiently computed, stored, and transmitted to the ground or to other satellites within a constellation. First, by using depth-wise separable convolutions and convolutional recurrent neural networks, we design efficient semantic segmentation networks with a low number of parameters and low memory usage. We apply these architectures to cloud and forest segmentation in satellite images. We also design an architecture specifically for cloud segmentation on the FPGA of OPS-SAT, a satellite launched by ESA in 2019, and perform onboard experiments remotely.
Second, we develop an instance segmentation architecture for the regression of smooth contours based on the Fourier coefficient representation, which allows detected object shapes to be stored and transmitted efficiently. We evaluate the performance of our method on a variety of low-power computing devices. Finally, we propose a road graph extraction architecture based on a combination of fully convolutional and graph neural networks. We show that our method is significantly faster than competing methods, while retaining good accuracy.
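The Fourier coefficient representation of contours mentioned above can be sketched directly: a closed contour becomes a complex signal whose low-frequency DFT terms already describe the shape compactly. The circle example and coefficient count are illustrative, not the thesis's network output:

```python
import numpy as np

def fourier_descriptor(points, k):
    """Compress a closed contour to its k strongest Fourier terms."""
    z = points[:, 0] + 1j * points[:, 1]       # contour as complex samples
    coeffs = np.fft.fft(z) / len(z)
    keep = np.zeros_like(coeffs)
    idx = np.argsort(-np.abs(coeffs))[:k]      # keep the k largest coefficients
    keep[idx] = coeffs[idx]
    return keep

def reconstruct(coeffs, n):
    return np.fft.ifft(coeffs * n)             # back to n contour samples

# unit circle sampled at 64 points: a single non-zero coefficient suffices
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([np.cos(t), np.sin(t)], axis=1)
coeffs = fourier_descriptor(circle, k=2)
z = reconstruct(coeffs, 64)
err = np.max(np.abs(np.stack([z.real, z.imag], axis=1) - circle))
print(f"{err:.2e}")
```

This is why the representation is attractive onboard: a 64-point polygon collapses to a handful of complex numbers, and smoother shapes need fewer coefficients for the same reconstruction error.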
Ozcan, Evren. "Ultrasound Assisted Extraction Of Phenolics From Grape Pomace." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12606908/index.pdf.
Full text
C and composition of the solvent on extraction efficiency and recovery of phenolics were studied by response surface methodology. The Folin-Ciocalteu colorimetric method was used to analyze the effects of process parameters on the total phenolic content of the extracts. The best recovery (47.2 mg gallic acid equivalents of total phenolics per g of dried grape pomace) was obtained using 30 % aqueous ethanol and applying 6 minutes of sonication followed by 12 minutes of shaking in a water bath at 45 °C.
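Response surface methodology, as used in the abstract above, fits a second-order polynomial to designed experiments and reads the optimum off the fitted surface. A sketch with invented, purely illustrative data points (not the thesis's measurements):

```python
import numpy as np

# hypothetical design points: (ethanol %, sonication min) -> yield (mg GAE/g)
X = np.array([[10, 2], [10, 6], [10, 10],
              [30, 2], [30, 6], [30, 10],
              [50, 2], [50, 6], [50, 10]], dtype=float)
y = np.array([30.0, 34.0, 33.0, 40.0, 47.0, 44.0, 36.0, 41.0, 38.0])

def design_matrix(X):
    """Second-order model: 1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# least-squares fit of the quadratic response surface
beta, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)

# locate the predicted optimum on a fine grid of factor settings
grid = np.array([[e, s] for e in np.arange(10, 50.5, 0.5)
                        for s in np.arange(2, 10.1, 0.1)])
pred = design_matrix(grid) @ beta
best = grid[np.argmax(pred)]
print(best)
```

With data peaking around 30 % ethanol and 6 min of sonication, the fitted surface places its optimum nearby; real RSM studies additionally check the fit quality (lack-of-fit, R²) before trusting that optimum.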
Raveaux, Romain. "Fouille de graphes et classification de graphes : application à l’analyse de plans cadastraux." Thesis, La Rochelle, 2010. http://www.theses.fr/2010LAROS311/document.
Full text
This thesis tackles the problem of technical document interpretation applied to ancient and colored cadastral maps. This subject is at the crossroads of different fields like signal and image processing, pattern recognition, artificial intelligence, man-machine interaction and knowledge engineering. Indeed, each of these fields can contribute to building a reliable and efficient document interpretation device. This thesis points out the necessity and importance of dedicated services oriented to historical documents, and a related project named ALPAGE. Subsequently, the main focus of this work is introduced: content-based map retrieval within an ancient collection of color cadastral maps.
Zou, Le. "3D face recognition with wireless transportation." [College Station, Tex. : Texas A&M University, 2007. http://hdl.handle.net/1969.1/ETD-TAMU-1448.
Full text
de, Carvalho Gomes Pedro. "Sound Modular Extraction of Control Flow Graphs from Java Bytecode." Licentiate thesis, KTH, Teoretisk datalogi, TCS, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-105275.
Full text
QC 20121122
de, Carvalho Gomes Pedro, and Attilio Picoco. "Sound Extraction of Control-Flow Graphs from open Java Bytecode Systems." KTH, Teoretisk datalogi, TCS, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-104076.
Full text
QC 20121029
Verification of Control-Flow Properties of Programs with Procedures (CVPP)
Corrales, Moreno Margarita. "Optimal extraction and technological revalorisation of bioactive polyphenols from grape pomace." [S.l. : s.n.], 2008. http://digbib.ubka.uni-karlsruhe.de/volltexte/1000008298.
Full text
Amiri, Ramin. "Techno-economic evaluation of a polyphenols extraction process from grape seed." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24171/.
Full textElliott, Paul Harrison 1979. "Extracting the K best solutions from a valued and-or acyclic graph." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/41540.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 117-118).
In this thesis, we are interested in solving a problem that arises in model-based programming, specifically in the estimation of the state of a system described by a probabilistic model. Some model-based estimators, such as the MEXEC algorithm and the DNNF-based Belief State Estimation algorithm, use a valued and-or acyclic graph to represent the possible estimates. These algorithms specifically use a valued smooth deterministic decomposable negation normal form (sd-DNNF) representation, a type of and-or acyclic graph. Prior work has focused on extracting either all solutions or only the best solution from the sd-DNNF. This work develops an efficient algorithm that is able to extract the k best solutions, where k is a parameter to the algorithm. For a graph with |E| edges, |V| nodes and |Ev| children per non-leaf node, the algorithm presented in this thesis has a time complexity of O(|E|k log k + |E| log |Ev| + |V|k log |Ev|) and a space complexity of O(|E|k).
by Paul Harrison Elliott.
M.Eng.
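The k-best extraction over a valued and-or DAG can be sketched bottom-up: leaves carry their value, OR nodes merge their children's sorted value lists, and AND nodes combine one valuation per child by summation, always truncating to the k largest. This is a simplified reading of the idea, not the thesis's algorithm or its complexity bounds:

```python
import heapq

def k_best(graph, root, k):
    """Bottom-up k-best valuations of a valued and-or DAG.
    graph[n] = ("leaf", value) | ("or", children) | ("and", children).
    Higher values are better; AND combines children by summation."""
    memo = {}

    def best(n):
        if n in memo:
            return memo[n]
        kind, payload = graph[n]
        if kind == "leaf":
            vals = [payload]
        elif kind == "or":
            # choose one child: merge all candidates, keep the k largest
            vals = heapq.nlargest(k, (v for c in payload for v in best(c)))
        else:  # "and": take one valuation per child and sum them
            vals = [0.0]
            for c in payload:
                vals = heapq.nlargest(
                    k, (a + b for a in vals for b in best(c)))
        memo[n] = vals
        return vals

    return best(root)

# toy sd-DNNF-like graph: the root is a conjunction of two choices
graph = {
    "x1": ("leaf", 3.0), "x2": ("leaf", 1.0),
    "y1": ("leaf", 2.0), "y2": ("leaf", 5.0),
    "cx": ("or", ["x1", "x2"]),
    "cy": ("or", ["y1", "y2"]),
    "a": ("and", ["cx", "cy"]),
    "root": ("or", ["a"]),
}
print(k_best(graph, "root", 3))
```

The memoization is what makes this work on a DAG rather than a tree: a shared sub-node's k-best list is computed once and reused by every parent, mirroring how the thesis exploits the sd-DNNF's structure.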
Bendou, Mohamed. "Extraction de connaissances à partir des données à l'aide des réseaux bayésiens." Paris 11, 2003. http://www.theses.fr/2003PA112053.
Full text
The main objective of this thesis is to develop a new kind of learning algorithm for Bayesian networks that is more accurate, efficient and robust in the presence of noise, and therefore suited to KDD tasks. Since most local optima in the space of Bayesian network structures are caused directly by the existence of equivalence classes (sets of structures encoding the same conditional independence relations, represented by partially oriented graphs), we concentrated an important part of our research on the development of a new family of learning algorithms, EQ, which directly explore the space of equivalence classes. We also developed theoretical and algorithmic tools for the analysis and treatment of partially oriented graphs. We could demonstrate that this kind of approach yields meaningful precision gains in a time comparable to that of classical approaches. We thus contributed to the current renewal of interest in learning equivalence classes of Bayesian networks, long considered too complex by the scientific community. Finally, another aspect of our research was dedicated to analyzing the effect of noisy data on the learning of Bayesian networks. We analyzed and explained the increased complexity of Bayesian networks learned from noisy data, and showed that, unlike the classical over-fitting which affects other classes of learning methods, this phenomenon is theoretically justified by the alteration of the conditional independence relations between the variables and is beneficial to the predictive power of the learned models.
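The equivalence classes central to this abstract rest on a classical characterization (due to Verma and Pearl): two DAGs encode the same conditional independences iff they share the same skeleton and the same v-structures. A minimal check of that criterion, independent of the EQ algorithms themselves:

```python
def skeleton(dag):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Colliders a -> c <- b where a and b are non-adjacent."""
    parents = {}
    for a, b in dag:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(dag)
    out = set()
    for c, ps in parents.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    out.add((a, c, b))
    return out

def markov_equivalent(d1, d2):
    """Verma & Pearl: same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

# x -> y -> z and x <- y <- z encode the same independences ...
chain1 = [("x", "y"), ("y", "z")]
chain2 = [("z", "y"), ("y", "x")]
# ... while the collider x -> y <- z does not
collider = [("x", "y"), ("z", "y")]

print(markov_equivalent(chain1, chain2))    # True
print(markov_equivalent(chain1, collider))  # False
```

Every local optimum caused by score-equivalent neighbors vanishes once the search moves over these classes instead of individual DAGs, which is the motivation the abstract gives for the EQ family.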