Dissertations / Theses on the topic 'Information extraction and fusion'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'Information extraction and fusion.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Ahmad, Muhammad Imran. "Feature extraction and information fusion in face and palmprint multimodal biometrics." Thesis, University of Newcastle upon Tyne, 2013. http://hdl.handle.net/10443/2128.

Full text
Abstract:
Multimodal biometric systems that integrate the biometric traits from several modalities are able to overcome the limitations of single-modal biometrics. Fusing the information at an earlier level by consolidating the features given by different traits can give a better result due to the richness of information at this stage. In this thesis, three novel methods are derived and implemented on face and palmprint modalities, taking advantage of multimodal biometric fusion at the feature level. The benefits of the proposed methods are an enhanced capability to discriminate information in the fused features and to capture all of the information required to improve classification performance. The multimodal biometric system proposed here consists of several stages: feature extraction, fusion, recognition and classification. Feature extraction gathers all important information from the raw images. A new local feature extraction method has been designed to extract information from the face and palmprint images in the form of sub-block windows. Multiresolution analysis using the Gabor transform and the DCT is computed for each sub-block window to produce compact local features for the face and palmprint images. Multiresolution Gabor analysis captures important information in the texture of the images, while the DCT represents the information in different frequency components. Important features with high discrimination power are then preserved by selecting several low-frequency coefficients in order to estimate the model parameters. The extracted local features are fused using a new matrix interleaving method. The new fused feature vector is higher in dimensionality than the original feature vectors from both modalities, thus it carries high discriminating power and contains rich statistical information. The fused feature vector also provides more data points in the feature space, which is advantageous for the training process using statistical methods. The underlying statistical information in the fused feature vectors is captured using a Gaussian mixture model (GMM), whose parameters are estimated from the distribution of the fused feature vectors. The maximum likelihood score is used as a measure of certainty for recognition, while maximum likelihood score normalization is used for the classification process. Likelihood score normalization is found to suppress impostor likelihood scores when the background model parameters are estimated from a pool of users that includes statistical information about impostors. The proposed method achieved recognition accuracies of 97% and 99.7% when tested on the FERET-PolyU and ORL-PolyU datasets, respectively.
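As a rough illustration of the scoring idea in this abstract, and not the author's implementation, the sketch below interleaves two per-block feature matrices, fits one GMM per enrolled subject, and normalizes a probe's log-likelihood by a pooled background model. The block-wise Gabor/DCT feature extraction is replaced by random placeholder vectors, and all shapes and parameters are assumptions.

```python
# Hedged sketch: GMM scoring with background-model normalization over fused
# (interleaved) features. Feature extraction is mocked with random vectors.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def interleave(face_feats, palm_feats):
    """Fuse two per-block feature matrices by interleaving their rows."""
    fused = np.empty((face_feats.shape[0] + palm_feats.shape[0], face_feats.shape[1]))
    fused[0::2] = face_feats
    fused[1::2] = palm_feats
    return fused

# Mock enrolment data: per subject, feature vectors from face and palmprint blocks.
n_subjects, n_blocks, dim = 5, 40, 16
enrolment = {s: interleave(rng.normal(s, 1.0, (n_blocks, dim)),
                           rng.normal(s, 1.2, (n_blocks, dim)))
             for s in range(n_subjects)}

# Universal background model (UBM) pooled over all subjects.
ubm = GaussianMixture(n_components=4, random_state=0).fit(
    np.vstack(list(enrolment.values())))

# One GMM per enrolled subject.
models = {s: GaussianMixture(n_components=4, random_state=0).fit(X)
          for s, X in enrolment.items()}

def normalized_score(probe, subject):
    """Log-likelihood of the subject model minus that of the background model."""
    return models[subject].score(probe) - ubm.score(probe)

probe = interleave(rng.normal(3, 1.0, (n_blocks, dim)),
                   rng.normal(3, 1.2, (n_blocks, dim)))
scores = {s: normalized_score(probe, s) for s in models}
print("recognized as subject", max(scores, key=scores.get))
```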
APA, Harvard, Vancouver, ISO, and other styles
2

Jin, Xiaoying. "Automatic extraction of man-made objects from high-resolution satellite imagery by information fusion." Diss., Columbia, Mo. : University of Missouri-Columbia, 2005. http://hdl.handle.net/10355/5816.

Full text
Abstract:
Thesis (Ph.D.)--University of Missouri-Columbia, 2005.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file, viewed on November 15, 2006. Vita. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
3

Arif-Uz-Zaman, Kazi. "Failure and maintenance information extraction methodology using multiple databases from industry: A new data fusion approach." Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/116354/1/Kazi_Arif-Uz-Zaman_Thesis.pdf.

Full text
Abstract:
This study develops a new method to identify a vital input, i.e. failure times of an asset, to reliability models from multiple but commonly-available industrial maintenance databases. A text mining approach is employed to extract useful features from unstructured free texts of different maintenance work records. The proposed method is further developed using Active Learning algorithms to improve the robustness of the results. The outcomes of this study can be used to develop advanced and applicable reliability models from historical maintenance databases, which were not effectively utilised before. Two industry case studies were conducted to justify the method.
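The abstract describes mining free-text maintenance work orders and improving robustness with active learning. The sketch below shows one common uncertainty-sampling loop over TF-IDF text features; the example records, labels and classifier are placeholders, not the author's pipeline.

```python
# Hedged sketch: uncertainty-sampling active learning over maintenance texts.
# Records, labels and the classifier are illustrative placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

records = ["pump seal replaced after leak", "routine lubrication of bearing",
           "motor failed, burnt winding replaced", "inspection only, no fault found",
           "gearbox vibration, bearing failure suspected", "scheduled filter change"]
labels = np.array([1, 0, 1, 0, 1, 0])          # 1 = failure event, 0 = routine work

X = TfidfVectorizer().fit_transform(records)
labelled = [0, 1]                               # indices an engineer has already labelled
pool = [i for i in range(len(records)) if i not in labelled]

for _ in range(3):                              # three active-learning rounds
    clf = LogisticRegression().fit(X[labelled], labels[labelled])
    proba = clf.predict_proba(X[pool])
    # Query the pool record the model is least certain about.
    query = pool[int(np.argmin(np.abs(proba[:, 1] - 0.5)))]
    labelled.append(query)                      # simulate the engineer labelling it
    pool.remove(query)

print("records labelled by the 'expert':", labelled)
```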
APA, Harvard, Vancouver, ISO, and other styles
4

Thuillier, Etienne. "Extraction of mobility information through heterogeneous data fusion : a multi-source, multi-scale, and multi-modal problem." Thesis, Bourgogne Franche-Comté, 2017. http://www.theses.fr/2017UBFCA019.

Full text
Abstract:
Aujourd'hui c'est un fait, nous vivons dans un monde où les enjeux écologiques, économiques et sociétaux sont de plus en plus pressants. Au croisement des différentes lignes directrices envisagées pour répondre à ces problèmes, une vision plus précise de la mobilité humaine est un axe central et majeur, qui a des répercussions sur tous les domaines associés tels que le transport, les sciences sociales, l'urbanisme, les politiques d'aménagement, l'écologie, etc. C'est par ailleurs dans un contexte de contraintes budgétaires fortes que les principaux acteurs de la mobilité sur les territoires cherchent à rationaliser les services de transport, et les déplacements des individus. La mobilité humaine est donc un enjeu stratégique aussi bien pour les collectivités locales que pour les usagers, qu'il faut savoir observer, comprendre, et anticiper.Cette étude de la mobilité passe avant tout par une observation précise des déplacements des usagers sur les territoires. Aujourd'hui les acteurs de la mobilité se tournent principalement vers l'utilisation massive des données utilisateurs. L'utilisation simultanée de données multi-sources, multi-modales, et multi-échelles permet d'entrevoir de nombreuses possibilités, mais cette dernière présente des défis technologiques et scientifiques majeurs. Les modèles de mobilité présentés dans la littérature sont ainsi trop souvent axés sur des zones d'expérimentation limitées, en utilisant des données calibrées, etc. et leur application dans des contextes réels, et à plus large échelle est donc discutable. Nous identifions ainsi deux problématiques majeures qui permettent de répondre à ce besoin d'une meilleure connaissance de la mobilité humaine, mais également à une meilleure application de cette connaissance. La première problématique concerne l'extraction d'informations de mobilité à partir de la fusion de données hétérogènes. La seconde problématique concerne la pertinence de cette fusion dans un contexte réel, et à plus large échelle. Nous apportons différents éléments de réponses à ces problématiques dans cette thèse. Tout d'abord en présentant deux modèles de fusion de données, qui permettent une extraction d'informations pertinentes. Puis, en analysant l'application de ces deux modèles au sein du projet ANR Norm-Atis.Dans cette thèse, nous suivons finalement le développement de toute une chaine de processus. En commençant par une étude de la mobilité humaine, puis des modèles de mobilité, nous présentons deux modèles de fusion de données, et nous analysons leur pertinence dans un cas concret. Le premier modèle que nous proposons permet d'extraire 12 comportements types de mobilité. Il est basé sur un apprentissage non-supervisé de données issues de la téléphonie mobile. Nous validons nos résultats en utilisant des données officielles de l'INSEE, et nous déduisons de nos résultats, des comportements dynamiques qui ne peuvent pas être observés par les données de mobilité traditionnelles. Ce qui est une forte valeur-ajoutée de notre modèle. Le second modèle que nous proposons permet une désagrégation des flux de mobilité en six motifs de mobilité. Il se base sur un apprentissage supervisé des données issues d'enquêtes de déplacements ainsi que des données statiques de description du sursol. Ce modèle est appliqué par la suite aux données agrégés au sein du projet Norm-Atis. Les temps de calculs sont suffisamment performants pour permettre une application de ce modèle dans un contexte temps-réel
Today it is a fact that we live in a world where ecological, economic and societal issues are increasingly pressing. At the crossroads of the various guidelines envisaged to address these problems, a more accurate vision of human mobility is a central and major axis, which has repercussions on all related fields such as transport, social sciences, urban planning, management policies, ecology, etc. It is also in a context of strong budgetary constraints that the main mobility actors in the territories seek to rationalize transport services and the movements of individuals. Human mobility is therefore a strategic challenge both for local communities and for users, which must be observed, understood and anticipated. This study of mobility is based above all on a precise observation of the movements of users across the territories. Nowadays mobility operators are mainly focusing on the massive use of user data. The simultaneous use of multi-source, multi-modal, and multi-scale data opens many possibilities, but it also presents major technological and scientific challenges. The mobility models presented in the literature are too often focused on limited experimental areas, using calibrated data, etc., and their application in real contexts and on a larger scale is therefore questionable. We thus identify two major issues that address this need for a better knowledge of human mobility, but also for a better application of this knowledge. The first issue concerns the extraction of mobility information from heterogeneous data fusion. The second concerns the relevance of this fusion in a real context, and on a larger scale. These issues are addressed in this dissertation: the first, through two data fusion models that allow the extraction of mobility information; the second, through the application of these fusion models within the ANR Norm-Atis project. In this thesis, we finally follow the development of a whole chain of processes. Starting with a study of human mobility, and then of mobility models, we present two data fusion models, and we analyze their relevance in a concrete case. The first model we propose extracts 12 types of mobility behaviors. It is based on unsupervised learning of mobile phone data. We validate our results using official data from INSEE, and we infer from our results dynamic behaviors that cannot be observed through traditional mobility data, which is a strong added value of our model. The second model decomposes mobility flows into six mobility purposes. It is based on supervised learning of mobility survey data and static land-use data. This model is then applied to the aggregated data within the Norm-Atis project. The computation times are low enough to allow this model to be applied in a real-time context.
APA, Harvard, Vancouver, ISO, and other styles
5

Foucard, Rémi. "Fusion multi-niveaux par boosting pour le tagging automatique." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0093/document.

Full text
Abstract:
Les tags constituent un outil très utile pour indexer des documents multimédias. Cette thèse de doctorat s’intéresse au tagging automatique, c’est à dire l’association automatique par un algorithme d’un ensemble de tags à chaque morceau. Nous utilisons des techniques de boosting pour réaliser un apprentissage prenant mieux en compte la richesse de l’information exprimée par la musique. Un algorithme de boosting est proposé, afin d’utiliser conjointement des descriptions de morceaux associées à des extraits de différentes durées. Nous utilisons cet algorithme pour fusionner de nouvelles descriptions, appartenant à différents niveaux d’abstraction. Enfin, un nouveau cadre d’apprentissage est proposé pour le tagging automatique, qui prend mieux en compte les subtilités des associations entre les tags et les morceaux
Tags constitute a very useful tool for multimedia document indexing. This PhD thesis deals with automatic tagging, which consists in automatically associating a set of tags to each song using an algorithm. We use boosting techniques to design a learning scheme which better takes into account the richness of the information expressed by music. A boosting algorithm is proposed which can jointly use song descriptions associated with excerpts of different durations. This algorithm is used to fuse new descriptions belonging to different abstraction levels. Finally, a new learning framework is proposed for automatic tagging, which better leverages the subtlety of the information expressed by music.
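The abstract describes boosting that jointly uses song descriptions computed on excerpts of different durations. A minimal AdaBoost-style loop is sketched below, where each round selects the best decision stump among several per-duration feature groups; the data and feature groups are synthetic stand-ins, not the thesis's descriptors or its actual algorithm.

```python
# Hedged sketch: discrete AdaBoost where each round picks a weak learner trained
# on one of several feature groups (features computed on excerpts of different
# durations). Data and groups are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 200
y = rng.choice([-1, 1], size=n)                  # tag present / absent
# One feature block per excerpt duration; informativeness varies by duration.
groups = {"1s": y[:, None] * 0.3 + rng.normal(0, 1, (n, 4)),
          "5s": y[:, None] * 0.8 + rng.normal(0, 1, (n, 4)),
          "full": y[:, None] * 0.5 + rng.normal(0, 1, (n, 4))}

w = np.full(n, 1.0 / n)
ensemble = []                                    # (alpha, group name, stump)
for _ in range(10):
    best = None
    for name, X in groups.items():
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        err = np.sum(w * (stump.predict(X) != y))
        if best is None or err < best[0]:
            best = (err, name, stump)
    err, name, stump = best
    err = np.clip(err, 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)        # classical AdaBoost weight
    w *= np.exp(-alpha * y * stump.predict(groups[name]))
    w /= w.sum()                                 # re-weight training examples
    ensemble.append((alpha, name, stump))

score = sum(a * s.predict(groups[g]) for a, g, s in ensemble)
print("training accuracy:", np.mean(np.sign(score) == y))
```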
APA, Harvard, Vancouver, ISO, and other styles
6

Gulen, Elvan. "Fusing Semantic Information Extracted From Visual, Auditory And Textual Data Of Videos." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614582/index.pdf.

Full text
Abstract:
In recent years, due to the increasing usage of videos, manual information extraction is becoming insufficient for users. Therefore, extracting semantic information automatically turns out to be a serious requirement. Today, there exist some systems that extract semantic information automatically by using visual, auditory and textual data separately, but the number of studies that use more than one data source is very limited. As some studies on this topic have already shown, using multimodal video data for automatic information extraction ensures better results by guaranteeing an increase in the accuracy of the semantic information that is retrieved from visual, auditory and textual sources. In this thesis, a complete system which fuses the semantic information obtained from visual, auditory and textual video data is introduced. The fusion system analyzes and unites the semantic information extracted from multimodal data by utilizing concept interactions, and consequently generates a semantic dataset which is ready to be stored in a database. Besides, experiments are conducted to compare the results obtained from the proposed multimodal fusion operation with the results obtained from semantic information extraction from just one modality and from other fusion methods. The results indicate that fusing all available information along with concept relations yields better results overall than any unimodal approach and other traditional fusion methods.
APA, Harvard, Vancouver, ISO, and other styles
7

Muhammad, Hanif Shehzad. "Feature selection and classifier combination: Application to the extraction of textual information in scene images." Paris 6, 2009. http://www.theses.fr/2009PA066521.

Full text
Abstract:
In this thesis, we address the problem of text detection and localization in scene images. Our system is composed of two parts: a text detector and a text localizer. The text detector (a cascade of boosted classifiers) uses boosting to select and combine relevant descriptors and weak classifiers. More precisely, we propose a regularized version of the AdaBoost algorithm which integrates the (computational) complexity of the descriptors and weak classifiers into the selection phase. We propose heterogeneous descriptors to encode the textual information in images. Our classification rules belong to different families of classifiers: discriminant, linear and non-linear, parametric and non-parametric. The detector generates candidate text regions which serve as input to the text localizer, whose objective is to find bounding rectangles around the words or lines of text in the image. Results on two challenging image databases show the effectiveness of our approach.
APA, Harvard, Vancouver, ISO, and other styles
8

Skibinski, Sebastian [Verfasser], Heinrich [Akademischer Betreuer] Müller, and Uwe [Gutachter] Schwiegelshohn. "Extraction, localization, and fusion of collective vehicle data / Sebastian Skibinski ; Gutachter: Uwe Schwiegelshohn ; Betreuer: Heinrich Müller." Dortmund : Universitätsbibliothek Dortmund, 2019. http://d-nb.info/1191990192/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Skibinski, Sebastian [Verfasser], Heinrich [Akademischer Betreuer] Müller, and Uwe [Gutachter] Schwiegelshohn. "Extraction, localization, and fusion of collective vehicle data / Sebastian Skibinski ; Gutachter: Uwe Schwiegelshohn ; Betreuer: Heinrich Müller." Dortmund : Universitätsbibliothek Dortmund, 2019. http://d-nb.info/1191990192/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Apatean, Anca Ioana. "Contributions à la fusion des informations : application à la reconnaissance des obstacles dans les images visible et infrarouge." Phd thesis, INSA de Rouen, 2010. http://tel.archives-ouvertes.fr/tel-00621202.

Full text
Abstract:
In order to continue and improve the detection task in progress at INSA, we focused on the fusion of visible and infrared information for obstacle recognition, that is, distinguishing between vehicles, pedestrians, cyclists and background obstacles. Bimodal systems were proposed to fuse the information at different levels: features, SVM kernels, or SVM scores. They were weighted according to the relative importance of the modality sensors to ensure the (fixed or dynamic) adaptation of the system to the environmental conditions. To evaluate the relevance of the features, different selection methods were tested with a nearest-neighbour classifier, which was later replaced by an SVM. A model search, carried out by 10-fold cross-validation, provides the optimized kernel for the SVM. The results showed that all the bimodal VIS-IR systems are better than their monomodal counterparts.
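The abstract mentions fusing visible and infrared information at the level of features, SVM kernels, or SVM scores, with weights reflecting the relative reliability of each modality. The sketch below illustrates only the score-level variant on invented data with invented modality weights; it is not the thesis's system.

```python
# Hedged sketch: weighted score-level fusion of two per-modality SVMs
# (visible and infrared). Data, classes and the modality weights are invented.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n, classes = 300, 4                              # vehicle / pedestrian / cyclist / background
y = rng.integers(0, classes, n)
X_vis = y[:, None] + rng.normal(0, 1.0, (n, 10))   # visible-spectrum features (mock)
X_ir = y[:, None] + rng.normal(0, 1.5, (n, 10))    # infrared features (mock, noisier)

svm_vis = SVC(kernel="rbf", probability=True).fit(X_vis[:200], y[:200])
svm_ir = SVC(kernel="rbf", probability=True).fit(X_ir[:200], y[:200])

w_vis, w_ir = 0.6, 0.4                           # modality weights (could be adapted online)
p = w_vis * svm_vis.predict_proba(X_vis[200:]) + w_ir * svm_ir.predict_proba(X_ir[200:])
fused_pred = svm_vis.classes_[np.argmax(p, axis=1)]
print("fused accuracy:", np.mean(fused_pred == y[200:]))
```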
APA, Harvard, Vancouver, ISO, and other styles
11

Viardot, Geoffroy. "Reconnaissance des formes en présence d'incertitude sur l'expertise : application à l'étude des phases d'activation transitoire du sommeil chez l'homme." Troyes, 2002. http://www.theses.fr/2002TROY0004.

Full text
Abstract:
L'élaboration d'une règle de décision à partir d'une base d'apprentissage suppose en général que l'étiquetage des données utilisées n'est pas entaché d'erreur, et consiste alors en un problème de régression de la variable à expliquer sur l'espace des observations. Cette situation idéale n'est cependant pas toujours réaliste. En effet, l'expertise humaine est en général entachée d'incertitude. On peut en particulier observer ce phénomène si plusieurs experts sont sollicités pour l'étiquetage des données et que leurs avis divergent, phénomène fréquemment rencontré par exemple dans le domaine du traitement des signaux biomédicaux. La littérature propose un certain nombre de travaux relatifs à la prise en compte d'avis divergents lors de l'élaboration d'une règle de décision. Les méthodes correspondantes reposent en général sur la détermination préalable du avis "de référence", obtenu de manière ad hoc à partir de l'ensemble des avis des divers experts. L'objectif de nos travaux est, d'une part, de proposer une solution au problème de la fusion d'avis divergents d'experts sans faire appel à la détermination d'un avis "de référence" et, d'autre part, de quantifier la pertinence des expertises individuelles. Le principe de la méthode présentée repose sur l'optimisation de l'information mutuelle entre les observations et une fonctionnelle des avis des experts. L'interprétation du résultat obtenu permet alors de quantifier le comportement de chacun des experts. Le résultat de la fusion peut enfin être utilisé en vue de l'élaboration d'une règle de décision. La méthode proposée est appliquée avec succès à l'étude d'un phénomène physiologique du sommeil : les phases d'activation transitoire. Elle permet à la fois d'obtenir un étiquetage unique des données et de caractériser conjointement les expertises et l'événement étudié
The design of decision rules from training data generally assumes that the true labels of the data are available. The design then consists of a regression problem of the variable to be explained on the observation space. Unfortunately this situation is not always realistic, due to the frequent corruption of human expertise by uncertainty. This phenomenon clearly appears when data are labelled by a pool of experts who have differing opinions, which often happens in the field of biomedical signal processing. Many works dealing with the use of differing opinions when designing decision rules have been considered in the literature. The main approach consists in synthesizing a consensual opinion from the differing opinions. Our work proposes a new solution to this problem which allows each expert's relevance to be characterized. It consists in optimizing the mutual information between the observations and a function of the labels provided by the pool of experts. The analysis of the results allows each expert's behavior to be characterized, and the resulting labels can be used to design decision rules. Our method has been successfully applied to phasic arousals, which are transient events visible in the polysomnogram during human sleep.
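The key idea above is to fuse divergent expert labels by maximizing the mutual information between the observations and a function of the expert opinions, without building a reference labelling first. The sketch below illustrates that idea on toy data with a coarse grid search over expert weights; the specific functional (a thresholded weighted vote) and all data are assumptions, not the thesis's exact formulation.

```python
# Hedged sketch: choose expert weights so that the fused label (a thresholded
# weighted vote) carries maximal mutual information with the observations.
# Data, the voting functional and the grid search are illustrative assumptions.
import itertools
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(3)
n = 500
truth = rng.integers(0, 2, n)                    # hidden ground truth (never used for fitting)
X = truth[:, None] + rng.normal(0, 1, (n, 3))    # observations

# Three experts with different reliabilities (label-flip probabilities).
experts = np.stack([np.where(rng.random(n) < p, 1 - truth, truth)
                    for p in (0.05, 0.2, 0.45)], axis=1)

def fused_labels(weights):
    """Weighted vote of the experts, thresholded at half the total weight."""
    return (experts @ weights > 0.5 * weights.sum()).astype(int)

best_w, best_mi = None, -np.inf
for w in itertools.product([0.1, 0.5, 1.0], repeat=3):   # coarse grid over weights
    w = np.array(w)
    mi = mutual_info_classif(X, fused_labels(w), random_state=0).sum()
    if mi > best_mi:
        best_w, best_mi = w, mi

print("selected expert weights:", best_w)        # the unreliable expert should get low weight
print("agreement of fused labels with truth:", np.mean(fused_labels(best_w) == truth))
```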
APA, Harvard, Vancouver, ISO, and other styles
12

König, Rikard. "Predictive Techniques and Methods for Decision Support in Situations with Poor Data Quality." Licentiate thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-3517.

Full text
Abstract:
Today, decision support systems based on predictive modeling are becoming more common, since organizations often collect more data than decision makers can handle manually. Predictive models are used to find potentially valuable patterns in the data, or to predict the outcome of some event. There are numerous predictive techniques, ranging from simple techniques such as linear regression, to complex powerful ones like artificial neural networks. Complex models usually obtain better predictive performance, but are opaque and thus cannot be used to explain predictions or discovered patterns. The design choice of which predictive technique to use becomes even harder since no technique outperforms all others over a large set of problems. It is even difficult to find the best parameter values for a specific technique, since these settings also are problem dependent. One way to simplify this vital decision is to combine several models, possibly created with different settings and techniques, into an ensemble. Ensembles are known to be more robust and powerful than individual models, and ensemble diversity can be used to estimate the uncertainty associated with each prediction. In real-world data mining projects, data is often imprecise, contains uncertainties or is missing important values, making it impossible to create models with sufficient performance for fully automated systems. In these cases, predictions need to be manually analyzed and adjusted. Here, opaque models like ensembles have a disadvantage, since the analysis requires understandable models. To overcome this deficiency of opaque models, researchers have developed rule extraction techniques that try to extract comprehensible rules from opaque models, while retaining sufficient accuracy. This thesis suggests a straightforward but comprehensive method for predictive modeling in situations with poor data quality. First, ensembles are used for the actual modeling, since they are powerful, robust and require few design choices. Next, ensemble uncertainty estimations pinpoint predictions that need special attention from a decision maker. Finally, rule extraction is performed to support the analysis of uncertain predictions. Using this method, ensembles can be used for predictive modeling, in spite of their opacity and sometimes insufficient global performance, while the involvement of a decision maker is minimized. The main contributions of this thesis are three novel techniques that enhance the performance of the proposed method. The first technique deals with ensemble uncertainty estimation and is based on a successful approach often used in weather forecasting. The other two are improvements of a rule extraction technique, resulting in increased comprehensibility and more accurate uncertainty estimations.
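The method summarized above uses ensembles for prediction, ensemble uncertainty to flag predictions needing human attention, and rule extraction to make the analysis comprehensible. A compact, generic sketch of those three steps is given below; it uses a random forest, per-tree vote disagreement as the uncertainty estimate and a shallow surrogate decision tree as the rule extractor, which are stand-ins rather than the thesis's specific techniques.

```python
# Hedged sketch: ensemble prediction, uncertainty flagging and rule extraction.
# The concrete choices (random forest, vote disagreement, surrogate tree) are
# generic stand-ins, not the techniques proposed in the thesis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, y_train, X_new = X[:300], y[:300], X[300:]

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Uncertainty: disagreement among the trees' votes for each new case.
votes = np.stack([t.predict(X_new) for t in forest.estimators_])
agreement = np.mean(votes == forest.predict(X_new), axis=0)
needs_review = np.where(agreement < 0.7)[0]      # hand these to a decision maker
print("predictions flagged for manual review:", needs_review)

# Rule extraction: a shallow surrogate tree trained to mimic the ensemble.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X_train, forest.predict(X_train))
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(8)]))
```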

Sponsorship: This work was supported by the Information Fusion Research Program (www.infofusion.se) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2003/0104.

APA, Harvard, Vancouver, ISO, and other styles
13

Döhling, Lars. "Extracting and Aggregating Temporal Events from Texts." Doctoral thesis, Humboldt-Universität zu Berlin, 2017. http://dx.doi.org/10.18452/18454.

Full text
Abstract:
Das Finden von zuverlässigen Informationen über gegebene Ereignisse aus großen und dynamischen Textsammlungen, wie dem Web, ist ein wichtiges Thema. Zum Beispiel sind Rettungsteams und Versicherungsunternehmen an prägnanten Fakten über Schäden nach Katastrophen interessiert, die heutzutage online in Web-Blogs, Zeitungsartikeln, Social Media etc. zu finden sind. Solche Fakten helfen, die erforderlichen Hilfsmaßnahmen zu bestimmen und unterstützen deren Koordination. Allerdings ist das Finden, Extrahieren und Aggregieren nützlicher Informationen ein hochkomplexes Unterfangen: Es erfordert die Ermittlung geeigneter Textquellen und deren zeitliche Einordung, die Extraktion relevanter Fakten in diesen Texten und deren Aggregation zu einer verdichteten Sicht auf die Ereignisse, trotz Inkonsistenzen, vagen Angaben und Veränderungen über die Zeit. In dieser Arbeit präsentieren und evaluieren wir Techniken und Lösungen für jedes dieser Probleme, eingebettet in ein vierstufiges Framework. Die angewandten Methoden beruhen auf Verfahren des Musterabgleichs, der Verarbeitung natürlicher Sprache und des maschinellen Lernens. Zusätzlich berichten wir über die Ergebnisse zweier Fallstudien, basierend auf dem Einsatz des gesamten Frameworks: Die Ermittlung von Daten über Erdbeben und Überschwemmungen aus Webdokumenten. Unsere Ergebnisse zeigen, dass es unter bestimmten Umständen möglich ist, automatisch zuverlässige und zeitgerechte Daten aus dem Internet zu erhalten.
Finding reliable information about given events from large and dynamic text collections, such as the web, is a topic of great interest. For instance, rescue teams and insurance companies are interested in concise facts about damages after disasters, which can be found today in web blogs, online newspaper articles, social media, etc. Knowing these facts helps to determine the required scale of relief operations and supports their coordination. However, finding, extracting, and condensing specific facts is a highly complex undertaking: It requires identifying appropriate textual sources and their temporal alignment, recognizing relevant facts within these texts, and aggregating extracted facts into a condensed answer despite inconsistencies, uncertainty, and changes over time. In this thesis, we present and evaluate techniques and solutions for each of these problems, embedded in a four-step framework. Applied methods are pattern matching, natural language processing, and machine learning. We also report the results for two case studies applying our entire framework: gathering data on earthquakes and floods from web documents. Our results show that it is, under certain circumstances, possible to automatically obtain reliable and timely data from the web.
APA, Harvard, Vancouver, ISO, and other styles
14

Kaytoue, Mehdi. "Traitement de données numériques par analyse formelle de concepts et structures de patrons." Phd thesis, Université Henri Poincaré - Nancy I, 2011. http://tel.archives-ouvertes.fr/tel-00599168.

Full text
Abstract:
The main subject of this thesis is the mining of numerical data, and more particularly gene expression data. These data characterize the behaviour of genes in various biological situations (time, cell, etc.). An important problem consists in establishing groups of genes sharing the same biological behaviour. This makes it possible to identify the genes that are active during a biological process, for example the genes active when an organism defends itself against an attack. The thesis therefore falls within the framework of knowledge discovery from biological data. We study how the conceptual classification method known as formal concept analysis (FCA) can answer the problem of extracting gene families. To this end, we have developed and experimented with several original methods based on a little-explored extension of FCA: pattern structures. More precisely, we show how to build a concept lattice synthesizing families of genes with similar behaviour. The originality of this work is (i) to build a concept lattice efficiently without prior discretization of the data, (ii) to introduce a similarity relation between genes, and (iii) to propose minimal sets of necessary and sufficient conditions explaining the groupings obtained. The results of this work also lead us to show how pattern structures can improve decision making regarding the dangerousness of agricultural practices in the broad domain of information fusion.
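Pattern structures extend FCA to non-binary data; for numerical gene-expression values a common choice is interval patterns, where the meet of two descriptions is the componentwise convex hull of their intervals and the order is interval containment. The small sketch below illustrates only that similarity operation on toy expression profiles; it is a didactic fragment, not the thesis's lattice-construction algorithms.

```python
# Hedged sketch: the meet (similarity) and subsumption order of interval
# patterns, the pattern-structure ingredients mentioned in the abstract.
# The gene-expression values are toy numbers.

def meet(p, q):
    """Componentwise convex hull of two interval vectors."""
    return [(min(a1, b1), max(a2, b2)) for (a1, a2), (b1, b2) in zip(p, q)]

def subsumes(general, specific):
    """A pattern subsumes another if each of its intervals contains the other's."""
    return all(g1 <= s1 and s2 <= g2 for (g1, g2), (s1, s2) in zip(general, specific))

# Expression of two genes in three biological situations, as degenerate intervals.
gene_a = [(5.2, 5.2), (1.1, 1.1), (8.0, 8.0)]
gene_b = [(4.8, 4.8), (1.5, 1.5), (7.6, 7.6)]

common = meet(gene_a, gene_b)                    # shared behaviour of the two genes
print("common pattern:", common)
print(subsumes(common, gene_a), subsumes(gene_a, common))   # True, False
```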
APA, Harvard, Vancouver, ISO, and other styles
15

Bujor, Florentin. "Extraction - fusion d'informations en imagerie radar multi-temporelle." Chambéry, 2004. http://www.theses.fr/2004CHAMS023.

Full text
Abstract:
Les travaux présentés dans cette thèse s'articulent autour de trois axes : deux axes méthodologiques, l'extraction ert la fusion d'informations en imagerie radar à synthèse d'ouverture (RSO) multi-temporelle, et un axe plus thématique, l'application des méthodes proposées à la détection de changements et d'objets géographiques stables. La stratégie "extraction-fusion" permet d'exploiter les données RSO multi-temporelles dans des contextes opérationnels tels que le suivi des déforestations ou la mise à jour de cartes dans des régions où l'imagerie satellitaire optique se heurte aux mauvaises conditions météorologiques. Les attributs développés dans l'axe extraction d'informations apportent une contribution originale à l'analyse des images RSO multi-temporelles. Ils s'appuient sur les paramètres existants pour les images mono-dates et les étendent au cas multi-temporel pour exploiter soit la redondance de l'information en vue d'améliorer la détection des structures stables, soit les modifications de la radiométrie pour détecter les changements survenus entre les acquisitions. Le second axe méthodologique consiste à fusionner les informations extraites sous forme d'attributs par une méthode adpatée au contexte de la élédétection spatiale. La collaboration avec des géophysiciens nous a conduit à développer une méthode de fusion interactive qui intègre la connaissance qu'ont les experts des zones recherchées et du comportement des attributs. Un système de fusion symbolique floue permet d'atteindre cet objectif au travers d'une interface graphique construite en IDL, langage du logiciel ENVI très répandu dans la communauté de la télédétection. Outre les fonctionnalités de réglage interactif des fonctions d'appartenance, l'interface dévelpppée incorpore un ensemble de fonctionnalités qui la rendent facilement utilisable pour des opérateurs appartenant aux domaines d'application. Sur l'axe thématique, trois exemples d'application illustrent l'approche extraction-fusion dans les deux principales directions d'exploitation de l'imagerie RSO multi-temporelle : la détection de changements et l'amélioration de la détection des structures stables. Les données utilisées sont des images RSO acquises dans des régions tropicales humides par les satellites ERS dont les archives importantes permettent aujourd'hui de disposer de nombreuses séries multi-temporelles. Les résultats obtenus mettent en valeur le potentiel des données RSO pour la détection de changements et l'intérêt d'une détection simultanée des discontinuités spatio-temporelles dans les volumes formés par les séries d'images. A terme, le développement et le tranfert vers les utilisateurs finaéux de "bibliothèques" d'attributs dédiés à l'extraction d'informations dans les séries d'images radar doit permettre une meilleure utilisation de ces données. Ce transfert doit également comporter des méthodes de fusion d'informations permettant de combiner un ensemble de paramètres sélectionnés par les experts. La méthode de fusion floue interactive proposée est une alternative à des méthodes de classification automatiques ou supervisées, qui permet d'intégrer les connaissances expertes et de laisser l'utilisateur final piloter la fusion vers une solution satifaisante
The work presented in this thesis is articulated around three axes: two methodological axes, information extraction and fusion in multi-temporal synthetic aperture radar (SAR) imagery, and one more thematic axis, the application of the proposed methods to change detection and to the detection of stable geographical objects. The "extraction-fusion" strategy allows the use of multi-temporal SAR data in the operational context of deforestation monitoring or geographical map updating, in regions where optical satellite imagery is hampered by bad weather conditions. The attributes developed in the information extraction axis bring an original contribution to multi-temporal SAR image analysis. Based on the existing parameters for single-date images, the attributes are extended to the multi-temporal case to exploit either the information redundancy, to improve the detection of stable structures, or the radiometry modifications, to detect the changes that occurred between acquisitions. The second methodological axis consists in fusing the extracted information, in the form of attributes, by an original method designed for the specific context of space-borne remote sensing. The collaboration with geophysicists led us to develop an interactive fusion method which integrates the experts' knowledge of the zones of interest and of the behavior of the attributes. A symbolic fuzzy fusion system allows this objective to be reached through a graphical user interface (GUI) built in IDL, the programming language of ENVI, a widespread software package in the remote sensing community. In addition to the functionalities for interactive adjustment of the membership functions, the developed GUI incorporates a set of functionalities which make it user-friendly for operators belonging to the application fields.
APA, Harvard, Vancouver, ISO, and other styles
16

Labský, Martin. "Information Extraction from Websites using Extraction Ontologies." Doctoral thesis, Vysoká škola ekonomická v Praze, 2002. http://www.nusl.cz/ntk/nusl-77102.

Full text
Abstract:
Automatic information extraction (IE) from various types of text became very popular during the last decade. Owing to information overload, there are many practical applications that can utilize semantically labelled data extracted from textual sources like the Internet, emails, intranet documents and even conventional sources like newspaper and magazines. Applications of IE exist in many areas of computer science: information retrieval systems, question answering or website quality assessment. This work focuses on developing IE methods and tools that are particularly suited to extraction from semi-structured documents such as web pages and to situations where available training data is limited. The main contribution of this thesis is the proposed approach of extended extraction ontologies. It attempts to combine extraction evidence from three distinct sources: (1) manually specified extraction knowledge, (2) existing training data and (3) formatting regularities that are often present in online documents. The underlying hypothesis is that using extraction evidence of all three types by the extraction algorithm can help improve its extraction accuracy and robustness. The motivation for this work has been the lack of described methods and tools that would exploit these extraction evidence types at the same time. This thesis first describes a statistically trained approach to IE based on Hidden Markov Models which integrates with a picture classification algorithm in order to extract product offers from the Internet, including textual items as well as images. This approach is evaluated using a bicycle sale domain. Several methods of image classification using various feature sets are described and evaluated as well. These trained approaches are then integrated in the proposed novel approach of extended extraction ontologies, which builds on top of the work of Embley [21] by exploiting manual, trained and formatting types of extraction evidence at the same time. The intended benefit of using extraction ontologies is a quick development of a functional IE prototype, its smooth transition to deployed IE application and the possibility to leverage the use of each of the three extraction evidence types. Also, since extraction ontologies are typically developed by adapting suitable domain ontologies and the ontology remains in center of the extraction process, the work related to the conversion of extracted results back to a domain ontology or schema is minimized. The described approach is evaluated using several distinct real-world datasets.
APA, Harvard, Vancouver, ISO, and other styles
17

Arpteg, Anders. "Intelligent semi-structured information extraction : a user-driven approach to information extraction /." Linköping : Dept. of Computer and Information Science, Univ, 2005. http://www.bibl.liu.se/liupubl/disp/disp2005/tek946s.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Swampillai, Kumutha. "Information extraction across sentences." Thesis, University of Sheffield, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.575468.

Full text
Abstract:
Most relation extraction systems identify relations by searching within sentences (within-sentence relations). Such an approach excludes finding any relations that cross sentence boundaries (cross-sentence relations). This thesis quantifies the cross-sentence relations in two major information extraction corpora: ACE03 (9.4%) and MUC6 (27.4%), revealing the extent of this limitation. In response, a composite kernel approach to cross-sentence relation extraction is proposed which models relations using parse tree and flat surface features. Support vector machine classifiers are trained using cross-sentential relations from the MUC6 corpus to determine the effectiveness of this approach. It was shown that composite kernels are able to extract cross-sentential relations with f-measure scores of 0.512, 0.116 and 0.633 for PerOrg, PerPost and PostOrg models, respectively. Moreover, combining within-sentence and cross-sentence extraction models increases the number of relations correctly identified by 24% over within-sentence relation extraction alone.
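The composite kernel described above combines a parse-tree kernel with a flat surface-feature kernel inside an SVM. The sketch below keeps only the combination mechanics: two precomputed Gram matrices (a crude token-overlap stand-in for the structural kernel and a linear kernel on surface features) are mixed with a weight and passed to an SVM. The candidate relation instances and both component kernels are simplified placeholders, not those of the thesis.

```python
# Hedged sketch: combining two kernels into a composite kernel for an SVM,
# as in composite-kernel relation extraction. Both component kernels and the
# candidate relation instances are simplified placeholders.
import numpy as np
from sklearn.svm import SVC

# Each candidate relation: tokens on the path linking the two mentions (structural
# stand-in) plus a few flat surface features (e.g. sentence distance, mention types).
paths = [["joined", "as", "director"], ["was", "named", "president", "of"],
         ["met", "with"], ["spoke", "at"], ["appointed", "chief", "of"],
         ["visited"]]
surface = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0]], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0])                 # 1 = relation holds (toy labels)

def overlap_kernel(a, b):
    """Very crude structural kernel: number of shared tokens between two paths."""
    return len(set(a) & set(b))

K_struct = np.array([[overlap_kernel(a, b) for b in paths] for a in paths], dtype=float)
K_flat = surface @ surface.T                     # linear kernel on surface features

alpha = 0.6                                      # mixing weight between the two kernels
K = alpha * K_struct + (1 - alpha) * K_flat      # composite kernel (sum of kernels is a kernel)

clf = SVC(kernel="precomputed").fit(K, y)
print("training predictions:", clf.predict(K))
```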
APA, Harvard, Vancouver, ISO, and other styles
19

Tablan, Mihai Valentin. "Toward portable information extraction." Thesis, University of Sheffield, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.522379.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Leen, Gayle. "Context assisted information extraction." Thesis, University of the West of Scotland, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.446043.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Sottovia, Paolo. "Information Extraction from data." Doctoral thesis, Università degli studi di Trento, 2019. http://hdl.handle.net/11572/242992.

Full text
Abstract:
Data analysis is the process of inspecting, cleaning, extract, and modeling data with the intention of extracting useful information in order to support users in their decisions. With the advent of Big Data, data analysis was becoming more complicated due to the volume and variety of data. This process begins with the acquisition of the data and the selection of the data that is useful for the desiderata analysis. With such amount of data, also expert users are not able to inspect the data and understand if a dataset is suitable or not for their purposes. In this dissertation, we focus on five problems in the broad data analysis process to help users find insights from the data when they do not have enough knowledge about its data. First, we analyze the data description problem, where the user is looking for a description of the input dataset. We introduce data descriptions: a compact, readable and insightful formula of boolean predicates that represents a set of data records. Finding the best description for a dataset is computationally expensive and task-specific; we, therefore, introduce a set of metrics and heuristics for generating meaningful descriptions at an interactive performance. Secondly, we look at the problem of order dependency discovery, which discovers another kind of metadata that may help the user in the understanding of characteristics of a dataset. Our approach leverages the observation that discovering order dependencies can be guided by the discovery of a more specific form of dependencies called order compatibility dependencies. Thirdly, textual data encodes much hidden information. To allow this data to reach its full potential, there has been an increasing interest in extracting structural information from it. In this regard, we propose a novel approach for extracting events that are based on temporal co-reference among entities. We consider an event to be a set of entities that collectively experience relationships between them in a specific period of time. We developed a distributed strategy that is able to scale with the largest on-line encyclopedia available, Wikipedia. Then, we deal with the evolving nature of the data by focusing on the problem of finding synonymous attributes in evolving Wikipedia Infoboxes. Over time, several attributes have been used to indicate the same characteristic of an entity. This provides several issues when we are trying to analyze the content of different time periods. To solve it, we propose a clustering strategy that combines two contrasting distance metrics. We developed an approximate solution that we assess over 13 years of Wikipedia history by proving its flexibility and accuracy. Finally, we tackle the problem of identifying movements of attributes in evolving datasets. In an evolving environment, entities not only change their characteristics, but they sometimes exchange them over time. We proposed a strategy where we are able to discover those cases, and we also test our strategy on real datasets. We formally present the five problems that we validate both in terms of theoretical results and experimental evaluation, and we demonstrate that the proposed approaches efficiently scale with a large amount of data.
APA, Harvard, Vancouver, ISO, and other styles
22

Kaupp, Tobias. "Probabilistic Human-Robot Information Fusion." University of Sydney, 2008. http://hdl.handle.net/2123/2554.

Full text
Abstract:
PhD
This thesis is concerned with combining the perceptual abilities of mobile robots and human operators to execute tasks cooperatively. It is generally agreed that a synergy of human and robotic skills offers an opportunity to enhance the capabilities of today’s robotic systems, while also increasing their robustness and reliability. Systems which incorporate both human and robotic information sources have the potential to build complex world models, essential for both automated and human decision making. In this work, humans and robots are regarded as equal team members who interact and communicate on a peer-to-peer basis. Human-robot communication is addressed using probabilistic representations common in robotics. While communication can in general be bidirectional, this work focuses primarily on human-to-robot information flow. More specifically, the approach advocated in this thesis is to let robots fuse their sensor observations with observations obtained from human operators. While robotic perception is well-suited for lower level world descriptions such as geometric properties, humans are able to contribute perceptual information on higher abstraction levels. Human input is translated into the machine representation via Human Sensor Models. A common mathematical framework for humans and robots reinforces the notion of true peer-to-peer interaction. Human-robot information fusion is demonstrated in two application domains: (1) scalable information gathering, and (2) cooperative decision making. Scalable information gathering is experimentally demonstrated on a system comprised of a ground vehicle, an unmanned air vehicle, and two human operators in a natural environment. Information from humans and robots was fused in a fully decentralised manner to build a shared environment representation on multiple abstraction levels. Results are presented in the form of information exchange patterns, qualitatively demonstrating the benefits of human-robot information fusion. The second application domain adds decision making to the human-robot task. Rational decisions are made based on the robots’ current beliefs which are generated by fusing human and robotic observations. Since humans are considered a valuable resource in this context, operators are only queried for input when the expected benefit of an observation exceeds the cost of obtaining it. The system can be seen as adjusting its autonomy at run-time based on the uncertainty in the robots’ beliefs. A navigation task is used to demonstrate the adjustable autonomy system experimentally. Results from two experiments are reported: a quantitative evaluation of human-robot team effectiveness, and a user study to compare the system to classical teleoperation. Results show the superiority of the system with respect to performance, operator workload, and usability.
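A central idea above is that human input is translated into the robots' probabilistic world model through a human sensor model and then fused with robotic observations. The toy sketch below performs a discrete Bayesian fusion of a robot likelihood and a human report over three object classes; the numbers and categories are invented for illustration only.

```python
# Hedged sketch: discrete Bayesian fusion of a robot observation and a human
# report via a "human sensor model". All probabilities and classes are invented.
import numpy as np

classes = ["tree", "person", "vehicle"]
prior = np.array([0.4, 0.3, 0.3])

# Robot sensor: likelihood of the measurement given each class.
robot_likelihood = np.array([0.5, 0.2, 0.3])

# Human sensor model: P(human says "person" | true class). Humans contribute
# higher-level semantic observations, so the report is most likely for a person.
human_model = {"person": np.array([0.05, 0.85, 0.10])}
human_report = "person"

posterior = prior * robot_likelihood * human_model[human_report]
posterior /= posterior.sum()                     # normalize (conditionally independent observations assumed)

for c, p in zip(classes, posterior):
    print(f"P({c} | robot + human) = {p:.3f}")
```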
APA, Harvard, Vancouver, ISO, and other styles
23

Xu, Philippe. "Information fusion for scene understanding." Thesis, Compiègne, 2014. http://www.theses.fr/2014COMP2153/document.

Full text
Abstract:
La compréhension d'image est un problème majeur de la robotique moderne, la vision par ordinateur et l'apprentissage automatique. En particulier, dans le cas des systèmes avancés d'aide à la conduite, la compréhension de scènes routières est très importante. Afin de pouvoir reconnaître le grand nombre d’objets pouvant être présents dans la scène, plusieurs capteurs et algorithmes de classification doivent être utilisés. Afin de pouvoir profiter au mieux des méthodes existantes, nous traitons le problème de la compréhension de scènes comme un problème de fusion d'informations. La combinaison d'une grande variété de modules de détection, qui peuvent traiter des classes d'objets différentes et utiliser des représentations distinctes, est faites au niveau d'une image. Nous considérons la compréhension d'image à deux niveaux : la détection d'objets et la segmentation sémantique. La théorie des fonctions de croyance est utilisée afin de modéliser et combiner les sorties de ces modules de détection. Nous mettons l'accent sur la nécessité d'avoir un cadre de fusion suffisamment flexible afin de pouvoir inclure facilement de nouvelles classes d'objets, de nouveaux capteurs et de nouveaux algorithmes de détection d'objets. Dans cette thèse, nous proposons une méthode générale permettant de transformer les sorties d’algorithmes d'apprentissage automatique en fonctions de croyance. Nous étudions, ensuite, la combinaison de détecteurs de piétons en utilisant les données Caltech Pedestrian Detection Benchmark. Enfin, les données du KITTI Vision Benchmark Suite sont utilisées pour valider notre approche dans le cadre d'une fusion multimodale d'informations pour de la segmentation sémantique
Image understanding is a key issue in modern robotics, computer vision and machine learning. In particular, driving scene understanding is very important in the context of advanced driver assistance systems for intelligent vehicles. In order to recognize the large number of objects that may be found on the road, several sensors and decision algorithms are necessary. To make the most of existing state-of-the-art methods, we address the issue of scene understanding from an information fusion point of view. The combination of many diverse detection modules, which may deal with distinct classes of objects and different data representations, is handled by reasoning in the image space. We consider image understanding at two levels: object detection and semantic segmentation. The theory of belief functions is used to model and combine the outputs of these detection modules. We emphasize the need for a fusion framework flexible enough to easily include new classes, new sensors and new object detection algorithms. In this thesis, we propose a general method to model the outputs of classical machine learning techniques as belief functions. Next, we apply our framework to the combination of pedestrian detectors using the Caltech Pedestrian Detection Benchmark. The KITTI Vision Benchmark Suite is then used to validate our approach in a semantic segmentation context using multi-modal information.
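The thesis models detector outputs as belief functions and combines them. As a small illustration of the combination step only, the sketch below applies Dempster's rule to two mass functions on the frame {pedestrian, background}; the mass values are invented, and turning real classifier scores into masses is a separate modelling step addressed in the thesis.

```python
# Hedged sketch: Dempster's rule of combination for two mass functions over the
# frame {pedestrian, background}. Focal sets are encoded as frozensets; the mass
# values are invented and do not come from the thesis.
from itertools import product

P, B = "pedestrian", "background"
FRAME = frozenset({P, B})

# Two detection modules, each assigning mass to singletons and to the whole frame
# (the latter expresses "I don't know").
m1 = {frozenset({P}): 0.6, frozenset({B}): 0.1, FRAME: 0.3}
m2 = {frozenset({P}): 0.5, frozenset({B}): 0.2, FRAME: 0.3}

def dempster(ma, mb):
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(ma.items(), mb.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                  # mass assigned to the empty set
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

fused = dempster(m1, m2)
for focal, mass in fused.items():
    print(set(focal), round(mass, 3))
```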
APA, Harvard, Vancouver, ISO, and other styles
24

Johansson, Ronnie. "Large-Scale Information Acquisition for Data and Information Fusion." Doctoral thesis, Stockholm, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3890.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Houzelle, Stéphane. "Extraction automatique d'objets cartographiques par fusion d'informations extraites d'images satellites /." Paris : École nationale supérieure des télécommunications, 1993. http://catalogue.bnf.fr/ark:/12148/cb355798989.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Arpteg, Anders. "Adaptive Semi-structured Information Extraction." Licentiate thesis, Linköping University, Linköping University, KPLAB - Knowledge Processing Lab, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5688.

Full text
Abstract:

The number of domains and tasks where information extraction tools can be used needs to be increased. One way to reach this goal is to construct user-driven information extraction systems where novice users are able to adapt them to new domains and tasks. To accomplish this goal, the systems need to become more intelligent and able to learn to extract information without need of expert skills or time-consuming work from the user.

The type of information extraction system that is in focus for this thesis is semi-structured information extraction. The term semi-structured refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of not only the link structure but also the structural information within each such document, user-driven extraction systems with high performance can be built.

The extraction process contains several steps where different types of techniques are used. Examples of such types of techniques are those that take advantage of structural, pure syntactic, linguistic, and semantic information. The first step that is in focus for this thesis is the navigation step that takes advantage of the structural information. It is only one part of a complete extraction system, but it is an important part. The use of reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of using reinforcement learning techniques is that the extraction agent can efficiently learn from its own experience without need for intensive user interactions.

An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the approach of the system was promising. However, additional components need to be included in the system before it becomes a fully-fledged user-driven system.


Report code: LiU-Tek-Lic-2002:73.
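The thesis proposes reinforcement learning for the navigation step, so the extraction agent learns which links to follow from its own experience. The sketch below is a tabular Q-learning toy on a hand-made link graph in which one page holds the target information; the graph, rewards and hyperparameters are invented for illustration and do not reflect the system evaluated in the thesis.

```python
# Hedged sketch: tabular Q-learning of a link-following policy on a toy site
# graph. The agent learns which outgoing link leads toward the page holding the
# target information. Graph, rewards and hyperparameters are invented.
import random

links = {"home": ["products", "about"], "about": ["home"],
         "products": ["home", "laptops"], "laptops": ["products", "spec_sheet"],
         "spec_sheet": []}
TARGET = "spec_sheet"                            # page containing the information to extract

Q = {(p, n): 0.0 for p in links for n in links[p]}
alpha, gamma, eps = 0.5, 0.9, 0.2
random.seed(0)

for _ in range(300):                             # training episodes
    page = "home"
    for _ in range(10):
        if not links[page]:
            break
        # epsilon-greedy choice among the outgoing links of the current page
        if random.random() < eps:
            nxt = random.choice(links[page])
        else:
            nxt = max(links[page], key=lambda n: Q[(page, n)])
        reward = 1.0 if nxt == TARGET else -0.05
        future = max((Q[(nxt, n)] for n in links[nxt]), default=0.0)
        Q[(page, nxt)] += alpha * (reward + gamma * future - Q[(page, nxt)])
        page = nxt

page, path = "home", ["home"]
for _ in range(10):                              # follow the greedy learned policy
    if not links[page] or page == TARGET:
        break
    page = max(links[page], key=lambda n: Q[(page, n)])
    path.append(page)
print("learned navigation path:", path)
```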
APA, Harvard, Vancouver, ISO, and other styles
27

Schierle, Martin. "Language Engineering for Information Extraction." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-81757.

Full text
Abstract:
Accompanied by the cultural development to an information society and knowledge economy and driven by the rapid growth of the World Wide Web and decreasing prices for technology and disk space, the world\'s knowledge is evolving fast, and humans are challenged with keeping up. Despite all efforts on data structuring, a large part of this human knowledge is still hidden behind the ambiguities and fuzziness of natural language. Especially domain language poses new challenges by having specific syntax, terminology and morphology. Companies willing to exploit the information contained in such corpora are often required to build specialized systems instead of being able to rely on off the shelf software libraries and data resources. The engineering of language processing systems is however cumbersome, and the creation of language resources, annotation of training data and composition of modules is often enough rather an art than a science. The scientific field of Language Engineering aims at providing reliable information, approaches and guidelines of how to design, implement, test and evaluate language processing systems. Language engineering architectures have been a subject of scientific work for the last two decades and aim at building universal systems of easily reusable components. Although current systems offer comprehensive features and rely on an architectural sound basis, there is still little documentation about how to actually build an information extraction application. Selection of modules, methods and resources for a distinct usecase requires a detailed understanding of state of the art technology, application demands and characteristics of the input text. The main assumption underlying this work is the thesis that a new application can only occasionally be created by reusing standard components from different repositories. This work recapitulates existing literature about language resources, processing resources and language engineering architectures to derive a theory about how to engineer a new system for information extraction from a (domain) corpus. This thesis was initiated by the Daimler AG to prepare and analyze unstructured information as a basis for corporate quality analysis. It is therefore concerned with language engineering in the area of Information Extraction, which targets the detection and extraction of specific facts from textual data. While other work in the field of information extraction is mainly concerned with the extraction of location or person names, this work deals with automotive components, failure symptoms, corrective measures and their relations in arbitrary arity. The ideas presented in this work will be applied, evaluated and demonstrated on a real world application dealing with quality analysis on automotive domain language. To achieve this goal, the underlying corpus is examined and scientifically characterized, algorithms are picked with respect to the derived requirements and evaluated where necessary. The system comprises language identification, tokenization, spelling correction, part of speech tagging, syntax parsing and a final relation extraction step. The extracted information is used as an input to data mining methods such as an early warning system and a graph based visualization for interactive root cause analysis. It is finally investigated how the unstructured data facilitates those quality analysis methods in comparison to structured data. 
The acceptance of these text-based methods in the company's processes further proves the usefulness of the created information extraction system.
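For illustration, a minimal Python sketch of a staged extraction pipeline of the kind the abstract describes (language identification, tokenization, spelling correction, tagging, relation extraction); every stage below is a toy placeholder, not the system built in the thesis:

```python
# Hypothetical sketch of a modular information extraction pipeline for
# domain text (e.g. automotive repair reports); all stages are placeholders.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Document:
    text: str
    language: str = ""
    tokens: List[str] = field(default_factory=list)
    pos_tags: List[str] = field(default_factory=list)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

def identify_language(doc: Document) -> Document:
    doc.language = "de" if " der " in doc.text else "en"   # toy heuristic
    return doc

def tokenize(doc: Document) -> Document:
    doc.tokens = doc.text.split()
    return doc

def correct_spelling(doc: Document) -> Document:
    lexicon = {"engin": "engine"}                           # toy domain lexicon
    doc.tokens = [lexicon.get(t, t) for t in doc.tokens]
    return doc

def tag_pos(doc: Document) -> Document:
    doc.pos_tags = ["NN"] * len(doc.tokens)                 # placeholder tagger
    return doc

def extract_relations(doc: Document) -> Document:
    # Placeholder: link a component-like token with a symptom token.
    if "engine" in doc.tokens and "overheats" in doc.tokens:
        doc.relations.append(("engine", "shows_symptom", "overheats"))
    return doc

PIPELINE = [identify_language, tokenize, correct_spelling, tag_pos, extract_relations]

def run(text: str) -> Document:
    doc = Document(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

print(run("engin overheats after 20 km").relations)
```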
APA, Harvard, Vancouver, ISO, and other styles
28

Lam, Man I. "Business information extraction from web." Thesis, University of Macau, 2008. http://umaclib3.umac.mo/record=b1937939.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Jessop, David M. "Information extraction from chemical patents." Thesis, University of Cambridge, 2011. https://www.repository.cam.ac.uk/handle/1810/238302.

Full text
Abstract:
The automated extraction of semantic chemical data from the existing literature is demonstrated. For reasons of copyright, the work is focused on the patent literature, though the methods are expected to apply equally to other areas of the chemical literature. Hearst Patterns are applied to the patent literature in order to discover hyponymic relations describing chemical species. The acquired relations are manually validated to determine the precision of the determined hypernyms (85.0%) and of the asserted hyponymic relations (94.3%). It is demonstrated that the system acquires relations that are not present in the ChEBI ontology, suggesting that it could function as a valuable aid to the ChEBI curators. The relations discovered by this process are formalised using the Web Ontology Language (OWL) to enable re-use. PatentEye - an automated system for the extraction of reactions from chemical patents and their conversion to Chemical Markup Language (CML) - is presented. Chemical patents published by the European Patent Office over a ten-week period are used to demonstrate the capability of PatentEye - 4444 reactions are extracted with a precision of 78% and recall of 64% with regards to determining the identity and amount of reactants employed and an accuracy of 92% with regards to product identification. NMR spectra are extracted from the text using OSCAR3, which is developed to greatly increase recall. The resulting system is presented as a significant advancement towards the large-scale and automated extraction of high-quality reaction information. Extended Polymer Markup Language (EPML), a CML dialect for the description of Markush structures as they are presented in the literature, is developed. Software to exemplify and to enable substructure searching of EPML documents is presented. Further work is recommended to refine the language and code to publication-quality before they are presented to the community.
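A toy illustration of the Hearst-pattern idea on a chemistry-flavoured sentence; the pattern, sentence and crude hypernym heuristic are invented for the example and are not the thesis's implementation:

```python
import re

# One classic Hearst pattern: "<NP> such as <NP>, <NP> ... and/or <NP>".
# The hypernym is crudely approximated by the one or two words before
# "such as"; a real system would use noun-phrase chunks instead.
PATTERN = re.compile(
    r"(?P<hypernym>\w+(?:\s+\w+)?)\s+such\s+as\s+(?P<hyponyms>[\w\s,-]+?)(?:\.|$)",
    re.IGNORECASE,
)

def hearst_relations(sentence):
    """Yield (hyponym, 'is_a', hypernym) triples found by the pattern."""
    for m in PATTERN.finditer(sentence):
        hypernym = m.group("hypernym").strip()
        for hyponym in re.split(r",|\band\b|\bor\b", m.group("hyponyms")):
            hyponym = hyponym.strip()
            if hyponym:
                yield (hyponym, "is_a", hypernym)

sent = "The mixture was treated with alkali metals such as sodium, potassium and lithium."
print(list(hearst_relations(sent)))
```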
APA, Harvard, Vancouver, ISO, and other styles
30

Nguyen, Thien Huu. "Deep Learning for Information Extraction." Thesis, New York University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10260911.

Full text
Abstract:

The explosion of data has made it crucial to analyze the data and distill important information effectively and efficiently. A significant part of such data is presented in unstructured and free-text documents. This has prompted the development of techniques for information extraction that allow computers to automatically extract structured information from natural free-text data. Information extraction is a branch of natural language processing in artificial intelligence that has a wide range of applications, including question answering, knowledge base population, information retrieval, etc. The traditional approach for information extraction has mainly involved hand-designing large feature sets (feature engineering) for different information extraction problems, i.e., entity mention detection, relation extraction, coreference resolution, event extraction, and entity linking. This approach is limited by the laborious and expensive effort required for feature engineering for different domains, and suffers from the unseen word/feature problem of natural languages.

This dissertation explores a different approach for information extraction that uses deep learning to automate the representation learning process and generate more effective features. Deep learning is a subfield of machine learning that uses multiple layers of connections to reveal the underlying representations of data. I develop the fundamental deep learning models for information extraction problems and demonstrate their benefits through systematic experiments.

First, I examine word embeddings, a general word representation that is produced by training a deep learning model on a large unlabelled dataset. I introduce methods to use word embeddings to obtain new features that generalize well across domains for relation extraction. This is done for both the feature-based method and the kernel-based method of relation extraction.

Second, I investigate deep learning models for different problems, including entity mention detection, relation extraction and event detection. I develop new mechanisms and network architectures that allow deep learning to model the structures of information extraction problems more effectively. Some extensive experiments are conducted on the domain adaptation and transfer learning settings to highlight the generalization advantage of the deep learning models for information extraction.

Finally, I investigate joint frameworks to simultaneously solve several information extraction problems and benefit from the inter-dependencies among these problems. I design a novel memory-augmented network for deep learning to properly exploit such inter-dependencies. I demonstrate the effectiveness of this network on two important problems of information extraction, i.e., event extraction and entity linking.
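As a rough illustration of the first idea (using pre-trained word embeddings as generalising features for relation extraction), here is a sketch with a toy embedding table; a real system would load vectors trained on a large unlabelled corpus:

```python
import numpy as np

# Toy embedding table; in practice these vectors come from a model trained
# on a large unlabelled corpus (e.g. word2vec or GloVe).
EMB = {
    "acquired": np.array([0.9, 0.1, 0.0]),
    "bought":   np.array([0.8, 0.2, 0.1]),
    "visited":  np.array([0.1, 0.9, 0.3]),
}
DIM = 3

def embed(token):
    return EMB.get(token.lower(), np.zeros(DIM))   # unseen words map to zero

def relation_features(tokens, head_idx, tail_idx):
    """Concatenate embeddings of the two entity heads and the averaged
    context between them, a common embedding-based feature layout."""
    between = tokens[min(head_idx, tail_idx) + 1 : max(head_idx, tail_idx)]
    context = (np.mean([embed(t) for t in between], axis=0)
               if between else np.zeros(DIM))
    return np.concatenate([embed(tokens[head_idx]), context, embed(tokens[tail_idx])])

tokens = ["Google", "acquired", "YouTube"]
print(relation_features(tokens, 0, 2))
```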

APA, Harvard, Vancouver, ISO, and other styles
31

Lee, Ji Young Ph D. Massachusetts Institute of Technology. "Information extraction with neural networks." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/111905.

Full text
Abstract:
Electronic health records (EHRs) have been widely adopted, and are a gold mine for clinical research. However, EHRs, especially their text components, remain largely unexplored due to the fact that they must be de-identified prior to any medical investigation. Existing systems for de-identification rely on manual rules or features, which are time-consuming to develop and fine-tune for new datasets. In this thesis, we propose the first de-identification system based on artificial neural networks (ANNs), which achieves state-of-the-art results without any human-engineered features. The ANN architecture is extended to incorporate features, further improving the de-identification performance. Under practical considerations, we explore transfer learning to take advantage of a large annotated dataset to improve the performance on datasets with a limited number of annotations. The ANN-based system is publicly released as an easy-to-use software package for general-purpose named-entity recognition as well as de-identification. Finally, we present an ANN architecture for relation extraction, which ranked first in the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific articles (subtask C).
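A minimal PyTorch sketch of the kind of recurrent sequence tagger such a de-identification/NER system builds on; sizes and labels are toy values, and the thesis's released package is not reproduced here:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Token-level tagger: embeddings -> BiLSTM -> per-token label scores."""
    def __init__(self, vocab_size, num_labels, emb_dim=50, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_labels)

    def forward(self, token_ids):              # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))  # (batch, seq_len, 2*hidden)
        return self.out(h)                     # (batch, seq_len, num_labels)

# Toy usage: 3 labels, e.g. O / B-NAME / B-DATE for de-identification.
model = BiLSTMTagger(vocab_size=1000, num_labels=3)
scores = model(torch.randint(0, 1000, (2, 7)))   # two sentences of 7 tokens
print(scores.argmax(dim=-1))                     # predicted label ids
```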
APA, Harvard, Vancouver, ISO, and other styles
32

Harik, Ralph 1979. "Structural and semantic information extraction." Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87407.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Valenzuela, Escárcega Marco Antonio. "Interpretable Models for Information Extraction." Diss., The University of Arizona, 2016. http://hdl.handle.net/10150/613348.

Full text
Abstract:
There is an abundance of information being generated constantly, most of it encoded as unstructured text. The information expressed this way, although publicly available, is not directly usable by computer systems because it is not organized according to a data model that could inform us how different data nuggets relate to each other. Information extraction provides a way of scanning unstructured text and extracting structured knowledge suitable for querying and manipulation. Most information extraction research focuses on machine learning approaches that can be considered black boxes when deployed in information extraction systems. We propose a declarative language designed for the information extraction task. It allows the use of syntactic patterns alongside token-based surface patterns that incorporate shallow linguistic features. It captures complex constructs such as nested structures, and complex regular expressions over syntactic patterns for event arguments. We implement a novel information extraction runtime system designed for the compilation and execution of the proposed language. The runtime system has novel features for better declarative support, while preserving practicality. It supports features required for handling natural language, like the preservation of ambiguity and the efficient use of contextual information. It has a modular architecture that allows it to be extended with new functionality, which, together with the language design, provides a powerful framework for the development and research of new ideas for declarative information extraction. We use our language and runtime system to build a biomedical information extraction system. This system is capable of recognizing biological entities (e.g., genes, proteins, protein families, simple chemicals), events over entities (e.g., biochemical reactions), and nested events that take other events as arguments (e.g., catalysis). Additionally, it captures complex natural language phenomena like coreference and hedging. Finally, we propose a rule learning procedure to extract rules from statistical systems trained for information extraction. Rule learning combines the advantages of machine learning with the interpretability of our models. This enables us to train information extraction systems using annotated data that can then be extended and modified by human experts, and in this way accelerate the deployment of new systems that can still be extended or modified by human experts.
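Purely as an illustration of the declarative-pattern idea, and not the thesis's actual rule language or runtime, a tiny token-based surface pattern matched over tagged tokens:

```python
# Illustration only: a declarative surface pattern over (word, lemma, POS)
# tokens, loosely in the spirit of rule-based event extraction; the real
# system compiles a much richer language over syntactic structures.
RULE = {
    "name": "phosphorylation",
    "pattern": [                       # each element constrains one token
        {"lemma": "phosphorylate"},
        {"pos": "NN"},                 # the phosphorylated protein
    ],
}

def match(rule, tagged):
    """Return (rule name, matched words) for every window matching the pattern."""
    n = len(rule["pattern"])
    hits = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(all(tok.get(k) == v for k, v in constraint.items())
               for constraint, tok in zip(rule["pattern"], window)):
            hits.append((rule["name"], [t["word"] for t in window]))
    return hits

tagged = [
    {"word": "MEK",            "lemma": "MEK",           "pos": "NN"},
    {"word": "phosphorylates", "lemma": "phosphorylate", "pos": "VBZ"},
    {"word": "ERK",            "lemma": "ERK",           "pos": "NN"},
]
print(match(RULE, tagged))
```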
APA, Harvard, Vancouver, ISO, and other styles
34

Perera, Pathirage Dinindu Sujan Udayanga. "Knowledge-driven Implicit Information Extraction." Wright State University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1472474558.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Batista-Navarro, Riza Theresa Bautista. "Information extraction from pharmaceutical literature." Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/information-extraction-from-pharmaceutical-literature(3f8322b6-8b8d-44eb-a8cd-899026b267b9).html.

Full text
Abstract:
With the constantly growing amount of biomedical literature, methods for automatically distilling information from unstructured data, collectively known as information extraction, have become indispensable. Whilst most biomedical information extraction efforts in the last decade have focussed on the identification of gene products and interactions between them, the biomedical text mining community has recently extended their scope to capture associations between biomedical and chemical entities with the aim of supporting applications in drug discovery. This thesis is the first comprehensive study focussing on information extraction from pharmaceutical chemistry literature. In this research, we describe our work on (1) recognising names of chemical compounds and drugs, facilitated by the incorporation of domain knowledge; (2) exploring different coreference resolution paradigms in order to recognise co-referring expressions given a full-text article; and (3) defining drug-target interactions as events and distilling them from pharmaceutical chemistry literature using event extraction methods.
APA, Harvard, Vancouver, ISO, and other styles
36

Kushmerick, Nicholas. "Wrapper induction for information extraction /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/6867.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Johansson, Ronnie. "Information Acquisition in Data Fusion Systems." Licentiate thesis, KTH, Numerical Analysis and Computer Science, NADA, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-1673.

Full text
Abstract:

By purposefully utilising sensors, for instance by a data fusion system, the state of some system-relevant environment might be adequately assessed to support decision-making. The ever increasing access to sensors offers great opportunities, but also incurs grave challenges. As a result of managing multiple sensors one can, e.g., expect to achieve a more comprehensive, resolved, certain and more frequently updated assessment of the environment than would be possible otherwise. Challenges include data association, treatment of conflicting information and strategies for sensor coordination.

We use the term information acquisition to denote the skill of a data fusion system to actively acquire information. The aim of this thesis is to instructively situate that skill in a general context, explore and classify related research, and highlight key issues and possible future work. It is our hope that this thesis will facilitate communication, understanding and future efforts for information acquisition.

The previously mentioned trend towards utilisation of large sets of sensors makes us especially interested in large-scale information acquisition, i.e., acquisition using many and possibly spatially distributed and heterogeneous sensors.

Information acquisition is a general concept that emerges in many different fields of research. In this thesis, we survey literature from, e.g., agent theory, robotics and sensor management. We, furthermore, suggest a taxonomy of the literature that highlights relevant aspects of information acquisition.

We describe a function, perception management (akin to sensor management), which realizes information acquisition in the data fusion process, and pertinent properties of its external stimuli, sensing resources, and system environment.

An example of perception management is also presented. The task is that of managing a set of mobile sensors that jointly track some mobile targets. The game-theoretic algorithm suggested for distributing the targets among the sensors proves to be more robust to sensor failure than a measurement-accuracy-optimal reference algorithm.

Keywords: information acquisition, sensor management, resource management, information fusion, data fusion, perception management, game theory, target tracking
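The target-to-sensor distribution task in the example can be illustrated with a cost-minimising assignment; note that this sketch uses the standard Hungarian algorithm as a stand-in rather than the game-theoretic algorithm developed in the thesis, and the coordinates are invented:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Sensors and targets in the plane; cost = squared distance. Hungarian
# assignment is used here only as a simple illustrative baseline.
sensors = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])
targets = np.array([[1.0, 1.0], [9.0, 1.0], [5.0, 7.0]])

cost = ((sensors[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
rows, cols = linear_sum_assignment(cost)          # minimise total cost
for s, t in zip(rows, cols):
    print(f"sensor {s} -> target {t} (cost {cost[s, t]:.1f})")
```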

APA, Harvard, Vancouver, ISO, and other styles
38

Nouranian, Saman. "Information fusion for prostate brachytherapy planning." Thesis, University of British Columbia, 2016. http://hdl.handle.net/2429/58305.

Full text
Abstract:
Low-dose-rate prostate brachytherapy is a minimally invasive treatment approach for localized prostate cancer. It takes place in one session by permanent implantation of several small radio-active seeds inside and adjacent to the prostate. The current procedure at the majority of institutions requires planning of seed locations prior to implantation from transrectal ultrasound (TRUS) images acquired weeks in advance. The planning is based on a set of contours representing the clinical target volume (CTV). Seeds are manually placed with respect to a planning target volume (PTV), which is an anisotropic dilation of the CTV, followed by dosimetry analysis. The main objective of the plan is to meet clinical guidelines in terms of recommended dosimetry by covering the entire PTV with the placement of seeds. The current planning process is manual, hence highly subjective, and can potentially contribute to the rate and type of treatment related morbidity. The goal of this thesis is to reduce subjectivity in prostate brachytherapy planning. To this end, we developed and evaluated several frameworks to automate various components of the current prostate brachytherapy planning process. This involved development of techniques with which target volume labels can be automatically delineated from TRUS images. A seed arrangement planning approach was developed by distributing seeds with respect to priors and optimizing the arrangement according to the clinical guidelines. The design of the proposed frameworks involved the introduction and assessment of data fusion techniques that aim to extract joint information in retrospective clinical plans, containing the TRUS volume, the CTV, the PTV and the seed arrangement. We evaluated the proposed techniques using data obtained in a cohort of 590 brachytherapy treatment cases from the Vancouver Cancer Centre, and compare the automation results with the clinical gold-standards and previously delivered plans. Our results demonstrate that data fusion techniques have the potential to enable automatic planning of prostate brachytherapy.
APA, Harvard, Vancouver, ISO, and other styles
39

Dalmas, Tiphaine. "Information fusion for automated question answering." Thesis, University of Edinburgh, 2007. http://hdl.handle.net/1842/27860.

Full text
Abstract:
Until recently, research efforts in automated Question Answering (QA) have mainly focused on getting a good understanding of questions to retrieve correct answers. I focus on the analysis of the relationships between answer candidates as provided in open domain QA on multiple documents. I argue that such candidates have intrinsic properties, partly regardless of the question, and those properties can be exploited to provide better quality and more user-oriented answers in QA. Information fusion refers to the technique of merging pieces of information from different sources. While frequency has proved to be a significant characteristic of a correct answer, I evaluate the value of other relationships characterizing answer variability and redundancy. Partially inspired by recent developments in multi-document summarization, I redefine the concept of “answer” within an engineering approach to QA based on the Model-View-Controller (MVC) pattern of user interface design. An “answer model” is a directed graph in which nodes correspond to entities projected from extractions and edges convey relationships between such nodes. I describe shallow techniques to compare entities and enrich the model by discovering four broad categories of relationships between entities in the model: equivalence, inclusion, aggregation and alternative. Quantitatively, answer candidate modelling improves answer extraction accuracy. It also proves to be more robust to incorrect answer candidates than traditional techniques. Qualitatively, models provide meta-information encoded by relationships that allow shallow reasoning to help organize and generate the final output. Coupling this fusion-based reasoning with the MVC approach, I report experiments on mixed-media answering involving generation of illustrated summaries, and discuss the application of web-based answer modelling to improve non-web QA tasks. Finally, I discuss issues related to the computation of answer models (candidate selection for fusion, relationship transitivity), and address the difficulty of assessing fusion-based answers with the current evaluation methods in QA.
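A hedged sketch of an answer-model graph in the spirit described, with nodes for candidate answers and typed edges for two of the relationship categories, using NetworkX; the candidate strings and the simple string heuristics are invented for the example:

```python
import networkx as nx

# Candidate answers for "Where is the Eiffel Tower?" gathered from several documents.
candidates = ["Paris", "paris", "Paris, France", "France"]

G = nx.DiGraph()
G.add_nodes_from(candidates)

def relate(a, b):
    """Toy detector for two of the relationship categories between candidates."""
    if a.lower() == b.lower():
        return "equivalence"
    if a.lower() in b.lower():
        return "inclusion"      # e.g. "Paris" is included in "Paris, France"
    return None

for a in candidates:
    for b in candidates:
        if a != b:
            rel = relate(a, b)
            if rel:
                G.add_edge(a, b, relation=rel)

for u, v, d in G.edges(data=True):
    print(f"{u!r} -[{d['relation']}]-> {v!r}")
```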
APA, Harvard, Vancouver, ISO, and other styles
40

Oreshkin, Boris. "Distributed information fusion in sensor networks." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=86916.

Full text
Abstract:
This thesis addresses the problem of design and analysis of distributed in-network signal processing algorithms for efficient aggregation and fusion of information in wireless sensor networks. The distributed in-network signal processing algorithms alleviate a number of drawbacks of the centralized fusion approach. The single point of failure, complex routing protocols, uneven power consumption in sensor nodes, inefficient wireless channel utilization, and poor scalability are among these drawbacks. These drawbacks of the centralized approach lead to reduced network lifetime, poor robustness to node failures, and reduced network capacity. The distributed algorithms alleviate these issues by using simple pairwise message exchange protocols and localized in-network processing. However, for such algorithms accuracy losses and/or time required to complete a particular fusion task may be significant. The design and analysis of fast and accurate distributed algorithms with guaranteed performance characteristics is thus important. In this thesis two specific problems associated with the analysis and design of such distributed algorithms are addressed.
For the distributed average consensus algorithm a memory based acceleration methodology is proposed. The convergence of the proposed methodology is investigated. For the two important settings of this methodology, optimal values of system parameters are determined and improvement with respect to the standard distributed average consensus algorithm is theoretically characterized. The theoretical improvement characterization matches well with the results of numerical experiments revealing significant and well scaling gain. The practical distributed on-line initialization scheme is devised. Numerical experiments reveal the feasibility of the proposed initialization scheme and superior performance of the proposed methodology with respect to several existing acceleration approaches.
For the collaborative signal and information processing methodology a number of theoretical performance guarantees is obtained. The collaborative signal and information processing framework consists in activating only a cluster of wireless sensors to perform target tracking task in the cluster head using particle filter. The optimal cluster is determined at every time instant and cluster head hand-off is performed if necessary. To reduce communication costs only an approximation of the filtering distribution is sent during hand-off resulting in additional approximation errors. The time uniform performance guarantees accounting for the additional errors are obtained in two settings: the subsample approximation and the parametric mixture approximation hand-off.
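A small NumPy sketch of the baseline distributed average consensus iteration that the memory-based acceleration builds on; the acceleration itself and its optimal parameters are not reproduced here:

```python
import numpy as np

# Five nodes on a ring, each holding one local measurement. Repeatedly
# averaging with neighbours (x_{k+1} = W x_k, W doubly stochastic) drives
# every node's value to the network-wide average of the initial measurements.
n = 5
x0 = np.array([3.0, 7.0, 1.0, 9.0, 5.0])

W = np.zeros((n, n))
for i in range(n):
    for j in (i, (i - 1) % n, (i + 1) % n):   # self + ring neighbours
        W[i, j] = 1.0 / 3.0

x = x0.copy()
for _ in range(100):
    x = W @ x                                  # one consensus step

print("node values:", np.round(x, 3), "true average:", x0.mean())
```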
APA, Harvard, Vancouver, ISO, and other styles
41

Peacock, Andrew M. "Information fusion for improved motion estimation." Thesis, University of Edinburgh, 2001. http://hdl.handle.net/1842/428.

Full text
Abstract:
Motion Estimation is an important research field with many commercial applications including surveillance, navigation, robotics, and image compression. As a result, the field has received a great deal of attention and there exist a wide variety of Motion Estimation techniques which are often specialised for particular problems. The relative performance of these techniques, in terms of both accuracy and of computational requirements, is often found to be data dependent, and no single technique is known to outperform all others for all applications under all conditions. Information Fusion strategies seek to combine the results of different classifiers or sensors to give results of a better quality for a given problem than can be achieved by any single technique alone. Information Fusion has been shown to be of benefit to a number of applications including remote sensing, personal identity recognition, target detection, forecasting, and medical diagnosis. This thesis proposes and demonstrates that Information Fusion strategies may also be applied to combine the results of different Motion Estimation techniques in order to give more robust, more accurate and more timely motion estimates than are provided by any of the individual techniques alone. Information Fusion strategies for combining motion estimates are investigated and developed. Their usefulness is first demonstrated by combining scalar motion estimates of the frequency of rotation of spinning biological cells. Then the strategies are used to combine the results from three popular 2D Motion Estimation techniques, chosen to be representative of the main approaches in the field. Results are presented, from both real and synthetic test image sequences, which illustrate the potential benefits of Information Fusion to Motion Estimation applications. There is often a trade-off between accuracy of Motion Estimation techniques and their computational requirements. An architecture for Information Fusion that allows faster, less accurate techniques to be effectively combined with slower, more accurate techniques is described. This thesis describes a number of novel techniques for both Information Fusion and Motion Estimation which have potential scope beyond that examined here. The investigations presented in this thesis have also been reported in a number of workshop, conference and journal papers, which are listed at the end of the document.
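As one hedged illustration of the general idea of combining several motion estimates into a single, more reliable one, the sketch below uses simple inverse-variance weighting; the thesis investigates richer fusion strategies, and the numbers are invented:

```python
import numpy as np

def fuse_estimates(estimates, variances):
    """Inverse-variance weighted fusion of independent 2D motion estimates.

    estimates: list of (dx, dy) vectors from different motion estimators
    variances: list of scalar error variances, one per estimator
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    weights /= weights.sum()
    return (weights[:, None] * estimates).sum(axis=0)

# Block matching, optical flow and phase correlation disagree slightly:
motions = [(2.1, 0.4), (1.8, 0.6), (2.6, 0.2)]
sigmas2 = [0.25, 0.10, 0.90]          # the noisier estimator counts for less
print(fuse_estimates(motions, sigmas2))
```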
APA, Harvard, Vancouver, ISO, and other styles
42

Hoang, Thi Bich Ngoc. "Information diffusion, information and knowledge extraction from social networks." Thesis, Toulouse 2, 2018. http://www.theses.fr/2018TOU20078.

Full text
Abstract:
The popularity of online social networks has rapidly increased over the last decade. According to Statista, approximately 2 billion users used social networks in January 2018, and this number is still expected to grow in the coming years. While serving their primary purpose of connecting people, social networks also play a major role in successfully connecting marketers with customers, famous people with their supporters, and people who need help with people willing to help. The success of online social networks mainly relies on the information the messages carry as well as the speed at which they spread through the network. Our research aims at modeling message diffusion and at extracting and representing information and knowledge from messages on social networks. Our first contribution is a model to predict the diffusion of information on social networks. More precisely, we predict whether a tweet is going to be diffused or not and the level of the diffusion. Our model is based on three types of features: user-based, time-based and content-based features. Evaluated on various collections corresponding to dozens of millions of tweets, our model significantly improves the effectiveness (F-measure) compared to the state-of-the-art, both when predicting if a tweet is going to be retweeted or not, and when predicting the level of retweet. The second contribution of this thesis is an approach to extract information from microblogs. While several pieces of important information are included in a message about an event, such as location, time and related entities, we focus on location, which is vital for several applications, especially geo-spatial applications and applications linked to events. We proposed different combinations of various existing methods to extract locations in tweets, targeting either recall-oriented or precision-oriented applications. We also defined a model to predict whether a tweet contains a location or not. We showed that the precision of location extraction tools on the tweets we predict to contain a location is significantly improved compared to extraction from all tweets. Our last contribution presents a knowledge base that better represents information from a set of tweets on events. We combined a tweet collection with other Internet resources to build a domain ontology. The knowledge base aims at bringing users a complete picture of the events referenced in the tweet collection (we considered the CLEF 2016 festival tweet collection).
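A sketch of a feature-based diffusion predictor in the spirit of the first contribution, with invented toy features (follower count, posting hour, hashtag/URL presence, length) and labels; the thesis's exact feature set and model are not reproduced:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: user-based (log follower count), time-based (posting hour),
# content-based (has hashtag, has URL, tweet length). Labels: retweeted or not.
X = np.array([
    [4.2, 18, 1, 1, 120],
    [1.1,  3, 0, 0,  40],
    [5.0, 12, 1, 0,  90],
    [0.9, 23, 0, 0,  15],
    [3.7,  9, 1, 1, 140],
    [1.5,  2, 0, 1,  60],
])
y = np.array([1, 0, 1, 0, 1, 0])

clf = LogisticRegression().fit(X, y)
new_tweet = np.array([[3.0, 17, 1, 0, 100]])
print("P(retweeted) =", clf.predict_proba(new_tweet)[0, 1])
```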
APA, Harvard, Vancouver, ISO, and other styles
43

Fidalgo, Luis Miguel. "Novel methods for droplet fusion, extraction and analysis in multiphase microfluidics." Thesis, University of Cambridge, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.611619.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Bellenger, Amandine. "Semantic Decision Support for Information Fusion Applications." Phd thesis, INSA de Rouen, 2013. http://tel.archives-ouvertes.fr/tel-00845918.

Full text
Abstract:
This thesis is part of the knowledge representation domain and the modeling of uncertainty in a context of information fusion. The main idea is to use semantic tools, and more specifically ontologies, not only to represent the general domain knowledge and observations, but also to represent the uncertainty that sources may introduce in their own observations. We propose to represent these uncertainties and semantic imprecision through a meta-ontology (called DS-Ontology) based on the theory of belief functions. The contribution of this work focuses first on the definition of semantic inclusion and intersection operators for ontologies, on which the implementation of the theory of belief functions relies, and secondly on the development of a tool called FusionLab for merging semantic information within ontologies, building on the preceding theoretical development. This work has been applied within a European maritime surveillance project.
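A compact sketch of Dempster's rule of combination from the theory of belief functions on which the DS-Ontology builds, combining the mass assignments of two sources over a small frame of discernment; the maritime example and the masses are invented:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                    # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Frame of discernment: the observed vessel is a cargo ship, a fishing boat, or unknown.
CARGO, FISHING = frozenset({"cargo"}), frozenset({"fishing"})
EITHER = CARGO | FISHING
m_radar = {CARGO: 0.6, EITHER: 0.4}
m_ais   = {CARGO: 0.3, FISHING: 0.5, EITHER: 0.2}
print(dempster_combine(m_radar, m_ais))
```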
APA, Harvard, Vancouver, ISO, and other styles
45

Cavanaugh, Andrew F. "Bayesian Information Fusion for Precision Indoor Location." Digital WPI, 2011. https://digitalcommons.wpi.edu/etd-theses/157.

Full text
Abstract:
This thesis documents work which is part of the ongoing effort by the Worcester Polytechnic Institute (WPI) Precision Personnel Locator (PPL) project to track and locate first responders in urban/indoor settings. Specifically, the project intends to produce a system which can accurately determine the floor that a person is on, as well as where on the floor that person is, with sub-meter accuracy. The system must be portable, rugged, fast to set up, and require no pre-installed infrastructure. Several recent advances have enabled us to get closer to meeting these goals: the development of the Transactional Array Reconciliation Tomography (TART) algorithm and corresponding locator hardware, as well as the integration of barometric sensors and a new antenna deployment scheme. To fully utilize these new capabilities, a Bayesian fusion algorithm has been designed. The goal of this thesis is to present the necessary methods for incorporating diverse sources of information, in a constructive manner, to improve the performance of the PPL system. While the conceptual methods presented within are meant to be general, the experimental results will focus on the fusion of barometric height estimates and RF data. These information sources will be processed with our existing Singular Value Array Reconciliation Tomography (σART) and the new TART algorithm, using a Bayesian fusion algorithm to more accurately estimate indoor locations.
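A minimal sketch of the kind of Gaussian Bayesian fusion involved in combining a barometric height estimate with an RF-derived one; the numbers are illustrative only, and the PPL system's actual TART/σART processing is far richer:

```python
def fuse_gaussian(mu1, var1, mu2, var2):
    """Bayesian fusion of two independent Gaussian estimates of the same quantity:
    the posterior is Gaussian with precision equal to the sum of the precisions."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    var = 1.0 / (w1 + w2)
    mu = var * (w1 * mu1 + w2 * mu2)
    return mu, var

# Barometric sensor says ~9.2 m (coarse); the RF solution says ~7.8 m (finer).
mu, var = fuse_gaussian(9.2, 4.0, 7.8, 1.0)
print(f"fused height: {mu:.2f} m, variance {var:.2f}")
```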
APA, Harvard, Vancouver, ISO, and other styles
46

Toledo, Testa Juan Ignacio. "Information extraction from heterogeneous handwritten documents." Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/667388.

Full text
Abstract:
The goal of this thesis is information extraction from totally or partially handwritten documents. Basically, we are dealing with two different application scenarios. The first scenario is modern, highly structured documents such as forms. In this kind of document, the semantic information is encoded in different fields with a pre-defined location in the document; therefore, information extraction becomes equivalent to transcription. The second application scenario is loosely structured, totally handwritten documents; besides transcribing them, we need to assign a semantic label, from a set of known values, to the handwritten words. In both scenarios, transcription is an important part of the information extraction. For that reason, in this thesis we present two methods based on neural networks to transcribe handwritten text. In order to tackle the challenge of loosely structured documents, we have produced a benchmark, consisting of a dataset, a defined set of tasks and a metric, which was presented to the community as an international competition. We also propose different models based on convolutional and recurrent neural networks that are able to transcribe and assign different semantic labels to each handwritten word, that is, to perform information extraction.
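A rough PyTorch sketch of the convolutional-recurrent pattern such transcription models commonly follow (toy dimensions, not the thesis's actual architectures): a CNN encodes the word image into a horizontal feature sequence, a recurrent layer reads it, and a linear layer scores characters per time step, as would feed, for example, a CTC loss:

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    def __init__(self, num_chars, height=32):
        super().__init__()
        self.cnn = nn.Sequential(                 # (B, 1, 32, W) -> (B, 32, 8, W/4)
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(32 * (height // 4), 64, batch_first=True, bidirectional=True)
        self.out = nn.Linear(128, num_chars)      # per-step character scores

    def forward(self, images):                    # images: (B, 1, 32, W)
        f = self.cnn(images)                      # (B, C, H', W')
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one step per image column
        h_seq, _ = self.rnn(seq)
        return self.out(h_seq)                    # (B, W', num_chars)

model = TinyCRNN(num_chars=80)
scores = model(torch.randn(2, 1, 32, 128))        # two 32x128 word images
print(scores.shape)                               # torch.Size([2, 32, 80])
```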
APA, Harvard, Vancouver, ISO, and other styles
47

Walessa, Marc. "Bayesian information extraction from SAR images." [S.l. : s.n.], 2001. http://deposit.ddb.de/cgi-bin/dokserv?idn=964273659.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Popescu, Ana-Maria. "Information extraction from unstructured web text /." Thesis, Connect to this title online; UW restricted, 2007. http://hdl.handle.net/1773/6935.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Williams, Dean Ashley. "Combining data integration and information extraction." Thesis, Birkbeck (University of London), 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.499152.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Duarte, Lucio Mauro. "Behaviour Model Extraction Using Context Information." Thesis, Imperial College London, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.498466.

Full text
APA, Harvard, Vancouver, ISO, and other styles