Theses on the topic "Traitement de documents"
Create a precise citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Traitement de documents".
Next to every source in the list of references there is an "Add to bibliography" button. Press the button, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Blibech, Kaouther. "L'horodatage sécurisé des documents électroniques". Pau, 2006. http://www.theses.fr/2006PAUU3010.
Timestamping is a technique for providing proof of existence of a message or document at a given time. Timestamping is mandatory in many domains such as patent submissions, electronic voting, electronic commerce and non-repudiation of digital signatures. In secure timestamping systems, one or several timestamping authorities process timestamping requests and provide formal proofs in order to guarantee that requests were correctly timestamped. In this thesis, we provide two secure timestamping systems. The first one uses an authenticated, totally ordered, append-only dictionary based on perfect skip lists. We show that our timestamping scheme performs better than existing ones. The second one is a distributed timestamping scheme of type k among n. We first prove that this type of distributed timestamping system is not secure if the timestamping authorities involved do not use secure timestamping protocols. We then provide a new distributed timestamping scheme which is secure.
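A minimal sketch of the append-only idea behind such schemes, assuming nothing from the thesis itself (its authenticated skip-list dictionary is not reproduced here): each stamped document is hash-linked to the log head, so back-dating any entry would invalidate every later proof. Class and field names are illustrative.

```python
import hashlib
import time

class LinkedTimestampLog:
    """Append-only log where each entry commits to the previous one,
    so altering or back-dating an entry breaks every later hash."""
    def __init__(self):
        self.entries = []          # list of (timestamp, doc_digest, link_hash)
        self.head = b"\x00" * 32   # hash of the latest entry

    def stamp(self, document: bytes) -> dict:
        doc_digest = hashlib.sha256(document).digest()
        ts = time.time()
        # The proof chains the previous head, the document digest and the time.
        link = hashlib.sha256(self.head + doc_digest + str(ts).encode()).digest()
        self.entries.append((ts, doc_digest, link))
        self.head = link
        return {"time": ts, "digest": doc_digest.hex(), "proof": link.hex()}

log = LinkedTimestampLog()
receipt = log.stamp(b"contract v1")
print(receipt["time"], receipt["proof"][:16])
```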
Benjelil, Mohamed. "Analyse d'images de documents complexes et identification de scripts : cas des documents administratifs". La Rochelle, 2010. http://www.theses.fr/2010LAROS299.
This thesis describes our work in the field of multilingual, multi-script complex document image segmentation: the case of official documents. We propose a texture-based approach. Two different subjects are presented: (1) document image segmentation; (2) Arabic and Latin script identification in printed and/or handwritten types. The developed approaches concern flows of documents that do not obey a specific model. Chapter 1 presents the problematics and the state of the art of complex document image segmentation and script identification. The work described in Chapter 2 aimed at finding new models for complex multilingual, multi-script document image segmentation. Algorithms have been developed that segment document images into homogeneous regions, identify the script of the textual blocks contained in a document image, and can also segment out a particular object in an image. The approach is based on the classification of text and non-text regions by means of steerable pyramid features. Chapter 3 describes our work on official document image segmentation based on steerable pyramid features. Chapter 4 describes our work on Arabic and Latin script identification in printed and/or handwritten types. Experimental results show that the proposed approaches perform consistently well on large sets of complex document images. Examples of applications, performance tests and comparative studies are also presented.
Xu, Zhiwu. "Polymorphisme paramétrique pour le traitement de documents XML". Paris 7, 2013. http://www.theses.fr/2013PA077030.
XML (eXtensible Markup Language) is a current standard format for exchanging semi-structured data, which has been applied to web services, databases, research on formal methods, and so on. For better processing of XML, many statically typed functional languages have recently emerged, such as XDuce, CDuce, XJ, XTatic, XACT, XHaskell, OCamlDuce and so on. But most of these languages lack parametric polymorphism or present it in a limited form, even though parametric polymorphism is needed by XML processing and has repeatedly been requested and discussed in various standards working groups (e.g., RELAX NG and XQuery). We study in this thesis techniques to extend XML processing languages with parametric polymorphism. Our solution consists of two parts: the definition of a polymorphic semantic subtyping relation and the definition of a polymorphic calculus. In the first part, we define and study a polymorphic semantic subtyping relation for a type system with recursive, product and arrow types and set-theoretic type connectives (i.e., union, intersection and negation). We introduce the notion of "convexity", on which our solution is built, and prove that there exists at least one model that satisfies convexity. We also propose a sound, complete and decidable subtyping algorithm. The second part is devoted to the theoretical definition of a polymorphic calculus, which takes advantage of the subtyping relation. The novelty of the polymorphic calculus is to decorate lambda-abstractions with sets of type substitutions and to lazily propagate type substitutions at the moment of reduction. The second part also explores a semi-decidable local inference algorithm to infer the set of type substitutions, as well as the compilation of the polymorphic calculus into a variant of CDuce.
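The core intuition of semantic subtyping can be sketched in a few lines under heavy simplifying assumptions (a finite value universe; no recursive, product or arrow types, which are the hard part the thesis handles): types are interpreted as sets of values, and s <: t is decided as emptiness of s ∧ ¬t. The type encoding below is an assumption for demonstration only.

```python
UNIVERSE = frozenset(range(-5, 6))  # a toy value universe

def interp(t):
    """Interpret a type expression as a subset of UNIVERSE."""
    tag = t[0]
    if tag == "basic":              # ("basic", predicate)
        return frozenset(v for v in UNIVERSE if t[1](v))
    if tag == "union":
        return interp(t[1]) | interp(t[2])
    if tag == "inter":
        return interp(t[1]) & interp(t[2])
    if tag == "neg":
        return UNIVERSE - interp(t[1])
    raise ValueError(tag)

def subtype(s, t):
    # s <: t  iff  s ∧ ¬t is empty — the reduction set-theoretic systems use
    return not interp(("inter", s, ("neg", t)))

Int = ("basic", lambda v: True)
Pos = ("basic", lambda v: v > 0)
Neg = ("basic", lambda v: v < 0)
print(subtype(Pos, Int))                  # True
print(subtype(Int, ("union", Pos, Neg)))  # False: 0 is in neither branch
```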
Lecluze, Charlotte. "Alignement de documents multilingues sans présupposé de parallélisme". Caen, 2011. http://www.theses.fr/2011CAEN2058.
Today, work using multilingual documents is turning to the study of comparable texts, even though not all aspects of parallel documents have been studied, nor all the obstacles facing alignment methods lifted, including document formatting and the cases of inversions and deletions at the macro level. Thus, to date there are no tools to take advantage of this wealth of information and extract resources as massively as envisaged, despite their usefulness for translators and lexicologists alike. We present a method without any assumption of parallelism between the different components of a multiple document. The basic idea of this work is the following: between two components of a multi-document, there are grains that maximize the parallelism; we call them multizones. They can cover several realities: a document, a series of paragraphs, a paragraph, a clause. Their boundaries cannot be defined in an ad hoc way; it should be done in context and independently of the languages. To this end, we combine several original processes: studying each multiple document through a collection of multi-documents, exploiting the formatting of documents by direct processing of the source, and processing repeated strings rather than words. The purpose of this work is twofold: matching and alignment, i.e., resource creation and document analysis. The method requires little supervision: adding a new language or changing the input corpus does not represent a significant cost.
Alhéritière, Héloïse. "Extraction de la mise en page de documents : application à la sécurisation des documents hybrides". Electronic Thesis or Diss., Université Paris Cité, 2019. http://www.theses.fr/2019UNIP5201.
Digital documents are more and more present in our society. This format has many advantages, whether for distribution or for document backup. Distribution allows easy transmission of documents but guarantees their integrity neither for the receiver nor for the sender. Throughout their life cycle, documents go from a dematerialized state to a materialized state and vice versa. The two formats have their own advantages and disadvantages, justifying the fact that a document can be found in both. When we go from a materialized format to a dematerialized one, we get an image: a set of pixels that needs to be interpreted. The different instances of a same document obtained by scanning or printing it many times define the "hybrid document". A first level of comparison can be realized by analyzing the document layout. Many layout extraction methods exist. We analyze them to highlight their shortcomings and their adequacy to particular categories of documents. We have also developed a methodology based on new transforms, thus innovating in the representation of a document image. We can process various documents without needing supervised learning. We also adopt a more innovative approach in our evaluation method. Thus, for the purpose of securing hybrid documents, we associate to the accuracy of a page decomposition the necessity of stable results for every instance of a document.
Xu, Zhiwu. "Polymorphisme paramétrique pour le traitement de documents XML". Phd thesis, Université Paris-Diderot - Paris VII, 2013. http://tel.archives-ouvertes.fr/tel-00858744.
Arias, Aguilar José Anibal. "Méthodes spectrales pour le traitement automatique de documents audio". Toulouse 3, 2008. http://thesesups.ups-tlse.fr/436/.
Disfluencies are a frequently occurring phenomenon in any spontaneous speech production; they consist of the interruption of the normal flow of speech. They have given rise to numerous studies in Natural Language Processing. Indeed, their study and precise identification are essential, from both a theoretical and an applicative perspective. However, most research on the subject relates to everyday uses of language: "small talk" dialogues, requests for schedules, speeches, etc. But what about spontaneous speech production made in a constrained framework? To our knowledge, no study has ever been carried out in this context. However, we know that using a specialized language in the framework of a given task leads to specific behaviours. Our thesis work is devoted to the linguistic and computational study of disfluencies within such a framework. The dialogues studied concern air traffic control, which entails both pragmatic and linguistic constraints. We carry out an exhaustive study of disfluency phenomena in this context. At first we conduct a detailed analysis of these phenomena. Then we model them at a level of abstraction that allows us to obtain the patterns corresponding to the different configurations observed. Finally we propose a methodology for automatic processing, consisting of several algorithms that identify the different phenomena, even in the absence of explicit markers. It is integrated into a system for the automatic processing of speech. The methodology is validated on a corpus of 400 sentences.
Janod, Killian. "La représentation des documents par réseaux de neurones pour la compréhension de documents parlés". Thesis, Avignon, 2017. http://www.theses.fr/2017AVIG0222/document.
Applications of spoken language understanding aim to extract relevant items of meaning from the spoken signal. There are two distinct types of spoken language understanding: understanding of human/human dialogues and understanding of human/machine dialogues. Depending on the type of conversation, the structure of the dialogues and the goal of the understanding process vary. However, in both cases, automatic systems most of the time include a speech recognition step to generate the textual transcript of the spoken signal. Speech recognition systems in adverse conditions, even the most advanced ones, produce erroneous or partly erroneous transcripts of speech. Those errors can be explained by the presence of information of various natures and functions, such as speaker and ambience specificities, and they can have an important adverse impact on the performance of the understanding process. The first part of the contribution of this thesis shows that using deep autoencoders produces a more abstract latent representation of the transcript. This latent representation allows a spoken language understanding system to be more robust to automatic transcription mistakes. In the second part, we propose two different approaches to generate more robust representations by combining multiple views of a given dialogue in order to improve the results of the spoken language understanding system. The first approach combines multiple thematic spaces to produce a better representation. The second one introduces new autoencoder architectures that use supervision in the denoising autoencoders. These contributions show that such architectures reduce the difference in performance between a spoken language understanding system using automatic transcripts and one using manual transcripts.
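A minimal sketch of a denoising autoencoder of the kind the abstract describes, assuming PyTorch is available; the bag-of-words input, dimensions and noise model are illustrative assumptions, not the thesis's setup.

```python
import torch
import torch.nn as nn

vocab, latent = 1000, 64
encoder = nn.Sequential(nn.Linear(vocab, latent), nn.ReLU())
decoder = nn.Linear(latent, vocab)
model = nn.Sequential(encoder, decoder)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(32, vocab)                    # stand-in for manual transcripts
noisy = clean * (torch.rand_like(clean) > 0.2)   # simulate ASR word drops

for _ in range(100):                             # learn to reconstruct clean input
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()
    opt.step()

code = encoder(noisy)  # latent representation, more robust to transcript noise
```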
Lemaitre, Aurélie Camillerapp Jean. "Introduction de la vision perceptive pour la reconnaissance de la structure de documents". Rennes : [s.n.], 2008. ftp://ftp.irisa.fr/techreports/theses/2008/lemaitre.pdf.
Carmagnac, Fabien. "Classification supervisée et semi-supervisée : contributions à la classification d’images et de documents". Rouen, 2005. http://www.theses.fr/2005ROUES058.
This manuscript proposes some contributions to supervised and semi-supervised document image classification under constraints such as a low number of training samples, dynamic feature selection and classification time. The first chapter gives a general introduction to the problems by presenting the components of a document image processing system. The second chapter proposes a strategy for supervised classification based on the concepts of point of view in several feature spaces and of the induced distance spaces. The third chapter proposes a method for semi-supervised classification based on a collaboration between the dendrograms obtained by agglomerative hierarchical clustering in several feature spaces. The last chapter draws a conclusion and opens on prospects for the continuation of this work.
Adam, Sébastien. "Documents, Graphes et Optimisation Multi-Objectifs". Habilitation à diriger des recherches, Université de Rouen, 2011. http://tel.archives-ouvertes.fr/tel-00671168.
Lebourgeois, Frank. "Approche mixte pour la reconnaissance des documents imprimes". Lyon, INSA, 1991. http://www.theses.fr/1991ISAL0013.
A recognition system for multi-font printed documents, using contextual information about typography, document structure and syntax, has been developed. First, a quick bottom-up method to separate text from images and recognize the logical structure of documents has been achieved. A mixed approach has been used to recognize individual characters. A first stage realizes a compaction at the character level against a dynamically built library of shapes; the high redundancy of character images in printed documents justifies this approach. A second stage structurally recognizes the previously built character models. A mixed syntactic and statistical stage is used simultaneously to reach a high recognition rate.
Pietriga, Emmanuel. "Environnements et langages de programmation visuels pour le traitement de documents structurés". Phd thesis, Grenoble INPG, 2002. http://tel.archives-ouvertes.fr/tel-00125472.
Texto completoDoucy, Jérémie. "Méthodologie pour l’orchestration sémantique de services, application au traitement de documents multimédia". Thesis, Rouen, INSA, 2011. http://www.theses.fr/2011ISAM0014.
After a complete state of the art, we detail our semantic services approach, which uses an innovative method for service composition: processing chain patterns. Our approach is built on a hybrid semantic service registry which proposes different levels of matching between services, composition rules for when the matching phase fails, and an execution engine able to dynamically resolve and compose services. In order to solve the service registry population issue, we have designed an upper ontology which enables links between a service taxonomy class and a semantically annotated abstract service. Finally, we have evaluated our prototype using real processing chains used by Cassidian platforms.
Grenier, Vincent. "Contribution à l'interprétation automatique de documents techniques : une approche système". Rouen, 2001. http://www.theses.fr/2001ROUES023.
Bossard, Aurélien. "Contribution au résumé automatique multi-documents". Phd thesis, Université Paris-Nord - Paris XIII, 2010. http://tel.archives-ouvertes.fr/tel-00573567.
Hébert, David. "Champs aléatoires conditionnels pour l'extraction de structures dans les images de documents". Rouen, 2013. http://www.theses.fr/2013ROUES029.
En, Sovann. "Détection de patterns dans les documents anciens". Rouen, 2016. http://www.theses.fr/2016ROUES050.
Drira, Fadoua Emptoz Hubert Lebourgeois Frank. "Contribution à la restauration des images de documents anciens". Villeurbanne : Doc'INSA, 2008. http://docinsa.insa-lyon.fr/these/pont.php?id=drira.
Lombard, Jordan. "Guidage des traitements et acceptabilité de la tablette pour la compréhension de documents multiples". Thesis, Toulouse 2, 2019. http://www.theses.fr/2019TOU20035.
This thesis focuses on students' activity (including information selection) when they read multiple textual documents in order to develop a critical perspective on a topic, and on students' perceptions (including ease of use) of the tablet as a tool for consulting documents. Under these conditions, three studies evaluate the comprehension performance of students after reading several documents on a tablet with an innovative application (e.g., displaying several documents simultaneously), depending on whether they study the documents freely or are guided in processing them. In addition, these studies assess how students perceive the tablet as a tool for studying documents, particularly whether they consider that the tablet improves their performance.
Moïn, Mohammad Shahram. "Traitement en-ligne de documents manuscrits structurés, segmentation en mots par algorithmes d'apprentissage". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0018/NQ57382.pdf.
Hatmi, Mohamed. "Reconnaissance des entités nommées dans des documents multimodaux". Nantes, 2014. http://archive.bu.univ-nantes.fr/pollux/show.action?id=022d16d5-ad85-43fa-9127-9f1d9d89db14.
Named entity recognition is a subtask of information extraction. It consists of identifying certain textual objects such as person, location and organization names. The work of this thesis focuses on the named entity recognition task for the oral modality. Some difficulties may arise in this task due to the intrinsic characteristics of speech processing (lack of capitalisation, lack of punctuation, presence of disfluencies and of recognition errors…). In the first part, we study the characteristics of named entity recognition downstream of the automatic speech recognition system. We present a methodology which allows named entity recognition following a hierarchical and compositional taxonomy. We measure the impact of the different phenomena specific to speech on the quality of named entity recognition. In the second part, we propose to study the tight pairing between the speech recognition task and the named entity recognition task. For that purpose, we strip a speech recognition system down to its basic functionalities in order to turn it into a named entity recognition system. Therefore, by bringing the knowledge inherent in speech processing to the named entity recognition task, we ensure a better synergy between the two tasks. We carry out different types of experiments to optimize and evaluate our approach.
Qureshi, Rashid Jalal Cardot Hubert Ramel Jean-Yves. "Reconnaissance de formes et symboles graphiques complexes dans les images de documents". Tours : SCD de l'université de Tours, 2008. http://www.applis.univ-tours.fr/theses/priv/rashid-jalal.qureshi_2732.pdf.
Delalandre, Mathieu. "Analyse des documents graphiques : une approche par reconstruction d'objets". Rouen, 2005. http://www.theses.fr/2005ROUES060.
Bernard, Guillaume. "Détection et suivi d’événements dans des documents historiques". Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS032.
Current campaigns to digitise historical documents from all over the world are opening up new avenues for historians and social science researchers. The understanding of past events is renewed by the analysis of these large volumes of historical data: unravelling the thread of events and tracing false information are, among other things, possibilities offered by the digital sciences. This thesis focuses on historical press articles and suggests, through two opposing strategies, two analysis processes that address the problem of tracking events in the press. A simple use case is, for instance, a digital humanities researcher or an amateur historian who is interested in an event of the past and seeks to discover all the press documents related to it. Manual analysis of the articles is not feasible in a limited time. By publishing algorithms, datasets and analyses, this thesis is a first step towards the publication of more sophisticated tools allowing any individual to search old press collections for events and, why not, renew some of our historical knowledge.
Gaceb, Djamel. "Contributions au tri automatique de documents et de courrier d'entreprises". Lyon, INSA, 2009. http://theses.insa-lyon.fr/publication/2009ISAL0077/these.pdf.
This thesis deals with the development of industrial vision systems for the automatic sorting of business documents and mail. These systems need very high processing speed and very accurate and precise results. Current systems are most of the time made of sequential modules needing fast and efficient algorithms throughout the processing line: from the low-level to the high-level stages of analysis and content recognition. The existing architectures, which we describe in the first three chapters of the thesis, have shown their weaknesses, expressed by reading errors and OCR rejections. The modules responsible for these rejections and reading errors are mostly the earliest in the pipeline: image segmentation and location of regions of interest. Indeed, these two processes, which feed each other, are fundamental for the system performance and the efficiency of automatic sorting lines. In this thesis, we have chosen to focus on different aspects of mail image segmentation and of the location of relevant zones (such as the address block). We have chosen to develop a model based on a new pyramidal approach using hierarchical graph coloring. Until now, graph coloring had never been exploited in such a context. It has been introduced in our contribution at every stage of document layout analysis for the recognition and decision tasks (recognition of the kind of document or of the address block). The recognition stage relies on a training process with a single graph b-coloring model. Our architecture is designed to guarantee a good cooperation between the different modules of decision and analysis for the layout analysis and recognition stages. It is composed of three main sections: low-level segmentation (binarisation and connected component labeling), physical layout extraction by hierarchical graph coloring, and address block location and document sorting. The algorithms involved in the system have been designed for their execution speed (matching real-time constraints), their robustness and their compatibility. The experiments made in this context are very encouraging and lead us to investigate a wider diversity of document images.
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents". Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634/document.
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most of the documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents; however, the processings used are most often monomodal. The aim of this thesis is to propose joint processes applying mainly to text and image for the processing of multimodal documents, through two studies: one on multimodal fusion for speaker role recognition in television broadcasts, the other on the complementarity of modalities for a linguistic analysis task on corpora of captioned images. In the first part of this study, we are interested in the analysis of audiovisual documents from news television channels. We propose an approach that uses, in particular, deep neural networks for the representation and fusion of modalities. In the second part of this thesis, we are interested in approaches that use several sources of multimodal information for a monomodal natural language processing task, in order to study their complementarity. We propose a complete system for the correction of prepositional attachments using visual information, trained on a multimodal corpus of captioned images.
Eglin, Véronique. "Contribution à la structuration fonctionnelle des documents imprimés : exploitation de la dynamique du regard dans le repérage de l'information". Lyon, INSA, 1998. http://www.theses.fr/1998ISAL0087.
This work is a contribution to automatic document analysis and is based on two main themes that are independent at first sight: document segmentation and human visual perception. More specifically, it develops a methodology for document layout analysis based on the exploitation of eye-catching information. The reconstruction of the document layout into homogeneous blocks and the retrieval of their physical properties are realized through an analysis inspired by our visual system. This exploration is described by a scan-path which selects areas one after another; their location depends on the observer's purposes and on the visual characteristics of the document. In this work, we chose to simulate a particular kind of scan-path on documents: one that expresses the segmentation performed by an observer who scans a document without any prior knowledge of what should be found. The processing is divided into two main phases. The first, low-level stage consists in analyzing the geometrical properties of region outlines. It leads to a fast selection of areas of interest and results in a first sketch of the physical document segmentation. The second, high-level stage is based on the analysis of macroscopic text features which are directly connected to standard typographic rules (arrangement and frequency of characters, type-font families, boldness, language…), but which are also significant of the editor's intention. This stage leads to a first classification of the different type-font families of text blocks, and thus improves the previous low-level processes by using textural properties of regions. Finally, we propose a validation step for this work, realized through oculometric measurements on human observers.
Nicolas, Stéphane. "Segmentation par champs aléatoires pour l'indexation d'images de documents". Rouen, 2006. http://www.theses.fr/2006ROUES058.
With the development of digital technologies, the valorization of our cultural heritage is becoming a major stake, which raises many difficulties for information indexing and retrieval. Document image analysis can bring a solution; however, traditional methods are not flexible enough to deal with the variability found in patrimonial documents. Our contribution relates to the implementation of a 2D Markov random field model and a 2D conditional random field model, which make it possible to take variability into account and to integrate contextual knowledge, while taking benefit from machine learning techniques. Experiments on handwritten drafts and manuscripts of the Renaissance show that these models can provide interesting solutions. Furthermore, the conditional random field model provides better results, allowing the integration of more intrinsic and contextual features in a discriminative framework, using a classifier combination approach.
Duong, Jean Emptoz Hubert. "Etude des Documents Imprimés : Approche Statistique et Contribution Méthodologique". Villeurbanne : Doc'INSA, 2005. http://docinsa.insa-lyon.fr/these/pont.php?id=duong.
Brixtel, Romain. "Alignement endogène de documents, une approche multilingue et multi-échelle". Caen, 2011. http://www.theses.fr/2011CAEN2050.
This thesis deals with machine translation and, more precisely, with the extraction of features (words, phrases, sentences) that are in a translation relation in parallel corpora. The methods applied to automatically align these elements are endogenous (without external resources) and multi-scale (different levels of analysis are used). We propose an alignment strategy that renews word- and sentence-based approaches by using levels lying above and below the sentence, respectively the alinea and the chunk. Alinea alignment operates via visual clues, while subsentential alignment focuses on character strings for chunk alignment. We also highlight the connections between alignment and plagiarism detection in order to provide an abstraction of our model.
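An illustrative toy, not the thesis's algorithm: finding repeated character strings between two short components, the granularity the chunk stage works at, using only Python's standard difflib (an endogenous comparison, in the sense that no external resource is consulted). The sample strings are arbitrary.

```python
from difflib import SequenceMatcher

fr = "alignement de documents multilingues"
en = "alignment of multilingual documents"
for block in SequenceMatcher(None, fr, en).get_matching_blocks():
    if block.size > 3:  # keep only substantial shared character strings
        print(repr(fr[block.a:block.a + block.size]))
```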
Al-Hamdi, Ali. "Contributions à l'interprétation des documents techniques par une approche perceptive". Rouen, 1999. http://www.theses.fr/1999ROUES083.
Ahmad, M. Mumtaz. "Memory optimization strategies for linear mappings and indexation-based shared documents". Thesis, Nancy 1, 2011. http://www.theses.fr/2011NAN10083/document.
This thesis aims at developing strategies to enhance the power of sequential computation and distributed systems; in particular, it deals with the sequential breakdown of operations and with decentralized collaborative editing systems. In this thesis, we introduce a precision-control indexing method that generates unique identifiers which are used for indexed communication in distributed systems, particularly in decentralized collaborative editing systems. These identifiers are real numbers with a specific, controlled pattern of precision. The set of identifiers is kept finite, which makes it possible to compute local as well as global cardinality. This property plays an important role in dealing with indexed communication. Besides this, some other properties, including order preservation, are observed. The indexing method was successfully tested and verified by experimentation, and it led to the design of a decentralized collaborative editing system. Dealing with the sequential breakdown of operations, we explore the limitations of existing strategies and extend the idea by introducing new strategies. These strategies lead towards optimization (processor, compiler, memory, code). This style of decomposition invites research communities to further investigation and practical implementation, which could lead towards designing an arithmetic unit.
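A minimal sketch of the kind of problem such identifiers solve, assuming nothing about the thesis's actual construction: a collaborative editor must mint an identifier strictly between two existing ones, distinct per site, while preserving order. The `between` helper and its parameters are hypothetical; exact rationals stand in for "real numbers with controlled precision".

```python
from fractions import Fraction

def between(lo: Fraction, hi: Fraction, site: int, n_sites: int) -> Fraction:
    """Return an identifier in (lo, hi); the site offset keeps identifiers
    generated concurrently by different sites from colliding."""
    step = (hi - lo) / (n_sites + 1)
    return lo + step * (site + 1)

a, b = Fraction(1), Fraction(2)
x = between(a, b, site=0, n_sites=2)  # site 0 picks 4/3
y = between(a, b, site=1, n_sites=2)  # site 1 picks 5/3, no collision
assert a < x < y < b                  # order is preserved
print(x, y)
```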
Ferrari, Stéphane. "Méthode et outils informatiques pour le traitement automatique des métaphores dans les documents écrits". Paris 11, 1997. http://www.theses.fr/1997PA112381.
Joly, Philippe. "Consultation et analyse des documents en image animée numérique". Toulouse 3, 1996. http://www.theses.fr/1996TOU30130.
Drira, Fadoua. "Contribution à la restauration des images de documents anciens". Lyon, INSA, 2007. http://theses.insa-lyon.fr/publication/2007ISAL0111/these.pdf.
The massive digitization of heritage documents raises new prospects for research, such as the restoration of degraded documents. These degradations are due to bad conservation conditions and even to the digitization process itself. Images of old and degraded documents cannot be restored directly by classical approaches. Hence, we propose in this thesis to develop and analyze document image restoration algorithms. We are mainly interested in foreground/background degradations, since they harm the legibility of the digitized documents and limit their processing. For background degradations, considered as a problem of superposed layers, we propose two segmentation-based methods. The first is a recursive approach that relies on the k-means clustering algorithm and principal component analysis. The second method is an improvement of the original Mean-Shift algorithm, in an attempt to reduce its complexity. For foreground degradations, we propose to tackle the problem with PDE-based diffusion approaches. This solution has many useful features that are relevant for character restoration. Our comparative study of existing methods allows us to select the approaches best adapted to our problem. We also propose a new diffusion method that preserves singularities and edges while smoothing. Our previously proposed solutions, the diffusion and Mean-Shift algorithms, are used with success in a joint iterative framework to solve foreground and background degradations. This framework generates segmented images with fewer artefacts on the edges and in the background than those obtained by applying each method separately.
Duong, Jean. "Etude des Documents Imprimés : Approche Statistique et Contribution Méthodologique". Lyon, INSA, 2005. http://theses.insa-lyon.fr/publication/2005ISAL0027/these.pdf.
This thesis deals with the study of the structuring of documents containing "rich and recurrent typography". We mainly worked on document images corresponding to extracts of inventory synopses from the departmental Archives of Charente-Maritime and Savoy. We proposed a segmentation process to retrieve the layout structure of these pages. We also developed an approach based on hidden Markov models for logical recognition. In parallel with these applicative contributions, we carried out two more fundamental investigations. The first is related to the study of the characteristics used for the description of regions of interest (physical entities) in document images. The second was devoted to the development of a classification method resting on selective training. Among the many applications of this algorithm, it is the opportunity to carry out a character recognition task that justified its development.
Oriot, Jean-Claude. "Analyse d'images de documents à structures variées : application à la localisation du bloc adresse sur les objets postaux". Nantes, 1992. http://www.theses.fr/1992NANT2061.
Delamarre, Aurélie Le Pottier Nicole. "Traitement et catalogage des manuscrits contemporains". [S.l.] : [s.n.], 2004. http://www.enssib.fr/bibliotheque/documents/dcb/delamarre.pdf.
Tannier, Xavier. "Extraction et recherche d'information en langage naturel dans les documents semi-structurés". Phd thesis, Ecole Nationale Supérieure des Mines de Saint-Etienne, 2006. http://tel.archives-ouvertes.fr/tel-00121721.
Information retrieval in semi-structured documents (written in XML in practice) combines aspects of traditional IR with those of database querying. The structure is of primary importance, but the information need remains vague. The retrieval unit is variable (a paragraph, a figure, a complete article…). Moreover, the flexibility of the XML language allows manipulations of the content that sometimes cause arbitrary breaks in the natural flow of the text. The problems raised by these characteristics are numerous, both in the pre-processing of documents and in their querying. Faced with these problems, we studied the specific solutions that natural language processing (NLP) could bring. We thus proposed a theoretical framework and a practical approach to enable the use of textual analysis techniques while abstracting away from the structure. We also designed a natural language query interface for IR in XML documents, and proposed methods taking advantage of the structure to improve the retrieval of relevant elements.
Harrathi, Farah. "Extraction de concepts et de relations entre concepts à partir des documents multilingues : approche statistique et ontologique". Lyon, INSA, 2009. http://theses.insa-lyon.fr/publication/2009ISAL0073/these.pdf.
The research work of this thesis is related to the problem of document indexing for retrieval, and more specifically to the extraction of semantic descriptors for document indexing. An Information Retrieval System (IRS) is a set of models and programs for selecting a set of documents satisfying a user's information need expressed as a query. In IR, query processing is composed mainly of two processes: representation and retrieval. The representation process is called indexing; it allows documents and queries to be represented by descriptors, or indexes, which reflect their contents. The retrieval process consists in comparing the document representations with the query representation. In classical IRSs, the descriptors used are words (simple or compound). These IRSs consider the document as a set of words, often called a "bag of words". In these systems, words are treated as mere character strings without semantics; the only information used is their occurrence frequency in the documents. These systems do not take into account the semantic relationships between words. For example, it is impossible to find documents represented by a word M1 that is a synonym of a word M2 when the query is represented by M2. Likewise, in a classical IRS, a document indexed by the term "bus" will never be found by a query indexed by the word "taxi", yet these two words deal with the same subject, "means of transportation". To address these limitations, several studies have looked into taking the semantics of indexing terms into account. This type of indexing is called semantic or conceptual indexing. These works rely on the notion of concept in place of the notion of word. In this work, the terms denoting concepts are extracted from the document using statistical techniques. These terms are then projected onto semantic resources such as ontologies or thesauri to extract the concepts involved.
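The bus/taxi example above can be made concrete in a few lines; the toy thesaurus and function names below are illustrative assumptions, not the thesis's resources or method.

```python
# A toy illustration of conceptual indexing: terms are projected onto a
# semantic resource (here a tiny hand-made thesaurus) before matching,
# so "taxi" and "bus" meet at the concept level.
THESAURUS = {"bus": "means_of_transportation", "taxi": "means_of_transportation"}

def conceptual_index(tokens):
    """Replace each term by its concept when the resource knows it."""
    return {THESAURUS.get(t, t) for t in tokens}

doc_index = conceptual_index(["the", "bus", "route"])
query_index = conceptual_index(["taxi"])
print(bool(doc_index & query_index))  # True: a word-level match would fail
```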
Quint, Vincent. "Une approche de l'édition structurée des documents". Phd thesis, Grenoble 1, 1987. http://tel.archives-ouvertes.fr/tel-00010612.
The approach proposed here rests on the logical organization of the components of a document. From this principle, we propose a meta-model that allows the description of the logical structures of all kinds of documents and of different types of objects frequently found in documents: mathematical formulas, tables, diagrams, etc. Presentation rules are associated with the logical structures; they determine the graphical appearance of their components. The interest of this approach is shown through two interactive systems built on this model: the mathematical formula editor Edimath and the document editor Grif. The presentation of these systems draws on a state of the art of digital typography.
Journet, Nicholas. "Analyse d’images de documents anciens : une approche texture". La Rochelle, 2006. http://www.theses.fr/2006LAROS178.
My PhD thesis subject is related to the topic of old document image indexing. A corpus of old documents has specific characteristics: the content (text and image) as well as the layout information are strongly variable. Thus, it is not possible to work on this corpus as is usually done with contemporary documents. Indeed, the first tests which we realised on the corpus of the "Centre d'Etude de la Renaissance", with which we work, confirmed that traditional (model-driven) approaches are not very efficient, because it is impossible to make assumptions about the physical or logical structure of old documents. We also noted the lack of tools allowing the indexing of large databases of old document images. In this PhD work, we propose a new generic method which permits the characterization of the contents of old document images. This characterization is carried out using a multiresolution study of the textures contained in the document images. By constructing signatures related to the frequencies and the orientations of the various parts of a page, it is possible to extract, compare or identify different kinds of semantic elements (drop caps, illustrations, text, layout…) without making any assumptions about the physical or logical structure of the analyzed documents. This texture information is at the origin of the creation of indexing tools for large databases of old document images.
Tran, Thuong Tien. "Modélisation et traitement du contenu des médias pour l'édition et la présentation de documents multimédias". Grenoble INPG, 2003. http://www.theses.fr/2003INPG0019.
This work proposes a new way to easily edit and present multimedia documents. It consists in modelling the contents of complex media (video, audio) as a structure of sub-elements (moving objects, shots, scenes). These internal media fragments can be associated with behaviors (hyperlinks) or with spatial/temporal relations to other objects of the document. This enables richer multimedia presentations thanks to a finer synchronization between media. The difficulty of this work is to ensure that this model remains consistent with the composition model of multimedia documents and that it covers the needs of authors for fine-grained multimedia synchronization. The approach chosen consists in using description tools from MPEG-7 to describe media contents and in integrating these descriptions into an extension of the Madeus constraint-based composition model.
Lemaitre, Aurélie. "Introduction de la vision perceptive pour la reconnaissance de la structure de documents". Rennes, INSA, 2008. ftp://ftp.irisa.fr/techreports/theses/2008/lemaitre.pdf.
Human perceptive vision combines several levels of perception in order to simplify the interpretation of a scene. It is represented by physiologists as a perceptive cycle guided by visual attention. We propose to use this principle for the recognition of images of old and handwritten documents. Thus, we propose a generic architecture, DMOS-P, that makes it possible to specify mechanisms of perceptive cooperation that ease the description and improve the recognition of the structure of documents. In the applications, we show a prediction/verification mechanism: low-resolution vision provides hypotheses on the structure, using the global context; these hypotheses are then verified at a higher resolution. We validated this approach on various kinds of documents (handwritten incoming mail, archive registers, newspapers…) and on a large scale (more than 80,000 images).
Trupin, Eric. "Segmentation de documents : Application a un systeme de lecture pour non-voyants". Rouen, 1993. http://www.theses.fr/1993ROUES009.
Lefrère, Laurent. "Contribution au développement d'outils pour l'analyse automatique de documents cartographiques". Rouen, 1993. http://www.theses.fr/1993ROUES045.
Max, Aurélien. "De la création de documents normalisés à la normalisation de documents en domaine contraint". Grenoble 1, 2003. http://www.theses.fr/2003GRE10227.
Well-formedness conditions on documents in constrained domains are often hard to apply. An active research trend approaches the authoring of normalized documents through semantic specification, thereby facilitating such applications as multilingual production. However, the current systems are not able to analyse an existing document in order to normalize it. We therefore propose an approach that reuses the resources of such systems to recreate the semantic content of a document, from which a normalized textual version can be generated. This approach is based on two main paradigms: fuzzy inverted generation, which heuristically finds candidate semantic representations, and interactive negotiation, which allows an expert of the domain to progressively validate the semantic representation that corresponds to the original document.
Caro, Dambreville Stéphane. "Rôle des organisateurs paralinguistiques dans la consultation des documents électroniques". Grenoble 3, 1995. https://tel.archives-ouvertes.fr/tel-00451634.