Dissertations / Theses on the topic 'Reconnaissance automatique de texte (ATR)'
Chiffoleau, Floriane. "Understanding the automatic text recognition process : model training, ground truth and prediction errors." Electronic Thesis or Diss., Le Mans, 2024. http://www.theses.fr/2024LEMA3002.
This thesis works on identifying what a text recognition model can learn during its training, through the examination of the content of its ground truth and the errors in its predictions. The main intent is to improve our knowledge of how a neural network operates, with experiments focused on typewritten documents. The methods used concentrate on a thorough exploration of the training data, the observation of the model's prediction errors, and the correlation between both. A first hypothesis, based on the influence of the lexicon, was inconclusive. However, it steered the observation towards a new level of study, relying on an infralexical level: the n-grams. Their distribution in the training data was analysed and subsequently compared to that of the n-grams retrieved from the prediction errors. Promising results led to further exploration, while upgrading from a single-language to a multilingual model. Conclusive results made it possible to infer that n-grams might indeed be a valid lens for explaining recognition performance.
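The comparison described in this abstract, contrasting the n-gram distribution of the training ground truth with the n-grams found around prediction errors, can be illustrated with a minimal sketch. The input lines and error substrings below are placeholders for illustration, not the thesis's actual pipeline.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return the character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_distribution(lines, n=3):
    """Relative frequency of character n-grams over a list of lines."""
    counts = Counter()
    for line in lines:
        counts.update(char_ngrams(line, n))
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}

# Hypothetical inputs: ground-truth transcriptions and substrings around ATR errors.
ground_truth_lines = ["Monsieur le Directeur,", "Veuillez trouver ci-joint ..."]
error_substrings = ["cteur", "ci-jo"]

train_dist = ngram_distribution(ground_truth_lines)
error_dist = ngram_distribution(error_substrings)

# n-grams that are over-represented in the errors compared to the training data.
overrepresented = sorted(
    error_dist,
    key=lambda g: error_dist[g] - train_dist.get(g, 0.0),
    reverse=True,
)
print(overrepresented[:10])
```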
Bastos, Dos Santos José Eduardo. "L'identification de texte en images de chèques bancaires brésiliens." Compiègne, 2003. http://www.theses.fr/2003COMP1453.
Identifying and distinguishing text in document images are tasks whose usual solutions are mainly based on contextual information, such as layout information or information from the physical structure. In this research work, an alternative for this task is investigated, based only on features observed from the textual elements themselves, giving more independence to the process. The whole process was developed considering textual elements fragmented into small portions (samples), in order to provide an alternative solution to questions like scale and overlapping textual elements. From these samples, a set of features is extracted and serves as input to a classifier mainly charged with extracting text from the document and with distinguishing between handwritten and machine-printed text. Moreover, since the only information employed is observed directly from the textual elements, the process is more independent, as it uses neither heuristics nor a priori information about the treated document. Results of around 93% correct classification confirm the efficacy of the process.
Beaudette, David. "Suivi de chansons par reconnaissance automatique de parole et alignement temporel." Mémoire, Université de Sherbrooke, 2010. http://savoirs.usherbrooke.ca/handle/11143/1582.
Picard, Laurent. "Prise en compte de l'environnement marin dans le processus de reconnaissance automatique de cibles sous-marines." Thesis, Brest, 2017. http://www.theses.fr/2017BRES0038/document.
In the last decades, advances in marine robot technology have made it possible to perform accurate seafloor surveys by means of autonomous underwater vehicles (AUVs). Thanks to a sidescan sonar carried by an AUV, a wide area can be scanned quickly. Navies are very interested in using such vehicles for underwater mine countermeasures (MCM), in order to perform mine hunting missions rapidly and safely for human operators. Nevertheless, on-board intelligence, which intends to replace the human operator for sonar image analysis, remains challenging. Current automatic target recognition (ATR) processes have to cope with the variability of the seafloor. Indeed, there is a strong relationship between the seafloor appearance in sidescan sonar images and underwater target detection rates. Thus, embedding some environmental information in the ATR process seems to be a way of achieving more effective automatic target recognition. In this thesis, we address the problem of improving the ATR process by taking into account the local environment. To this end, a new representation of sonar images is considered by use of the theory of the monogenic signal. It provides pixelwise energetic, geometric and structural information within a multi-scale framework. Then a seafloor characterization is carried out by estimating the intrinsic dimensionality of the underwater structures, so as to describe sonar images in terms of homogeneity, anisotropy and complexity. These three features are directly linked to the difficulty of detecting underwater mines and enable an accurate classification of sonar images into benign, rippled or complex areas. From our point of view, underwater mine hunting cannot be performed in the same way on these three seafloor types, which raise different challenges from an ATR point of view. To proceed with this idea, we propose a first detection algorithm specific to sand-rippled areas. This algorithm takes into consideration an environmental description of ripples, which allows it to outperform classic approaches on this type of seafloor.
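As an illustration of the image representation mentioned above, the sketch below computes the Riesz transform and the local amplitude, phase and orientation of the monogenic signal for a single image with NumPy. It is a generic, single-scale formulation, not the exact multi-scale framework used in the thesis.

```python
import numpy as np

def monogenic_signal(image):
    """Riesz transform and local amplitude/phase/orientation of a 2-D image."""
    rows, cols = image.shape
    u = np.fft.fftfreq(cols)          # horizontal frequencies
    v = np.fft.fftfreq(rows)          # vertical frequencies
    U, V = np.meshgrid(u, v)
    radius = np.sqrt(U**2 + V**2)
    radius[0, 0] = 1.0                # avoid division by zero at the DC component

    F = np.fft.fft2(image)
    r1 = np.real(np.fft.ifft2(F * (-1j * U / radius)))   # first Riesz component
    r2 = np.real(np.fft.ifft2(F * (-1j * V / radius)))   # second Riesz component

    amplitude = np.sqrt(image**2 + r1**2 + r2**2)         # local energy
    phase = np.arctan2(np.sqrt(r1**2 + r2**2), image)     # local phase (structure)
    orientation = np.arctan2(r2, r1)                      # local orientation (geometry)
    return amplitude, phase, orientation

# Hypothetical sidescan sonar patch.
patch = np.random.rand(128, 256)
A, phi, theta = monogenic_signal(patch)
```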
Delemar, Olivier. "Reconnaissance de la parole par une méthode hybride : texte imprimé : Réseaux markoviens et base de règles." Grenoble INPG, 1996. http://www.theses.fr/1996INPG0052.
Chen, Yong. "Analyse et interprétation d'images à l'usage des personnes non-voyantes : application à la génération automatique d'images en relief à partir d'équipements banalisés." Thesis, Paris 8, 2015. http://www.theses.fr/2015PA080046/document.
Visual information is a very rich source of information to which blind and visually impaired (BVI) people do not always have access. The presence of images is a real handicap for the BVI. Transcription into an embossed image may increase the accessibility of an image to BVI people. Our work takes into account the aspects of tactile cognition and the rules and recommendations for the design of an embossed image. We focused our work on the analysis and comparison of digital image processing techniques in order to find suitable methods for an automatic procedure for embossing images. At the end of this research, we tested the embossed images created by our system with blind users. Two important points were evaluated in the tests: the degree of understanding of an embossed image, and the time required for exploration. The results suggest that the images made by this system are accessible to blind users who know braille. The implemented system can be regarded as an effective tool for creating embossed images: it offers an opportunity to generalize and formalize the procedure for creating an embossed image, and it provides a very quick and easy solution. The system can process pedagogical images with simplified semantic content. It can be used as a practical tool for making digital images accessible, and it also offers the possibility of cooperation with other modalities of presenting images to blind people, for example a traditional interactive map.
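A typical building block of such image-to-relief pipelines is contour extraction followed by line thickening. The snippet below is only a generic sketch of that idea with OpenCV, assuming a hypothetical input file; it is not the validated pipeline of the thesis, whose processing and design rules are richer.

```python
import cv2
import numpy as np

def embossed_line_drawing(path, out_path="relief.png"):
    """Reduce a photo to thick dark contours suitable for embossing."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)          # suppress texture noise
    edges = cv2.Canny(gray, 50, 150)                  # keep the main contours
    kernel = np.ones((3, 3), np.uint8)
    thick = cv2.dilate(edges, kernel, iterations=2)   # thicken lines for tactile reading
    cv2.imwrite(out_path, 255 - thick)                # dark lines on a white background
    return out_path

# embossed_line_drawing("diagram.png")
```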
Fell, Michael. "Traitement automatique des langues pour la recherche d'information musicale : analyse profonde de la structure et du contenu des paroles de chansons." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4017.
Applications in Music Information Retrieval and Computational Musicology have traditionally relied on features extracted from the music content in the form of audio, but have mostly ignored the song lyrics. More recently, improvements in fields such as music recommendation have been made by taking into account external metadata related to the song. In this thesis, we argue that extracting knowledge from the song lyrics is the next step to improve the user's experience when interacting with music. To extract knowledge from vast amounts of song lyrics, we show for different textual aspects (their structure, content and perception) how Natural Language Processing methods can be adapted and successfully applied to lyrics. For the structural aspect of lyrics, we derive a structural description by introducing a model that efficiently segments the lyrics into its characteristic parts (e.g. intro, verse, chorus). In a second stage, we represent the content of lyrics by means of summarizing the lyrics in a way that respects the characteristic lyrics structure. Finally, on the perception of lyrics, we investigate the problem of detecting explicit content in a song text. This task proves to be very hard, and we show that the difficulty partially arises from the subjective nature of perceiving lyrics in one way or another depending on the context. Furthermore, we touch on another problem of lyrics perception by presenting our preliminary results on Emotion Recognition. As a result, during the course of this thesis we have created the annotated WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations at various levels.
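One reason lyrics segmentation into parts such as the chorus is feasible, as described above, is that choruses repeat almost verbatim. The snippet below is an illustrative repetition heuristic over lyric lines, not the segmentation model developed in the thesis.

```python
from difflib import SequenceMatcher

def line_similarity(a, b):
    """String similarity between two lyric lines, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def repeated_line_mask(lines, threshold=0.8):
    """Flag lines that reoccur (near-)verbatim elsewhere: likely chorus material."""
    mask = []
    for i, line in enumerate(lines):
        repeated = any(
            line_similarity(line, other) >= threshold
            for j, other in enumerate(lines) if j != i
        )
        mask.append(repeated)
    return mask

lyrics = [
    "I walked along the river side",
    "Nothing left for me to hide",
    "Oh, take me home tonight",
    "Oh, take me home tonight",
]
print(repeated_line_mask(lyrics))  # [False, False, True, True]
```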
Mohamadi, Tayeb. "Synthèse à partir du texte de visages parlants : réalisation d'un prototype et mesures d'intelligibilité bimodale." Grenoble INPG, 1993. http://www.theses.fr/1993INPG0010.
Ogun, Sewade. "Generating diverse synthetic data for ASR training data augmentation." Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0116.
In the last two decades, the error rate of automatic speech recognition (ASR) systems has drastically dropped, making them more useful in real-world applications. This improvement can be attributed to several factors, including new architectures using deep learning techniques, new training algorithms, large and diverse training datasets, and data augmentation. In particular, large-scale training datasets have been pivotal to learning robust speech representations for ASR. Their large size allows them to effectively cover the inherent diversity in speech, in terms of speaker voice, speaking rate, pitch, reverberation, and noise. However, the size and diversity of datasets typically found in high-resourced languages are not available in medium- and low-resourced languages and in domains with specialised vocabulary like the medical domain. Therefore, the popular method to increase dataset diversity is through data augmentation. With the recent increase in the naturalness and quality of synthetic data that can be generated by text-to-speech (TTS) and voice conversion (VC) systems, these systems have also become viable options for ASR data augmentation. However, several problems limit their application. First, TTS/VC systems require high-quality speech data for training. Hence, we develop a method of dataset curation from an ASR-designed corpus for training a TTS system. This method leverages the increasing accuracy of deep-learning-based, non-intrusive quality estimators to filter high-quality samples. We explore filtering the ASR dataset at different thresholds to balance the size of the dataset, number of speakers, and quality. With this method, we create a high-quality multi-speaker dataset which is comparable to LibriTTS in quality. Second, the data generation process needs to be controllable to generate diverse TTS/VC data with specific attributes. Previous TTS/VC systems either condition the system on the speaker embedding alone or use discriminative models to learn the speech variabilities. In our approach, we design an improved flow-based architecture that learns the distribution of different speech variables. We find that our modifications significantly increase the diversity and naturalness of the generated utterances over a GlowTTS baseline, while being controllable. Lastly, we evaluated the significance of generating diverse TTS and VC data for augmenting ASR training data. As opposed to naively generating the TTS/VC data, we independently examined different approaches such as sentence selection methods and increasing the diversity of speakers, phoneme duration, and pitch contours, in addition to systematically increasing the diversity of the environmental conditions of the generated data. Our results show that TTS/VC augmentation holds promise in increasing ASR performance in low- and medium-data regimes. In conclusion, our experiments provide insight into the variabilities that are particularly important for ASR, and reveal a systematic approach to ASR data augmentation using synthetic data.
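The curation step described above, filtering an ASR corpus with a non-intrusive quality estimator to obtain TTS-grade speech, can be sketched as below. The estimate_quality function stands in for any no-reference quality/MOS predictor and is an assumption, not the estimator used in the thesis.

```python
import os

def estimate_quality(wav_path):
    """Placeholder for a non-intrusive (no-reference) speech quality estimator.
    In practice this would be a learned predictor returning a score, e.g. 1-5."""
    raise NotImplementedError

def curate_tts_subset(manifest, quality_threshold=4.0):
    """Keep only ASR utterances whose estimated quality clears the threshold.

    `manifest` is a list of (wav_path, transcript, speaker_id) tuples."""
    curated = []
    for wav_path, transcript, speaker in manifest:
        if not os.path.exists(wav_path):
            continue
        score = estimate_quality(wav_path)
        if score >= quality_threshold:
            curated.append((wav_path, transcript, speaker, score))
    return curated

# A stricter threshold yields cleaner but smaller data with fewer speakers;
# sweeping it trades off dataset size, speaker coverage and quality.
```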
Felhi, Mehdi. "Document image segmentation : content categorization." Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0109.
In this thesis I discuss the document image segmentation problem and I describe our new approaches for detecting and classifying document contents. First, I discuss our skew angle estimation approach. The aim of this approach is to develop an automatic method able to estimate, with precision, the skew angle of text in document images. Our method is based on the Maximum Gradient Difference (MGD) and the R-signature. Then, I describe our second method, based on the Ridgelet transform. Our second contribution consists in a new hybrid page segmentation approach. I first describe our stroke-based descriptor that allows detecting text and line candidates using the skeleton of the binarized document image. Then, an active contour model is applied to segment the rest of the image into photo and background regions. Finally, text candidates are clustered using the mean-shift analysis technique according to their corresponding sizes. The method is applied to segmenting scanned document images (newspapers and magazines) that contain text, lines and photo regions. Finally, I describe our stroke-based text extraction method. Our approach begins by extracting connected components and selecting text character candidates over the CIE LCH color space, using the Histogram of Oriented Gradients (HOG) correlation coefficients in order to detect low-contrast regions. The text region candidates are clustered using two different approaches: a depth-first search approach over a graph, and a stable text line criterion. Finally, the resulting regions are refined by classifying the text line candidates into "text" and "non-text" regions using a Kernel Support Vector Machine (K-SVM) classifier.
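To illustrate what the skew angle estimation in the first contribution above achieves, here is a minimal projection-profile search over candidate angles; it is a generic baseline under simple assumptions, not the MGD/R-signature method developed in the thesis.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary_image, angle_range=10.0, step=0.5):
    """Return the rotation angle (degrees) that maximises the variance of the
    horizontal projection profile of a binary document image (text pixels = 1)."""
    best_angle, best_score = 0.0, -np.inf
    for angle in np.arange(-angle_range, angle_range + step, step):
        rotated = rotate(binary_image, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)          # amount of ink per row
        score = profile.var()                  # sharp peaks when lines are horizontal
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

# Hypothetical use: page = (grayscale < 128).astype(np.uint8); estimate_skew(page)
```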
Le, Thien-Hoa. "Neural Methods for Sentiment Analysis and Text Summarization." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0037.
This thesis focuses on two Natural Language Processing tasks that require extracting semantic information from raw texts: Sentiment Analysis and Text Summarization. This dissertation discusses issues and seeks to improve neural models on both tasks, as they have become the dominant paradigm in the past several years. Accordingly, this dissertation is composed of two parts: the first part (Neural Sentiment Analysis) deals with the computational study of people's opinions and sentiments, and the second part (Neural Text Summarization) tries to extract salient information from a complex sentence and rewrite it in a human-readable form. Neural Sentiment Analysis. Similar to computer vision, numerous deep convolutional neural networks have been adapted to sentiment analysis and text classification tasks. However, unlike the image domain, these studies are carried out on different input data types and on different datasets, which makes it hard to know if a deep network is truly needed. In this thesis, we seek to find elements to address this question, i.e. whether neural networks must compute deep hierarchies of features for textual data in the same way as they do in vision. We thus propose a new adaptation of the deepest convolutional architecture (DenseNet) for text classification and study the importance of depth in convolutional models with different atom levels (word or character) of input. We show that deep models indeed give better performances than shallow networks when the text input is represented as a sequence of characters. However, a simple shallow-and-wide network outperforms the deep DenseNet models with word inputs. Besides, to further improve sentiment classifiers and contextualize them, we propose to model them jointly with dialog acts, which are a factor of explanation and correlate with sentiments but are nevertheless often ignored. We have manually annotated both dialogues and sentiments on a Twitter-like social medium, and train a multi-task hierarchical recurrent network on joint sentiment and dialog act recognition. We show that transfer learning may be efficiently achieved between both tasks, and further analyze some specific correlations between sentiments and dialogues on social media. Neural Text Summarization. Detecting sentiments and opinions from large digital documents does not always enable users of such systems to make informed decisions, as other important semantic information is missing. People also need the main arguments and supporting reasons from the source documents to truly understand and interpret the document. To capture such information, we aim at making the neural text summarization models more explainable. We propose a model that has better explainability properties and is flexible enough to support various shallow syntactic parsing modules. More specifically, we linearize the syntactic tree into the form of overlapping text segments, which are then selected with reinforcement learning (RL) and regenerated into a compressed form. Hence, the proposed model is able to handle both extractive and abstractive summarization. Further, we observe that RL-based models are becoming increasingly ubiquitous for many text summarization tasks. We are interested in better understanding what types of information are taken into account by such models, and we propose to study this question from the syntactic perspective.
We thus provide a detailed comparison of both RL-based and syntax-aware approaches and of their combination along several dimensions that relate to the perceived quality of the generated summaries, such as the number of repetitions, sentence length, distribution of part-of-speech tags, relevance and grammaticality. We show that when there is a resource constraint (computation and memory), it is wise to train models with RL only and without any syntactic information, as they provide nearly as good results as syntax-aware models with fewer parameters and faster training convergence.
Dang, Quoc Bao. "Information spotting in huge repositories of scanned document images." Thesis, La Rochelle, 2018. http://www.theses.fr/2018LAROS024/document.
This work aims at developing a generic framework able to produce camera-based applications for information spotting in huge repositories of heterogeneous-content document images via local descriptors. The targeted systems take as input a portion of an image acquired as a query, and the system returns the focused portion of a database image that best matches the query. We first propose a set of generic feature descriptors for camera-based document image retrieval and spotting systems. Our proposed descriptors comprise SRIF, PSRIF, DELTRIF and SSKSRIF, which are built from the spatial information of the nearest keypoints around a keypoint, where keypoints are extracted from the centroids of connected components. From these keypoints, invariant geometrical features are taken into account in the descriptor. SRIF and PSRIF are computed from a local set of the m nearest keypoints around a keypoint, while DELTRIF and SSKSRIF fix the way local shape descriptions are combined, without requiring a parameter, via a Delaunay triangulation formed from the set of keypoints extracted from a document image. Furthermore, we propose a framework to compute the descriptors based on the spatial arrangement of dedicated keypoints (e.g. SURF, SIFT or ORB) so that they can deal with heterogeneous-content camera-based document image retrieval and spotting. In practice, a large-scale indexing system with an enormous number of descriptors puts a burden on memory when they are stored. In addition, the high dimensionality of descriptors can reduce indexing accuracy. We propose three robust indexing frameworks that can be employed without storing local descriptors in memory, saving memory and speeding up retrieval by discarding distance validation. The randomized clustering tree indexing inherits from kd-trees, k-means trees and random forests by randomly selecting K dimensions combined with the highest-variance dimension at each node of the tree. We also propose a weighted Euclidean distance between two data points that is computed and oriented along the highest-variance dimension. The second proposed method relies on a hashing-based indexing system that employs one simple hash table for indexing and retrieving without storing database descriptors. Besides, we propose an extended hashing-based method for indexing multiple kinds of features coming from multiple layers of the image. Along with the proposed descriptors and indexing frameworks, we propose a simple and robust way to compute the shape orientation of MSER regions so that they can be combined with dedicated descriptors (e.g. SIFT, SURF, ORB, etc.) in a rotation-invariant way. In the case where descriptors are able to capture neighbourhood information around MSER regions, we propose a way to extend MSER regions by increasing the radius of each region. This strategy can also be applied to other detected regions in order to make descriptors more distinctive. Moreover, we employ the extended hashing-based method for indexing multiple kinds of features from multiple layers of images; this system is applied not only to a uniform feature type but also to multiple feature types from separate layers. Finally, in order to assess the performance of our contributions, and given that no public dataset exists for camera-based document image retrieval and spotting systems, we built a new dataset which has been made freely and publicly available to the scientific community. This dataset contains portions of document images acquired via a camera as queries.
It is composed of three kinds of information: textual content, graphical content and heterogeneous content.
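To give a feel for the kind of descriptor described above, one built from the spatial arrangement of the m nearest keypoints around a keypoint, here is a simplified, hypothetical construction using distance ratios and relative angles; it is not the exact SRIF/PSRIF formulation from the thesis.

```python
import numpy as np

def neighbourhood_descriptor(keypoints, index, m=5):
    """Toy rotation- and scale-invariant descriptor of keypoint `index`,
    built from its m nearest neighbours (keypoints: iterable of (x, y))."""
    pts = np.asarray(keypoints, dtype=float)
    center = pts[index]
    offsets = np.delete(pts, index, axis=0) - center
    dists = np.linalg.norm(offsets, axis=1)
    nearest = np.argsort(dists)[:m]

    d = dists[nearest]
    angles = np.arctan2(offsets[nearest, 1], offsets[nearest, 0])

    order = np.argsort(angles)                      # fix an angular ordering
    d, angles = d[order], angles[order]
    ratios = d / d.max()                            # scale invariance
    rel_angles = np.diff(np.concatenate([angles, angles[:1] + 2 * np.pi]))  # rotation invariance
    return np.concatenate([np.sort(ratios), rel_angles])

# Hypothetical keypoints, e.g. centroids of connected components.
kps = [(10, 12), (14, 15), (30, 40), (11, 35), (25, 18), (40, 9), (12, 13)]
print(neighbourhood_descriptor(kps, index=0, m=4))
```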
Le, Berre Guillaume. "Vers la mitigation des biais en traitement neuronal des langues." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0074.
It is well known that deep learning models are sensitive to biases that may be present in the data used for training. These biases, which can be defined as useless or detrimental information for the task in question, can be of different kinds: one can, for example, find biases in the writing styles used, but also much more problematic biases relating to the sex or ethnic origin of individuals. These biases can come from different sources, such as the annotators who created the databases, or from the annotation process itself. My thesis deals with the study of these biases and, in particular, is organized around mitigating the effects of biases on the training of Natural Language Processing (NLP) models. In particular, I have worked extensively with pre-trained models such as BERT, RoBERTa or UnifiedQA, which have become essential in recent years in all areas of NLP and which, despite their extensive pre-training, are very sensitive to these bias problems. My thesis is organized in three parts, each presenting a different way of managing the biases present in the data. The first part presents a method that uses the biases present in an automatic summarization database in order to increase the variability and controllability of the generated summaries. Then, in the second part, I am interested in the automatic generation of a training dataset for the multiple-choice question-answering task. The advantage of such a generation method is that it does not require annotators and therefore eliminates the biases coming from them in the data. Finally, I am interested in training a multi-task model for optical text recognition. I show in this last part that it is possible to increase the performance of our models by using different types of data (handwritten and typed) during their training.
Benammar, Riyadh. "Détection non-supervisée de motifs dans les partitions musicales manuscrites." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI112.
This thesis falls within the field of data mining applied to ancient handwritten music scores and aims at searching for frequent melodic or rhythmic motifs, defined as repetitive note sequences with characteristic properties. There is a large number of possible variations of motifs: transpositions, inversions and so-called "mirror" motifs. These motifs allow musicologists to carry out in-depth analysis of the works of a composer or a musical style. In a context of exploring large corpora where scores have only been digitized and not transcribed, an automated search for motifs that satisfy targeted constraints becomes an essential tool for their study. To achieve the objective of detecting frequent motifs without prior knowledge, we started from images of digitized scores. After pre-processing steps on the image, we exploited and adapted a model for detecting and recognizing musical primitives (note heads, stems, etc.) from the family of Region Proposal (RPN) convolutional neural networks. We then developed a primitive encoding method to generate a sequence of notes without the complex task of transcribing the entire manuscript work. This sequence was then analyzed using the CSMA (Constraint String Mining Algorithm) approach, designed to detect the frequent motifs present in one or more sequences, taking into account constraints on their frequency and length, as well as the size and number of gaps allowed within the motifs. The gap was then studied as a way to cope with recognition errors produced by the RPN network, thus avoiding the implementation of a post-correction system for transcription errors. The work was finally validated by the study of musical motifs for composer identification and classification.
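The mining step above operates on the note sequence produced by the recognition stage. The sketch below counts frequent contiguous motifs of a given length under a minimum-frequency constraint; the actual CSMA algorithm additionally handles gaps, length ranges and multiple sequences, which this toy version does not.

```python
from collections import Counter

def frequent_motifs(notes, length=4, min_count=3):
    """Count contiguous motifs of `length` notes and keep the frequent ones.

    `notes` is a sequence of pitch symbols, e.g. ["C4", "E4", "G4", ...]."""
    motifs = Counter(
        tuple(notes[i:i + length])
        for i in range(len(notes) - length + 1)
    )
    return {m: c for m, c in motifs.items() if c >= min_count}

# Hypothetical note sequence decoded from the detected primitives.
sequence = ["C4", "E4", "G4", "E4", "C4", "E4", "G4", "E4", "D4", "C4", "E4", "G4", "E4"]
print(frequent_motifs(sequence, length=4, min_count=2))
```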
Nouvel, Damien. "Reconnaissance des entités nommées par exploration de règles d'annotation - Interpréter les marqueurs d'annotation comme instructions de structuration locale." Phd thesis, Université François Rabelais - Tours, 2012. http://tel.archives-ouvertes.fr/tel-00788630.
Pitou, Cynthia. "Extraction d'informations textuelles au sein de documents numérisés : cas des factures." Thesis, La Réunion, 2017. http://www.theses.fr/2017LARE0015.
Document processing is the transformation of human-understandable data into a computer-understandable format. Document analysis and understanding are the two phases of document processing. Considering a document containing lines, words and graphical objects such as logos, the analysis of such a document consists in extracting and isolating the words, lines and objects and then grouping them into blocks. The document understanding subsystem builds relationships (to the right, left, above, below) between the blocks. A document processing system must be able to: locate textual information, identify whether that information is relevant compared to the other information contained in the document, and extract that information in a computer-understandable format. For the realization of such a system, major difficulties arise from the variability of document characteristics, such as: the type (invoice, form, quotation, report, etc.), the layout (font, style, disposition), the language, the typography and the quality of scanning. This work is concerned with scanned documents, also known as document images. We are particularly interested in locating textual information in invoice images. Invoices are widely used and well-regulated documents, but they are not standardized. They contain mandatory information (invoice number, unique identifier of the issuing company, VAT amount, net amount, etc.) which, depending on the issuer, can appear at various locations in the document. The present work is in the framework of region-based textual information localization and extraction. First, we present a region-based method guided by quadtree decomposition. The principle of the method is to decompose the document image into four equal regions, each region into four new regions, and so on. Then, with a free optical character recognition (OCR) engine, we try to extract precise textual information in each region. A region containing the expected textual information is not decomposed further. Our method makes it possible to determine accurately, in document images, the regions containing the textual information one wants to locate, and to retrieve it quickly and efficiently. In another approach, we propose a textual information extraction model consisting of a set of prototype regions along with pathways for browsing through these prototype regions. The life cycle of the model comprises five steps:
- Produce synthetic invoice data from real-world invoice images containing the textual information of interest, along with their spatial positions.
- Partition the produced data.
- Derive the prototype regions from the obtained partition clusters.
- Derive pathways for browsing through the prototype regions, from the concept lattice of a suitably defined formal context.
- Update incrementally the set of prototype regions and the set of pathways when additional data has to be added.
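The quadtree-guided localization described above can be sketched as a simple recursion: split a region into four quadrants and keep descending until the OCR output of a region matches the expected information. The ocr and matches_target functions below are placeholders (any OCR engine and any field pattern could be plugged in), not the thesis's actual implementation.

```python
import re

def ocr(image, box):
    """Placeholder: run an OCR engine on the sub-image defined by box=(x, y, w, h)."""
    raise NotImplementedError

def matches_target(text):
    """Placeholder: does the OCR text contain the expected field, e.g. an invoice number?"""
    return re.search(r"invoice\s*(no|number)", text, re.IGNORECASE) is not None

def locate(image, box, min_size=64):
    """Recursively decompose `box` into quadrants; return the smallest matching regions."""
    x, y, w, h = box
    if not matches_target(ocr(image, box)):
        return []                                    # nothing of interest in this region
    if w <= min_size or h <= min_size:
        return [box]                                 # small enough: stop here
    hw, hh = w // 2, h // 2
    quadrants = [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                 (x, y + hh, hw, h - hh), (x + hw, y + hh, w - hw, h - hh)]
    hits = [b for q in quadrants for b in locate(image, q, min_size)]
    return hits or [box]                             # keep the parent if no child matches alone
```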
Pisane, Jonathan. "Automatic target recognition using passive bistatic radar signals." Phd thesis, Supélec, 2013. http://tel.archives-ouvertes.fr/tel-00963601.
Ghanmi, Nabil. "Segmentation d'images de documents manuscrits composites : application aux documents de chimie." Electronic Thesis or Diss., Université de Lorraine, 2016. http://www.theses.fr/2016LORR0109.
This thesis deals with chemistry document segmentation and structure analysis. This work aims to help chemists by providing information on the experiments which have already been carried out. The documents are handwritten, heterogeneous and multi-writer. Although their physical structure is relatively simple (a succession of three regions: the chemical formula of the experiment, a table of the products used, and one or more text blocks describing the experimental procedure), several difficulties are encountered. In fact, the lines located at the region boundaries and the imperfections of the table layout make the separation task a real challenge. The proposed methodology takes these difficulties into account by performing segmentation at several levels and treating region separation as a classification problem. First, the document image is segmented into linear structures using an appropriate horizontal smoothing. The horizontal threshold, combined with a vertical overlapping tolerance, favors the consolidation of fragmented elements of the formula without over-merging the text. These linear structures are classified as text or graphics based on discriminant structural features. Then, the segmentation is continued on text lines to separate the rows of the table from the lines of the raw text blocks. For this classification, we propose a CRF model that determines the optimal labelling of the line sequence. The choice of this kind of model was motivated by its ability to absorb the variability of lines and to exploit contextual information. For the segmentation of the table into cells, we propose a hybrid method that includes two levels of analysis: structural and syntactic. The first relies on the presence of graphic lines and the alignment of both text and spaces. The second exploits the coherence of the cell content syntax. In this context, we propose a recognition-based approach using contextual knowledge to detect the numeric fields present in the table. The thesis was carried out in the framework of a CIFRE agreement, in collaboration with the eNovalys company. We have implemented and tested all the steps of the proposed system on a substantial dataset of chemistry documents.
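The horizontal smoothing used in the first segmentation step above is in the spirit of run-length smearing: short white runs between ink pixels on the same row are filled, so that words and fragmented symbols merge into line-level components. The sketch below is a generic run-length smoothing pass, with a threshold chosen arbitrarily for illustration rather than the value used in the thesis.

```python
import numpy as np

def horizontal_smoothing(binary, threshold=30):
    """Fill horizontal white runs shorter than `threshold` pixels.

    `binary` is a 2-D numpy array with 1 for ink and 0 for background."""
    smoothed = binary.copy()
    for row in smoothed:
        run_start = None
        for x, value in enumerate(row):
            if value == 1:
                if run_start is not None and x - run_start <= threshold:
                    row[run_start:x] = 1          # close the short gap
                run_start = x + 1                 # the next gap starts after this ink pixel
    return smoothed

page = np.zeros((3, 50), dtype=np.uint8)
page[1, 5] = page[1, 20] = page[1, 45] = 1        # three ink pixels on one row
print(horizontal_smoothing(page, threshold=20)[1])  # pixels 5..20 merge, 45 stays isolated
```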
Ghorbel, Adam. "Generalized Haar-like filters for document analysis : application to word spotting and text extraction from comics." Thesis, La Rochelle, 2016. http://www.theses.fr/2016LAROS008/document.
The presented thesis follows two directions. The first proposes a technique for text and graphics separation in comics. The second presents a learning-free, segmentation-free word spotting framework based on the query-by-string problem for manuscript documents. The two approaches are based on characteristics of human perception. Indeed, they were inspired by several characteristics of human vision, such as preattentive processing. These characteristics guided us to introduce two multi-scale approaches for two different document analysis tasks: text extraction from comics and word spotting in manuscript documents. These two approaches are based on applying generalized Haar-like filters globally on each document image, whatever its type. By describing and detailing the use of such features throughout this thesis, we offer researchers in the document image analysis field a new line of research to be explored further in the future. The two approaches are layout-segmentation-free, and the generalized Haar-like filters are applied globally on the image. Moreover, no binarization step is applied to the processed document, in order to avoid losing data that may influence the accuracy of the two frameworks. Likewise, no learning step is performed. Thus, we avoid extracting features a priori; feature extraction is performed automatically, taking into consideration the different characteristics of the documents.
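Haar-like filters of the kind mentioned above are typically evaluated with an integral image, which makes the response of a rectangular filter a handful of lookups regardless of its scale. The snippet below evaluates a simple two-rectangle (left-minus-right) filter at several scales; the specific filter shape is illustrative, not the exact generalized filter bank of the thesis.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    return np.pad(img, ((1, 0), (1, 0)), mode="constant").cumsum(0).cumsum(1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Left-minus-right Haar-like response of a (2w x h) window at (x, y)."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

gray = np.random.rand(100, 200)
ii = integral_image(gray.astype(float))
# Evaluate the same filter at several scales at position (10, 20): constant cost per scale.
responses = [haar_two_rect(ii, 10, 20, w, 16) for w in (4, 8, 16, 32)]
print(responses)
```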
Nguyen, Chu Duc. "Localization and quality enhancement for automatic recognition of vehicle license plates in video sequences." Thesis, Ecully, Ecole centrale de Lyon, 2011. http://www.theses.fr/2011ECDL0018.
Automatic reading of vehicle license plates is considered an approach to mass surveillance. It allows, through detection/localization and optical recognition, the identification of a vehicle in images or video sequences. Many applications, such as traffic monitoring, detection of stolen vehicles, tolling or the management of parking entrances and exits, use this method. Yet, in spite of the important progress made since the appearance of the first prototypes in 1979, with recognition rates that are sometimes impressive thanks to advances in science and sensor technology, the constraints imposed on the operation of such systems remain limiting. Indeed, the optimal use of techniques for localizing and recognizing license plates in operational scenarios requires controlled lighting conditions and limitations on the pose, the velocity, or simply the type of plate. Automatic reading of vehicle license plates therefore remains an open research problem. The major contribution of this thesis is threefold. First, a new approach for robust license plate localization in images or image sequences is proposed. Then, improving the quality of the plates is addressed with a localized adaptation of a super-resolution technique. Finally, a unified model of localization and super-resolution is proposed to reduce the time complexity of both approaches combined.
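A common baseline for the localization step described above is to look for regions dense in vertical edges with a plate-like aspect ratio. The OpenCV sketch below implements that generic baseline; it is not the approach developed in the thesis, and the thresholds and aspect-ratio bounds are illustrative assumptions.

```python
import cv2
import numpy as np

def candidate_plate_regions(bgr_image):
    """Return bounding boxes (x, y, w, h) of plate-like regions in a BGR image."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Vertical edges dominate in the characters of a plate.
    sobel = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)
    _, edges = cv2.threshold(sobel, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Close the gaps between characters so the plate becomes one connected blob.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 3))
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    boxes = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        aspect = w / float(h) if h else 0
        if 2.0 <= aspect <= 6.0 and w * h > 1000:   # plate-like shape and size
            boxes.append((x, y, w, h))
    return boxes

# boxes = candidate_plate_regions(cv2.imread("frame.jpg"))
```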