Academic literature on the topic 'Handwriting text recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Handwriting text recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Handwriting text recognition"

1

Devaraj, Anjali Yogesh, Anup S. Jain, Omisha N, and Shobana TS. "Kannada Text Recognition." International Journal for Research in Applied Science and Engineering Technology 10, no. 9 (September 30, 2022): 73–78. http://dx.doi.org/10.22214/ijraset.2022.46520.

Abstract:
The task of automatic handwriting recognition is critical. This can be a difficult subject, and it has gotten a lot of attention in recent years. In the realm of picture grouping, handwritten character recognition is a problem. Handwritten characters are difficult to decipher since various people have distinct handwriting styles. For decades, researchers have been focusing on character identification in Latin handwriting. Kannada has had fewer studies conducted on it. Our "Kannada Text Recognition" research and effort attempts to classify and recognize characters written in Kannada, a south Indian language. The characters are taken from written documents, preprocessed with numpy and OpenCV, and then run through a CNN.
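The preprocessing the abstract sketches (numpy and OpenCV before a CNN) typically amounts to binarizing each character image, cropping to the ink bounding box, and resizing to a fixed input shape. A minimal numpy-only sketch of that kind of step; the threshold and the 32x32 output size are arbitrary choices here, since the paper does not specify them:

```python
import numpy as np

def preprocess_char(img, size=32):
    """Binarize, crop to the ink bounding box, and pad/resize to a square
    fixed-size array -- the kind of numpy-based preprocessing described
    before feeding characters to a CNN."""
    ink = img < img.mean()                     # crude global threshold: dark = ink
    ys, xs = np.nonzero(ink)
    if len(ys) == 0:                           # blank image: return zeros
        return np.zeros((size, size), dtype=np.float32)
    crop = ink[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    side = max(h, w)                           # pad to square, centred
    sq = np.zeros((side, side), dtype=np.float32)
    sq[(side - h) // 2:(side - h) // 2 + h,
       (side - w) // 2:(side - w) // 2 + w] = crop
    # Nearest-neighbour resize (cv2.resize would normally do this step).
    idx = (np.arange(size) * side / size).astype(int)
    return sq[idx][:, idx]
```

The output is a binary `size x size` array ready to be stacked into a CNN input batch.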
2

Tran, Dat, Wanli Ma, and Dharmendra Sharma. "Handwriting Recognition Applications for Tablet PCs." Journal of Advanced Computational Intelligence and Intelligent Informatics 11, no. 7 (September 20, 2007): 787–92. http://dx.doi.org/10.20965/jaciii.2007.p0787.

Abstract:
This paper presents handwriting recognition applications developed and tested on the tablet PC – a new generation of notebook computers. Users write on a tablet PC screen with a tablet pen and a built-in user-independent handwriting recognition tool converts handwritings to printed text. We present handwriting recognition applications using the built-in recognition tool and signature verification using our own verification tool based on fuzzy c-means vector quantization (FCMVQ) and observable Markov modeling (OMM). Experimental results for the signature verification system are also presented.
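The signature verifier described above builds on fuzzy c-means vector quantization (FCMVQ). A generic numpy sketch of the standard fuzzy c-means codebook recursion, shown only to illustrate the quantization step; the paper's feature extraction and the observable Markov modeling stage are not reproduced:

```python
import numpy as np

def fcm_codebook(X, c=4, m=2.0, iters=50, seed=0):
    """Fuzzy c-means vector quantization: learn a codebook of c centroids
    with soft memberships U (fuzzifier m > 1).  Returns (centroids, U)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                              # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        V = Um @ X / Um.sum(axis=1, keepdims=True)  # weighted centroid update
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        p = 2.0 / (m - 1.0)
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=0))
    return V, U
```

Each signature's feature vectors would be quantized against a writer-specific codebook like this one before verification.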
3

Xiong, Yu-Jie, Li Liu, Shujing Lyu, Patrick S. P. Wang, and Yue Lu. "Improving Text-Independent Chinese Writer Identification with the Aid of Character Pairs." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 02 (October 24, 2018): 1953001. http://dx.doi.org/10.1142/s021800141953001x.

Abstract:
Text-independent Chinese writer identification does not depend on the text content of the query and reference handwritings. In order to deal with the uncertainty of the text content, text-independent approaches usually give special attention to the global writing style of handwriting, rather than the properties of each individual character or word. Thanks to the existence of high-frequency characters, some characters probably appear in both the query and reference handwritings in most cases. If character images in the query handwriting are similar to those in the reference handwriting, this query handwriting and the corresponding reference handwriting are very likely to be written by the identical writer. In this paper, we exploit the above characteristic to improve the performance of Chinese writer identification. We first present an identification scheme using edge co-occurrence feature (ECF). Then, we detect the character pairs in the query and reference handwritings using a two-step framework and propose the displacement field-based similarity (DFS) to determine whether a character pair is written by the identical writer. The character pairs help to re-rank the candidate list obtained by text-independent ECF-based similarity and finally decide the writer of the query handwriting. The proposed method is evaluated on the HIT-MW and CASIA-2.1 datasets. Experimental results demonstrate that our proposed method outperforms the existing ones, and its Top-1 accuracy on the two datasets reaches 97.1% and 98.3%, respectively.
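The edge co-occurrence feature (ECF) is not fully specified in the abstract; one plausible reading, sketched here purely for illustration and not to be taken as the paper's definition, is to quantize gradient orientations at edge pixels and histogram the direction pairs of adjacent edge pixels:

```python
import numpy as np

def edge_cooccurrence(img, bins=8):
    """Hedged sketch of an edge co-occurrence style descriptor:
    quantize gradient orientations at edge pixels into `bins` directions,
    then histogram (direction, direction) pairs of horizontally adjacent
    edge pixels and normalize to a distribution."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                      # orientation in [-pi, pi]
    edge = mag > mag.mean()                       # crude edge mask
    q = ((ang + np.pi) / (2 * np.pi) * bins).astype(int) % bins
    hist = np.zeros((bins, bins))
    pair = edge[:, :-1] & edge[:, 1:]             # horizontally adjacent edge pixels
    a, b = q[:, :-1][pair], q[:, 1:][pair]
    np.add.at(hist, (a, b), 1)                    # unbuffered indexed accumulation
    total = hist.sum()
    return hist / total if total else hist
```

Similarity between two handwriting samples would then be measured between such normalized histograms.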
4

Ram Kumar, R. P., A. Chandra Prasad, K. Vishnuvardhan, K. Bhuvanesh, and Sanjeev Dhama. "Automated Handwritten Text Recognition." E3S Web of Conferences 430 (2023): 01022. http://dx.doi.org/10.1051/e3sconf/202343001022.

Abstract:
A computer’s capacity to recognize and convert handwritten inputs from sources like photographs and paper documents into digital format is known as Automated Handwritten Text Recognition (AHTR). Systems for reading handwriting are frequently employed in a variety of fields, including banking, finance, and the healthcare industry. In this paper, we took on the problem of categorizing any handwritten text, whether in block lettering or cursive. There are many different types of handwritten characters, including digits, symbols, and scripts in both English and other languages, which makes handwriting recognition more complex. It is difficult to train an Optical Character Recognition (OCR) system under these requirements. In order to convert handwritten material into digital form, this work aims to categorize each unique handwritten word. Because Convolutional Neural Networks (CNNs) are so effective at this task, they are the best method for handwriting recognition systems. The method will be used to identify writing in various formats.
5

Bazarkulova, Aisaule. "KAZAKH HANDWRITING RECOGNITION." Suleyman Demirel University Bulletin Natural and Technical Sciences 62, no. 1 (October 15, 2024): 88–102. https://doi.org/10.47344/sdubnts.v62i1.963.

Abstract:
Recognition of handwritten text is one aspect of object recognition, known as handwriting detection, because of a computer's potential to recognize and comprehend readable handwriting from resources including paper files, touch smart devices, images, etc. Data is categorized into a number of classes or groups using pattern recognition. The paper presents a successful experiment in recognizing handwritten Kazakh text using Convolutional Recurrent Neural Network based architectures and the Kazakh Autonomous Handwritten Text Dataset. The proposed algorithm achieved an overall accuracy of 86.36% and showed promising results. However, the paper suggests that further research could be conducted to improve the model, such as correlating and enlarging the database or incorporating other models and libraries. Additionally, the paper emphasizes the importance of considering language specifics when building a text recognition model, as modern algorithms that work well in one language may not guarantee the same performance in another.
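Accuracy figures such as the 86.36% above are conventionally derived from the Character Error Rate: the edit distance between the recognized string and the reference, normalized by the reference length (accuracy is roughly 1 − CER). A standard dynamic-programming implementation:

```python
def cer(ref, hyp):
    """Character Error Rate = Levenshtein distance(ref, hyp) / len(ref).
    Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

For example, `cer("kitten", "sitting")` is 3/6 = 0.5.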
6

Dilmurat, Halmurat, and Kurban Ubul. "Design and Realization of On-Line Uyghur Handwritten Character Collection System." Advanced Materials Research 989-994 (July 2014): 4742–46. http://dx.doi.org/10.4028/www.scientific.net/amr.989-994.4742.

Abstract:
Data collection is the first step in handwritten character recognition systems, and the quality of the collected data affects the efficiency of the whole system. As a necessary subsystem of an on-line handwritten character/word recognition system, a Uyghur handwritten character collection system is designed and implemented in Visual C++ based on the nature of Uyghur handwriting. Uyghur handwriting is encoded by 8-direction tendency and stored in an extension stroke file, and it is collected based on the content of a Text Prompt File. From the experimental results, it can be concluded that the handwriting collection system demonstrates strong validity and efficiency during the collection of Uyghur handwriting.
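The "8-direction tendency" encoding mentioned above can be illustrated by mapping each pen movement between consecutive sampled points to one of eight 45° direction codes. A hedged sketch (the actual extension-stroke file format is not documented in the abstract):

```python
import numpy as np

def encode_8dir(points):
    """Encode an on-line pen stroke (sequence of (x, y) points) as
    8-direction codes: 0 = east, counter-clockwise in 45-degree steps.
    Generic chain-code style sketch of 8-direction tendency coding."""
    pts = np.asarray(points, dtype=float)
    d = np.diff(pts, axis=0)                       # per-step displacement
    ang = np.arctan2(d[:, 1], d[:, 0])             # angle in [-pi, pi]
    return (np.round(ang / (np.pi / 4)).astype(int) % 8).tolist()
```

A square traced counter-clockwise from the origin, `[(0,0),(1,0),(1,1),(0,1),(0,0)]`, encodes as `[0, 2, 4, 6]`.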
7

Shonenkov, A. V., D. K. Karachev, M. Y. Novopoltsev, M. S. Potanin, D. V. Dimitrov, and A. V. Chertok. "Handwritten text generation and strikethrough characters augmentation." Computer Optics 46, no. 3 (June 2022): 455–64. http://dx.doi.org/10.18287/2412-6179-co-1049.

Abstract:
We introduce two data augmentation techniques which, used with a Resnet-BiLSTM-CTC network, significantly reduce the Word Error Rate and Character Error Rate beyond the best reported results on handwriting text recognition tasks. We apply a novel augmentation that simulates strikethrough text (HandWritten Blots) and a handwritten text generation method based on printed text (StackMix), both of which proved to be very effective in handwriting text recognition tasks. StackMix uses a weakly-supervised framework to obtain character boundaries. Because these data augmentation techniques are independent of the network used, they could also be applied to enhance the performance of other networks and approaches to handwriting text recognition. Extensive experiments on ten handwritten text datasets show that the HandWritten Blots augmentation and StackMix significantly improve the quality of handwriting text recognition models.
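The HandWritten Blots augmentation simulates strikethrough strokes drawn over text-line images. A much-simplified, numpy-only stand-in for illustration; the original method uses smoother, Bezier-like strokes with varying thickness and transparency:

```python
import numpy as np

def strikethrough_blot(img, seed=0):
    """Draw a jittery quasi-horizontal dark stroke across a text-line
    image, imitating struck-through text.  Returns an augmented copy."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    h, w = out.shape
    y = int(rng.integers(h // 3, 2 * h // 3))      # start in the middle band
    thick = max(1, h // 12)
    for x in range(w):
        y = int(np.clip(y + rng.integers(-1, 2), 0, h - thick - 1))  # vertical jitter
        out[y:y + thick, x] = 0                    # ink the blot column by column
    return out
```

Applied on the fly during training, it forces the recognizer to read through strikethrough noise rather than transcribe it.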
8

Kaur, Amrit Veer, and Amandeep Verma. "Hybrid Wavelet based Technique for Text Extraction from Images." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 9 (October 31, 2017): 24. http://dx.doi.org/10.23956/ijarcsse.v7i9.406.

Abstract:
This paper reviews the current state of the art in handwriting recognition research. The paper deals with issues such as hand-printed character and cursive handwritten word recognition. It describes recent achievements, difficulties, successes and challenges in all aspects of handwriting recognition.
9

Pittman, James A. "Handwriting Recognition: Tablet PC Text Input." Computer 40, no. 9 (September 2007): 49–54. http://dx.doi.org/10.1109/mc.2007.314.

10

Kumar, J., and A. Roy. "DograNet – a comprehensive offline dogra handwriting character dataset." Journal of Physics: Conference Series 2251, no. 1 (April 1, 2022): 012008. http://dx.doi.org/10.1088/1742-6596/2251/1/012008.

Abstract:
Handwritten text recognition is an important area of research because of the growing demand to process and convert the huge amount of data and information available in handwritten form into digital form. Digital data, rather than handwritten form, can prove highly useful in different fields. Handwritten text recognition plays an important role in applications such as postal services, cheque processing in banks, information search, and organizations dealing with such applications. In text recognition applications, a dataset of the specified script is required for training. Datasets for different languages can be found online, but a dataset of Dogra script characters is still not available. This paper presents a Dogra handwriting character dataset which contains around 38,690 character images grouped into 73 character classes, extracted from 530 one-page handwriting samples of 265 individuals of varying age, sex, qualification, and location. The Dogra character dataset will be freely accessible to scholars and researchers, and it can be used for further recognition improvement and extension with more characters and words, writer identification, and Dogra word segmentation. The dataset could also be used for studying variation of handwriting according to age and gender.

Dissertations / Theses on the topic "Handwriting text recognition"

1

Wigington, Curtis Michael. "End-to-End Full-Page Handwriting Recognition." BYU ScholarsArchive, 2018. https://scholarsarchive.byu.edu/etd/7099.

Abstract:
Despite decades of research, offline handwriting recognition (HWR) of historical documents remains a challenging problem, which if solved could greatly improve the searchability of online cultural heritage archives. Historical documents are plagued with noise, degradation, ink bleed-through, overlapping strokes, variation in slope and slant of the writing, and inconsistent layouts. Often the documents in a collection have been written by thousands of authors, all of whom have significantly different writing styles. In order to better capture the variations in writing styles we introduce a novel data augmentation technique. This method achieves state-of-the-art results on modern datasets written in English and French and a historical dataset written in German. HWR models are often limited by the accuracy of the preceding steps of text detection and segmentation. Motivated by this, we present a deep learning model that jointly learns text detection, segmentation, and recognition using mostly images without detection or segmentation annotations. Our Start, Follow, Read (SFR) model is composed of a Region Proposal Network to find the start position of handwriting lines, a novel line follower network that incrementally follows and preprocesses lines of (perhaps curved) handwriting into dewarped images, and a CNN-LSTM network to read the characters. SFR exceeds the performance of the winner of the ICDAR2017 handwriting recognition competition, even when not using the provided competition region annotations.
2

Elmgren, Rasmus. "Handwriting in VR as a Text Input Method." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-208646.

Abstract:
This thesis discusses handwriting as a possible text input method for Virtual Reality (VR), with the goal of comparing handwriting with a virtual keyboard input method. VR applications take different approaches to text input, and there is no standard for how the user should enter text. Text input methods matter to the user in many cases, e.g. when they document, communicate, or enter their login information. The goal of the study was to understand how a handwriting input would compare to pointing at a virtual keyboard, which is the most common approach to the problem. A prototype was built using Tesseract for character recognition and Unity to create a basic virtual environment. This prototype was then evaluated in a user study comparing it to the de facto standard virtual keyboard input method. The user study measured task time and error counts and used a usability and desirability questionnaire, Sutcliffe's heuristics for the evaluation of virtual environments, and an interview with each test user. The results suggested that the virtual keyboard performs better except for how engaging the input method was; a common comment from the interviews was that the handwriting input method was more fun and engaging, but not as usable. Further applications of the handwriting input method are discussed, as well as why the users favored the virtual keyboard method.
3

Han, Changan. "Neural Network Based Off-line Handwritten Text Recognition System." FIU Digital Commons, 2011. http://digitalcommons.fiu.edu/etd/363.

Abstract:
This dissertation introduces a new system for handwritten text recognition based on an improved neural network design. Most existing neural networks treat the mean square error function as the standard error function. The system proposed in this dissertation utilizes the mean quartic error function, whose third and fourth derivatives are non-zero. Consequently, many improvements on the training methods were achieved. The training results are carefully assessed before and after the update. To evaluate the performance of a training system, three essential factors must be considered, listed from high to low priority: 1) the error rate on the testing set, 2) the processing time needed to recognize a segmented character, and 3) the total training time and, subsequently, the total testing time. It is observed that bounded training methods accelerate the training process, while semi-third-order training methods, next-minimal training methods, and preprocessing operations reduce the error rate on the testing set. Empirical observations suggest that two combinations of training methods are needed for different-case character recognition. Since character segmentation is required for word and sentence recognition, this dissertation also provides an effective rule-based segmentation method, which differs from conventional adaptive segmentation methods. Dictionary-based correction is utilized to correct mistakes resulting from the recognition and segmentation phases. The integration of the segmentation methods with the handwritten character recognition algorithm yielded an accuracy of 92% for lower-case characters and 97% for upper-case characters. In the testing phase, the database consists of 20,000 handwritten characters, with 10,000 for each case; recognizing 10,000 handwritten characters required 8.5 seconds of processing time.
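The key substitution is replacing the mean square error with a mean quartic error, whose third and fourth derivatives with respect to the residual e are non-zero (24e and 24, respectively). A small numpy sketch of the two losses and the quartic gradient, just to make the comparison concrete:

```python
import numpy as np

def mse(y, t):
    """Standard mean square error, the baseline the dissertation replaces."""
    return np.mean((y - t) ** 2)

def mqe(y, t):
    """Mean quartic error: mean((y - t)^4).  Its third and fourth
    derivatives w.r.t. the residual are non-zero, which higher-order
    training methods can exploit."""
    return np.mean((y - t) ** 4)

def mqe_grad(y, t):
    """Gradient of mqe w.r.t. the network outputs y: 4 e^3 / n."""
    e = y - t
    return 4.0 * e ** 3 / e.size
```

Note how the cubic gradient penalizes large residuals far more aggressively than the linear gradient of MSE.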
4

Bluche, Théodore. "Deep Neural Networks for Large Vocabulary Handwritten Text Recognition." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112062/document.

Abstract:
The automatic transcription of text in handwritten documents has many applications, from automatic document processing, to indexing and document understanding. One of the most popular approaches nowadays consists in scanning the text line image with a sliding window, from which features are extracted, and modeled by Hidden Markov Models (HMMs). Associated with neural networks, such as Multi-Layer Perceptrons (MLPs) or Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs), and with a language model, these models yield good transcriptions. On the other hand, in many machine learning applications, including speech recognition and computer vision, deep neural networks consisting of several hidden layers recently produced a significant reduction of error rates. In this thesis, we have conducted a thorough study of different aspects of optical models based on deep neural networks in the hybrid neural network / HMM scheme, in order to better understand and evaluate their relative importance. First, we show that deep neural networks produce consistent and significant improvements over networks with one or two hidden layers, independently of the kind of neural network, MLP or RNN, and of input, handcrafted features or pixels. Then, we show that deep neural networks with pixel inputs compete with those using handcrafted features, and that depth plays an important role in the reduction of the performance gap between the two kinds of inputs, supporting the idea that deep neural networks effectively build hierarchical and relevant representations of their inputs, and that features are automatically learnt on the way. Despite the dominance of LSTM-RNNs in the recent literature of handwriting recognition, we show that deep MLPs achieve comparable results. Moreover, we evaluated different training criteria. With sequence-discriminative training, we report similar improvements for MLP/HMMs as those observed in speech recognition. 
We also show how the Connectionist Temporal Classification framework is especially suited to RNNs. Finally, the novel dropout technique to regularize neural networks was recently applied to LSTM-RNNs. We tested its effect at different positions in LSTM-RNNs, thus extending previous works, and we show that its relative position to the recurrent connections is important. We conducted the experiments on three public databases, representing two languages (English and French) and two epochs, using different kinds of neural network inputs: handcrafted features and pixels. We validated our approach by taking part in the HTRtS contest in 2014, where we obtained second place. The results of the final systems presented in this thesis, namely MLPs and RNNs, with handcrafted feature or pixel inputs, are comparable to the state of the art on Rimes and IAM. Moreover, the combination of these systems outperformed all published results on the considered databases.
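The Connectionist Temporal Classification framework mentioned above maps per-frame network outputs to a label sequence by collapsing repeated labels and removing blanks. A minimal greedy-decoding sketch of that collapsing rule (training uses the full CTC loss, not shown here):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame best-path labelling into an output sequence,
    CTC-style: merge consecutive repeats, then drop blank symbols."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

For instance, the frame path `[0, 1, 1, 0, 1, 2, 2, 0]` (blank = 0) collapses to `[1, 1, 2]`: the blank between the two 1s keeps them distinct.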
5

España, Boquera Salvador. "Contributions to the joint segmentation and classification of sequences (My two cents on decoding and handwriting recognition)." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/62215.

Abstract:
[EN] This work is focused on problems (like automatic speech recognition (ASR) and handwritten text recognition (HTR)) that: 1) can be represented (at least approximately) in terms of one-dimensional sequences, and 2) entail breaking the observed sequence down into segments which are associated with units taken from a finite repertoire. The required segmentation and classification tasks are so intrinsically interrelated ("Sayre's Paradox") that they have to be performed jointly. We have been inspired by what some works call the "successful trilogy", which refers to the synergistic improvements obtained when considering: a good formalization framework and powerful algorithms; a clever design and implementation taking the best profit of hardware; and an adequate preprocessing with a careful tuning of all heuristics. We describe and study "two stage generative models" (TSGMs) comprising two stacked probabilistic generative stages without reordering. This model includes not only Hidden Markov Models (HMMs) but also "segmental models" (SMs). "Two stage decoders" may be deduced by simply running a TSGM in reverse, introducing non-determinism where required: 1) a directed acyclic graph (DAG) is generated and 2) it is used together with a language model (LM). One-pass decoders constitute a particular case. A formalization of parsing and decoding in terms of semiring values and language equations proposes the use of recurrent transition networks (RTNs) as a normal form for Context-Free Grammars (CFGs), using them in a parsing-as-composition paradigm, so that parsing CFGs results in a slight extension of parsing regular languages. Novel transducer composition algorithms are proposed that can work with RTNs and do not need to resort to filter composition, even in the presence of null transitions and non-idempotent semirings.
A review of LMs is given, with contributions mainly focused on LM interfaces, LM representation, and the evaluation of Neural Network LMs (NNLMs). A review of SMs covers the combination of generative and discriminative segmental models, together with a general scheme of frame emission and another of SMs. Fast, cache-friendly, specialized Viterbi lexicon decoders taking advantage of particular HMM topologies are proposed; they are able to manage sets of active states without requiring dictionary look-ups (e.g. hashing). A dataflow architecture allowing the design of flexible and diverse recognition systems from a small repertoire of components is proposed, including a novel DAG serialization protocol. DAG generators can take over-segmentation constraints into account, make use of SMs other than HMMs, take advantage of the specialized decoders proposed in this work, and use a transducer model to control their behavior, making it possible, for instance, to use context-dependent units. As for DAG decoders, they take advantage of a general LM interface that can be extended to deal with RTNs. Some improvements for one-pass decoders are proposed by combining the specialized lexicon decoders and the "bunch" extension of the LM interface, including an adequate parallelization. The experimental part is mainly focused on HTR tasks with different input modalities (offline, bimodal). We propose novel preprocessing techniques for offline HTR which replace classical geometrical heuristics with automatic learning techniques (neural networks). Experiments conducted on the IAM database using this new preprocessing and HMMs hybridized with Multilayer Perceptrons (MLPs) obtained some of the best results reported for this reference database.
Among other HTR experiments described in this work, we have used over-segmentation information, tried lexicon-free approaches, performed bimodal experiments, and experimented with the combination of hybrid HMMs with holistic classifiers.
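The specialized lexicon decoders described above are built on the Viterbi recursion over HMM state lattices. For orientation, here is the plain textbook version in log space; the thesis contributes cache-friendly specializations of this recursion for lexicon topologies, which are not reproduced here:

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Textbook Viterbi decoding.  log_obs[t, s]: per-frame state score,
    log_trans[s, s2]: transition score, log_init[s]: initial score.
    Returns the best state path as a list of state indices."""
    T, S = log_obs.shape
    delta = log_init + log_obs[0]                 # best score ending in each state
    back = np.zeros((T, S), dtype=int)            # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                 # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In a lexicon decoder the transition matrix is replaced by a sparse left-to-right word-model topology, which is what makes the specialized active-state bookkeeping worthwhile.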
[ES] Este trabajo se centra en problemas (como reconocimiento automático del habla (ASR) o de escritura manuscrita (HTR)) que cumplen: 1) pueden representarse (quizás aproximadamente) en términos de secuencias unidimensionales, 2) su resolución implica descomponer la secuencia en segmentos que se pueden clasificar en un conjunto finito de unidades. Las tareas de segmentación y de clasificación necesarias están tan intrínsecamente interrelacionadas ("paradoja de Sayre") que deben realizarse conjuntamente. Nos hemos inspirado en lo que algunos autores denominan "La trilogía exitosa", refereido a la sinergia obtenida cuando se tiene: - un buen formalismo, que dé lugar a buenos algoritmos; - un diseño e implementación ingeniosos y eficientes, que saquen provecho de las características del hardware; - no descuidar el "saber hacer" de la tarea, un buen preproceso y el ajuste adecuado de los diversos parámetros. Describimos y estudiamos "modelos generativos en dos etapas" sin reordenamientos (TSGMs), que incluyen no sólo los modelos ocultos de Markov (HMM), sino también modelos segmentales (SMs). Se puede obtener un decodificador de "dos pasos" considerando a la inversa un TSGM introduciendo no determinismo: 1) se genera un grafo acíclico dirigido (DAG) y 2) se utiliza conjuntamente con un modelo de lenguaje (LM). El decodificador de "un paso" es un caso particular. Se formaliza el proceso de decodificación con ecuaciones de lenguajes y semianillos, se propone el uso de redes de transición recurrente (RTNs) como forma normal de gramáticas de contexto libre (CFGs) y se utiliza el paradigma de análisis por composición de manera que el análisis de CFGs resulta una extensión del análisis de FSA. Se proponen algoritmos de composición de transductores que permite el uso de RTNs y que no necesita recurrir a composición de filtros incluso en presencia de transiciones nulas y semianillos no idempotentes. 
Se propone una extensa revisión de LMs y algunas contribuciones relacionadas con su interfaz, con su representación y con la evaluación de LMs basados en redes neuronales (NNLMs). Se ha realizado una revisión de SMs que incluye SMs basados en combinación de modelos generativos y discriminativos, así como un esquema general de tipos de emisión de tramas y de SMs. Se proponen versiones especializadas del algoritmo de Viterbi para modelos de léxico y que manipulan estados activos sin recurrir a estructuras de tipo diccionario, sacando provecho de la caché. Se ha propuesto una arquitectura "dataflow" para obtener reconocedores a partir de un pequeño conjunto de piezas básicas con un protocolo de serialización de DAGs. Describimos generadores de DAGs que pueden tener en cuenta restricciones sobre la segmentación, utilizar modelos segmentales no limitados a HMMs, hacer uso de los decodificadores especializados propuestos en este trabajo y utilizar un transductor de control que permite el uso de unidades dependientes del contexto. Los decodificadores de DAGs hacen uso de un interfaz bastante general de LMs que ha sido extendido para permitir el uso de RTNs. Se proponen también mejoras para reconocedores "un paso" basados en algoritmos especializados para léxicos y en la interfaz de LMs en modo "bunch", así como su paralelización. La parte experimental está centrada en HTR en diversas modalidades de adquisición (offline, bimodal). Hemos propuesto técnicas novedosas para el preproceso de escritura que evita el uso de heurísticos geométricos. En su lugar, utiliza redes neuronales. Se ha probado con HMMs hibridados con redes neuronales consiguiendo, para la base de datos IAM, algunos de los mejores resultados publicados. También podemos mencionar el uso de información de sobre-segmentación, aproximaciones sin restricción de un léxico, experimentos con datos bimodales o la combinación de HMMs híbridos con reconocedores de tipo holístico.
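The specialized lexicon decoders described in this abstract build on the classical Viterbi recurrence for HMMs. As a hedged illustration, here is a minimal dictionary-based sketch of that recurrence; it is deliberately simple and is not the cache-friendly, specialized version the thesis proposes:

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most probable state sequence for a discrete-emission HMM.

    log_init[s]     : log P(state s at t=0)
    log_trans[s][t] : log P(t | s)
    log_emit[s][o]  : log P(o | s)
    """
    # delta[s] = best log-probability of any path ending in state s
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s] = best predecessor of s at time t+1
    for o in obs[1:]:
        prev, new_delta = {}, {}
        for s in states:
            best_p, best_score = max(
                ((p, delta[p] + log_trans[p][s]) for p in states),
                key=lambda x: x[1])
            new_delta[s] = best_score + log_emit[s][o]
            prev[s] = best_p
        delta = new_delta
        back.append(prev)
    # backtrack from the best final state
    last = max(delta, key=delta.get)
    path = [last]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path)), delta[last]
```

The thesis replaces the dictionary of active states used here with flat, cache-aware structures specialized for lexicon models.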
España Boquera, S. (2016). Contributions to the joint segmentation and classification of sequences (My two cents on decoding and handwriting recognition) [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/62215
TESIS
Awarded
APA, Harvard, Vancouver, ISO, and other styles
6

Zouhar, David. "Rozpoznávání rukou psaného textu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236429.

Full text
Abstract:
This diploma thesis deals with real-time handwriting recognition. It describes how the input data are processed and focuses on the classification methods used for recognition, in particular hidden Markov models. It also presents an evaluation of recognition accuracy based on the implemented experiments. An alternative keyboard for the MeeGo system was also created as part of this thesis. The resulting system achieved recognition accuracy above 96%.
7

Álvaro Muñoz, Francisco. "Mathematical Expression Recognition based on Probabilistic Grammars." Doctoral thesis, Universitat Politècnica de València, 2015. http://hdl.handle.net/10251/51665.

Full text
Abstract:
[EN] Mathematical notation is well known and used all over the world. Humankind has evolved from simple methods for representing counts to the current well-defined math notation, able to account for complex problems. Furthermore, mathematical expressions constitute a universal language in scientific fields, and many information resources containing mathematics have been created during the last decades. However, in order to efficiently access all that information, scientific documents have to be digitized or produced directly in electronic formats. Although most people are able to understand and produce mathematical information, introducing math expressions into electronic devices requires learning specific notations or using editors. Automatic recognition of mathematical expressions aims at filling this gap between the knowledge of a person and the input accepted by computers. This way, printed documents containing math expressions could be automatically digitized, and handwriting could be used for direct input of math notation into electronic devices. This thesis is devoted to developing an approach for mathematical expression recognition. In this document we propose an approach for recognizing any type of mathematical expression (printed or handwritten) based on probabilistic grammars. In order to do so, we develop a formal statistical framework from which several probability distributions are derived. Throughout the document, we deal with the definition and estimation of all these probabilistic sources of information. Finally, we define the parsing algorithm that globally computes the most probable mathematical expression for a given input according to the statistical framework. An important point in this study is to provide objective performance evaluation and report results using public data and standard metrics. We inspected the problems of automatic evaluation in this field and looked for the best solutions.
We also report several experiments using public databases and we participated in several international competitions. Furthermore, we have released most of the software developed in this thesis as open source. We also explore some of the applications of mathematical expression recognition. In addition to the direct applications of transcription and digitization, we report two important proposals. First, we developed mucaptcha, a method to tell humans and computers apart by means of math handwriting input, which represents a novel application of math expression recognition. Second, we tackled the problem of layout analysis of structured documents using the statistical framework developed in this thesis, because both are two-dimensional problems that can be modeled with probabilistic grammars. The approach developed in this thesis for mathematical expression recognition has obtained good results at different levels. It has produced several scientific publications in international conferences and journals, and has been awarded in international competitions.
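The parsing algorithm sketched in this abstract is a dynamic program over a probabilistic grammar. A hedged, one-dimensional CYK-style sketch for a PCFG in Chomsky normal form follows (the grammar and probabilities are invented for illustration; the thesis extends this idea to two-dimensional spatial relations between symbols):

```python
from collections import defaultdict

def pcyk(words, lexical, binary, start):
    """Probability of the most likely parse of `words` under a PCFG in
    Chomsky normal form.

    lexical: (A, word) -> P(A -> word)
    binary : (A, B, C) -> P(A -> B C)
    """
    n = len(words)
    # best[(i, j, A)] = probability of the best A-derivation of words[i:j]
    best = defaultdict(float)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w and p > best[(i, i + 1, A)]:
                best[(i, i + 1, A)] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    score = p * best[(i, k, B)] * best[(k, j, C)]
                    if score > best[(i, j, A)]:
                        best[(i, j, A)] = score
    return best[(0, n, start)]
```

For instance, a toy grammar with rules E -> E X, X -> Plus E and terminals '1', '2', '+' assigns a nonzero probability to "1 + 2" and zero to ungrammatical inputs.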
Álvaro Muñoz, F. (2015). Mathematical Expression Recognition based on Probabilistic Grammars [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/51665
TESIS
8

Serrano Martínez-Santos, Nicolás. "Interactive Transcription of Old Text Documents." Doctoral thesis, Universitat Politècnica de València, 2014. http://hdl.handle.net/10251/37979.

Full text
Abstract:
Nowadays, there are huge collections of handwritten text documents in libraries all over the world. The high demand for these resources has led to the creation of digital libraries in order to facilitate their preservation and provide electronic access to these documents. However, text transcriptions of these document images are not always available to allow users to quickly search for information, or computers to process the information, search for patterns, or draw out statistics. The problem is that manual transcription of these documents is an expensive task from both economic and time viewpoints. This thesis presents a novel approach for efficient Computer Assisted Transcription (CAT) of handwritten text documents using state-of-the-art Handwriting Text Recognition (HTR) systems. The objective of CAT approaches is to efficiently complete a transcription task through human-machine collaboration, as the effort required to generate a manual transcription is high, and automatically generated transcriptions from state-of-the-art systems still do not reach the required accuracy. This thesis is centered on a special application of CAT, namely the transcription of old text documents when the amount of user effort available is limited, and thus the entire document cannot be revised. In this approach, the objective is to generate the best possible transcription with the user effort available. This thesis provides a comprehensive view of the CAT process, from feature extraction to user interaction. First, a statistical approach to generalise interactive transcription is proposed. As its direct application is unfeasible, some assumptions are made to apply it to two different tasks: first, the interactive transcription of handwritten text documents, and next, the interactive detection of the document layout. Next, the digitisation and annotation process of two real old text documents is described.
This process was carried out because of the scarcity of similar resources and the need for annotated data to thoroughly test all the tools and techniques developed in this thesis. These two documents were carefully selected to represent the general difficulties encountered when dealing with HTR. Baseline results are presented on these two documents to establish a benchmark with a standard HTR system. Finally, these annotated documents were made freely available to the community. It must be noted that all the techniques and methods developed in this thesis have been assessed on these two real old text documents. Then, a CAT approach for HTR when user effort is limited is studied and extensively tested. The ultimate goal of applying CAT is achieved by putting together three processes, given a recognised transcription from an HTR system. The first process consists in locating (possibly) incorrect words and employs the available user effort to supervise them (if necessary). As most words are not expected to be supervised due to the limited user effort available, only a few are selected to be revised. The system presents to the user a small subset of these words according to an estimation of their correctness or, to be more precise, according to their confidence level. The second process starts once these low-confidence words have been supervised. This process updates the recognition of the document taking user corrections into consideration, which improves the quality of the words that were not revised by the user. Finally, the last process adapts the system from the partially revised (and possibly not perfect) transcription obtained so far. In this adaptation, the system intelligently selects the correct words of the transcription. As a result, the adapted system will better recognise future transcriptions. Transcription experiments using this CAT approach show that it is most effective when user effort is low.
The last contribution of this thesis is a method for balancing the final transcription quality and the supervision effort applied using the previously described CAT approach. In other words, this method allows the user to control the amount of error in the transcriptions obtained from a CAT approach. The motivation of this method is to let users decide on the final quality of the desired documents, as partially erroneous transcriptions can be sufficient to convey the meaning, and the user effort required to obtain them may be significantly lower than that of a totally manual transcription. Consequently, the system estimates the minimum user effort required to reach the amount of error defined by the user. Error estimation is performed by computing separately the error produced by each recognised word, and thus asking the user to revise only the ones in which most errors occur. Additionally, an interactive prototype is presented, which integrates most of the interactive techniques presented in this thesis. This prototype has been developed to be used by palaeography experts, who do not have any background in HTR technologies. After slight fine-tuning by an HTR expert, the prototype lets transcribers manually annotate the document or employ the presented CAT approach. All automatic operations, such as recognition, are performed in the background, detaching the transcriber from the details of the system. The prototype was assessed by an expert transcriber and proved adequate and efficient for its purpose. The prototype is freely available under the GNU Public Licence (GPL).
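The effort/quality balancing step described in this abstract can be sketched as a greedy budget computation. This is a simplifying illustration only (it assumes per-word error estimates are given and that supervising a word removes its expected error entirely), not the thesis's exact estimator:

```python
def minimum_effort(word_error_probs, max_error_rate):
    """Estimate the minimum number of words the user must revise so that
    the expected word error rate falls to `max_error_rate` or below.

    Greedy sketch: revise the most error-prone words first and assume
    each revision removes that word's expected error.
    """
    n = len(word_error_probs)
    residual = sum(word_error_probs)  # expected number of wrong words
    revised = 0
    for p in sorted(word_error_probs, reverse=True):
        if residual / n <= max_error_rate:
            break
        residual -= p
        revised += 1
    return revised
```

With per-word error estimates [0.9, 0.8, 0.1, 0.05, 0.05] and a 10% target error rate, revising the two most error-prone words suffices.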
Serrano Martínez-Santos, N. (2014). Interactive Transcription of Old Text Documents [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37979
TESIS
9

Pastor Pellicer, Joan. "Neural Networks for Document Image and Text Processing." Doctoral thesis, Universitat Politècnica de València, 2017. http://hdl.handle.net/10251/90443.

Full text
Abstract:
Nowadays, the main libraries and document archives are investing considerable effort in digitizing their collections. Indeed, most of them are scanning the documents and publishing the resulting images without their corresponding transcriptions, which seriously limits the document exploitation possibilities. When a transcription is necessary, it is performed manually by human experts, a very expensive and error-prone task. Obtaining transcriptions of the required quality demands the intervention of human experts to review and correct the output of the recognition engines. To this end, it is extremely useful to provide interactive tools to obtain and edit the transcription. Although text recognition is the final goal, several previous steps (known as preprocessing) are necessary in order to get a fine transcription from a digitized image. Document cleaning, enhancement, and binarization (if needed) are the first stages of the recognition pipeline. Historical handwritten documents, in addition, show several degradations, stains, ink bleed-through, and other artifacts. Therefore, more sophisticated and elaborate methods are required when dealing with these kinds of documents; in some cases even expert supervision is needed. Once images have been cleaned, the main zones of the image have to be detected: those that contain text, and other parts such as images, decorations, and versal letters. Moreover, the relations among them and the final text have to be detected. These preprocessing steps are critical for the final performance of the system, since an error at this point will be propagated through the rest of the transcription process. The ultimate goal of the Document Image Analysis pipeline is to produce the transcription of the text (Optical Character Recognition and Handwritten Text Recognition). In this thesis we aimed to improve the main stages of the recognition pipeline, from the scanned documents as input to the final transcription.
We focused our effort on applying Neural Networks and deep learning techniques directly on the document images to extract suitable features that will be used by the different tasks dealt during the following work: Image Cleaning and Enhancement (Document Image Binarization), Layout Extraction, Text Line Extraction, Text Line Normalization and finally decoding (or text line recognition). As one can see, the following work focuses on small improvements through the several Document Image Analysis stages, but also deals with some of the real challenges: historical manuscripts and documents without clear layouts or very degraded documents. Neural Networks are a central topic for the whole work collected in this document. Different convolutional models have been applied for document image cleaning and enhancement. Connectionist models have been used, as well, for text line extraction: first, for detecting interest points and combining them in text segments and, finally, extracting the lines by means of aggregation techniques; and second, for pixel labeling to extract the main body area of the text and then the limits of the lines. For text line preprocessing, i.e., to normalize the text lines before recognizing them, similar models have been used to detect the main body area and then to height-normalize the images giving more importance to the central area of the text. Finally, Convolutional Neural Networks and deep multilayer perceptrons have been combined with hidden Markov models to improve our transcription engine significantly. The suitability of all these approaches has been tested with different corpora for any of the stages dealt, giving competitive results for most of the methodologies presented.
Pastor Pellicer, J. (2017). Neural Networks for Document Image and Text Processing [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90443
THESIS
APA, Harvard, Vancouver, ISO, and other styles
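The preprocessing stages described in the abstract above (cleaning, enhancement, binarisation) are the ones the thesis augments with convolutional models. As a point of reference, a classical baseline for the binarisation step is Otsu's global thresholding; the following is a minimal NumPy sketch of that baseline (the function names and toy image are illustrative, not taken from the thesis):

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the global threshold maximising between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    probs = hist / hist.sum()
    cum_w = np.cumsum(probs)                    # weight of class 0 up to t
    cum_mu = np.cumsum(probs * np.arange(256))  # cumulative mean intensity
    mu_total = cum_mu[-1]
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0, w1 = cum_w[t], 1.0 - cum_w[t]
        if w0 == 0 or w1 == 0:
            continue  # all pixels fall on one side; no split to score
        mu0 = cum_mu[t] / w0
        mu1 = (mu_total - cum_mu[t]) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray: np.ndarray) -> np.ndarray:
    """Map ink (dark pixels) to 1 and background to 0."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)

# Tiny synthetic "document": a dark text-like stroke on a light background.
img = np.full((8, 8), 200, dtype=np.uint8)
img[2:4, 1:7] = 30
mask = binarize(img)
```

Neural approaches such as those in the thesis aim to outperform this kind of global threshold precisely on the degraded documents (stains, bleed-through) where a single cutoff fails.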
10

Fizaine, Florian. "Transcription de documents historiques avec des algorithmes de Deep Learning." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCK095.

Full text
Abstract:
Our work is part of a research project led by the Archives of the Côte-d'Or Department, the "Lettres en Lumière" project, which aims to adapt artificial intelligence algorithms to the automatic transcription of historical documents held by the Archives. To initiate the project, these documents were selected from the 18th-century manuscripts contained in the Registres des Délibérations des Etats de Bourgogne. Today's most competitive approaches to automatic transcription of handwritten texts involve a two-step process: optimal segmentation of the lines of the text, followed by the actual transcription process, in which the characters are deciphered to reconstruct the words of the text. After a first chapter describing the context of the project, in the second chapter we present our study on the optimal segmentation of text lines. Our choice of segmentation method fell on two deep learning algorithms, Unet and MaskRCNN, based on a thorough state-of-the-art review of segmentation algorithms. We show that MaskRCNN, an instance segmentation algorithm, performs best for optimised line extraction from old handwritten texts. Our work on line transcription, described in chapter three, led us, after a thorough study of the state of the art, to select architectures based on Transformer neural networks. We show that the TrOCR Transformer network, combined with our line segmentation algorithm, achieves transcriptions with a maximum error rate per character of 3.4%.
While these results suggest that a transcription platform usable by a broad public interested in palaeography could be made available in the short term, a major problem arises concerning the excessive use of computational resources tied to the underlying complexity of AI algorithms. To overcome this major obstacle, many artificial intelligence researchers are working on frugal AI. In this context, in chapter four we propose an approach to text-line transcription based on bio-inspired neural networks. More specifically, we rely on spiking neural networks (SNNs). After a thorough study of the state of the art of such networks, we chose the Spikformer network, which we optimised for text-line transcription. We show that our bio-inspired approach is beneficial and promising: a maximum error rate per character of 4.2% for typed texts and 12.7% for texts simulating handwriting. This study is the first in the literature to tackle such a complex application with this type of network, and it demonstrates the value of pursuing this line of research.
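The abstract above quotes a "maximum error rate per character", i.e. the character error rate (CER) standard in handwriting recognition. As a generic illustration of how such a figure is computed (not code from the thesis), CER is the character-level Levenshtein edit distance divided by the reference length:

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref: str, hyp: str) -> float:
    """Character error rate: edits needed per reference character."""
    return levenshtein(ref, hyp) / max(len(ref), 1)

# One substituted character out of 13 in the reference string.
example = cer("transcription", "transcniption")
```

A reported CER of 3.4% thus means roughly 3-4 wrong characters per 100 characters of reference transcription.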

Book chapters on the topic "Handwriting text recognition"

1

Rafique, Aftab, and M. Ishtiaq. "UOHTD: Urdu Offline Handwritten Text Dataset." In Frontiers in Handwriting Recognition, 498–511. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_34.

2

Zhang, Xiaoyi, Tianwei Wang, Jiapeng Wang, Lianwen Jin, Canjie Luo, and Yang Xue. "ChaCo: Character Contrastive Learning for Handwritten Text Recognition." In Frontiers in Handwriting Recognition, 345–59. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_24.

3

Mondal, Ajoy, and C. V. Jawahar. "Enhancing Indic Handwritten Text Recognition Using Global Semantic Information." In Frontiers in Handwriting Recognition, 360–74. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_25.

4

Madi, Boraq, Reem Alaasam, and Jihad El-Sana. "Text Edges Guided Network for Historical Document Super Resolution." In Frontiers in Handwriting Recognition, 18–33. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_2.

5

d’Arce, Rafael, Terence Norton, Sion Hannuna, and Nello Cristianini. "Self-attention Networks for Non-recurrent Handwritten Text Recognition." In Frontiers in Handwriting Recognition, 389–403. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_27.

6

Pao, Yoh-Han, and Gwang-Hoon Park. "Neural-Net Computing for Machine Recognition of Handwritten English Language text." In Fundamentals in Handwriting Recognition, 335–51. Berlin, Heidelberg: Springer Berlin Heidelberg, 1994. http://dx.doi.org/10.1007/978-3-642-78646-4_20.

7

Chen, Wei, Xiangdong Su, and Haoran Zhang. "Script-Level Word Sample Augmentation for Few-Shot Handwritten Text Recognition." In Frontiers in Handwriting Recognition, 316–30. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_22.

8

Kiessling, Benjamin. "CurT: End-to-End Text Line Detection in Historical Documents with Transformers." In Frontiers in Handwriting Recognition, 34–48. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_3.

9

Yu, Ming-Ming, Heng Zhang, Fei Yin, and Cheng-Lin Liu. "An Efficient Prototype-Based Model for Handwritten Text Recognition with Multi-loss Fusion." In Frontiers in Handwriting Recognition, 404–18. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_28.

10

Qiao, Zhi, Zhilong Ji, Ye Yuan, and Jinfeng Bai. "A Vision Transformer Based Scene Text Recognizer with Multi-grained Encoding and Decoding." In Frontiers in Handwriting Recognition, 198–212. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21648-0_14.


Conference papers on the topic "Handwriting text recognition"

1

Sumathy, R., S. Narayana Swami, T. Pavan Kumar, V. Lakshmi Narasimha, and B. Premalatha. "Handwriting Text Recognition using CNN and RNN." In 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC). IEEE, 2023. http://dx.doi.org/10.1109/icaaic56838.2023.10140449.

2

Sanchez, Joan Andreu, and Umapada Pal. "Handwritten Text Recognition for Bengali." In 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 2016. http://dx.doi.org/10.1109/icfhr.2016.0105.

3

Noubigh, Zouhaira, Anis Mezghani, and Monji Kherallah. "Transfer Learning to improve Arabic handwriting text Recognition." In 2020 21st International Arab Conference on Information Technology (ACIT). IEEE, 2020. http://dx.doi.org/10.1109/acit50332.2020.9300105.

4

Chowdhury, Sadia, Farhan Rahman Wasee, Mohammad Shafiqul Islam, and Hasan U. Zaman. "Bengali Handwriting Recognition and Conversion to Editable Text." In 2018 Second International Conference on Advances in Electronics, Computers and Communications (ICAECC). IEEE, 2018. http://dx.doi.org/10.1109/icaecc.2018.8479487.

5

Yang, Junqing, Peng Ren, and Xiaoxiao Kong. "Handwriting Text Recognition Based on Faster R-CNN." In 2019 Chinese Automation Congress (CAC). IEEE, 2019. http://dx.doi.org/10.1109/cac48633.2019.8997382.

6

Gatos, Basilis, Georgios Louloudis, and Nikolaos Stamatopoulos. "Segmentation of Historical Handwritten Documents into Text Zones and Text Lines." In 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 2014. http://dx.doi.org/10.1109/icfhr.2014.84.

7

Potyashin, Ivan, Mariam Kaprielova, Yury Chekhovich, Alexandr Kildyakov, Temirlan Seil, Evgeny Finogeev, and Andrey Grabovoy. "HWR200: New open access dataset of handwritten texts images in Russian." In INTERNATIONAL CONFERENCE on Computational Linguistics and Intellectual Technologies. RSUH, 2023. http://dx.doi.org/10.28995/2075-7182-2023-22-452-458.

Abstract:
Handwritten text image datasets are highly useful for solving many problems with machine learning. Such problems include recognition of handwritten characters and handwriting, visual question answering, near-duplicate detection, search for text reuse in handwriting, and many auxiliary tasks: highlighting lines, words and other objects in the text. The paper presents a new dataset of images of handwritten texts in Russian, created by 200 writers with different handwriting and photographed in different environments. We describe the procedure for creating this dataset and the requirements set for the texts and photos. Experiments with a baseline solution on the fraud-search and text-reuse-search problems yielded 60% and 83% recall, with false positive rates of 5% and 2% respectively, on the dataset.
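The recall and false-positive-rate figures quoted in the abstract above are standard binary-classification metrics. A small generic sketch of how they are derived from per-pair labels (illustrative only, not the authors' evaluation code):

```python
def recall_and_fpr(y_true, y_pred):
    """y_true/y_pred are sequences of 0/1 labels (1 = reuse detected)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0   # found / actually positive
    fpr = fp / (fp + tn) if fp + tn else 0.0      # false alarms / negatives
    return recall, fpr

# Toy example: 5 true reuse pairs, 20 non-reuse pairs.
truth = [1] * 5 + [0] * 20
pred  = [1, 1, 1, 0, 0] + [1] + [0] * 19
r, f = recall_and_fpr(truth, pred)  # recall 0.6, FPR 0.05
```

On this toy data the baseline finds 3 of 5 true pairs (60% recall) while raising one false alarm among 20 negatives (5% FPR), mirroring the shape of the figures reported for the fraud-search task.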
8

Gargouri, Mariem, Slim Kanoun, and Jean-Marc Ogier. "Text-Independent Writer Identification on Online Arabic Handwriting." In 2013 12th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2013. http://dx.doi.org/10.1109/icdar.2013.93.

9

Phan, Truyen Van, and Masaki Nakagawa. "Text/Non-text Classification in Online Handwritten Documents with Recurrent Neural Networks." In 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR). IEEE, 2014. http://dx.doi.org/10.1109/icfhr.2014.12.

10

Paschalakis, S., G. Filis, C. Allgrove, and M. C. Fairhurst. "Estimating wordlength for efficient text analysis." In IEE Third European Workshop on Handwriting Analysis and Recognition. IEE, 1998. http://dx.doi.org/10.1049/ic:19980695.
