Dissertations / Theses on the topic 'Machine translations'
Consult the top 50 dissertations / theses for your research on the topic 'Machine translations.'
Ilisei, Iustina-Narcisa. "A machine learning approach to the identification of translational language : an inquiry into translationese learning models." Thesis, University of Wolverhampton, 2012. http://hdl.handle.net/2436/299371.
Tirnauca, Catalin Ionut. "Syntax-directed translations, tree transformations and bimorphisms." Doctoral thesis, Universitat Rovira i Virgili, 2016. http://hdl.handle.net/10803/381246.
Syntax-based translation arose in the field of machine translation of natural languages. Such systems must model tree transformations, reorder parts of sentences, be symmetric, and possess properties such as composability or symmetry. There are several ways to define tree transformations: synchronous grammars, tree transducers and tree bimorphisms. Synchronous grammars perform all kinds of rotations, but their mathematical properties are harder to prove. Tree transducers are operational and easy to implement, but the main classes are not closed under composition. Tree bimorphisms are difficult to implement, but they provide a natural tool for proving composability or symmetry. To improve the translation process, synchronous grammars are related to tree bimorphisms and tree transducers. This thesis carries out a broad study of the theory and properties of syntax-directed translation systems from these three very different perspectives, which complement each other perfectly: as generative devices (synchronous grammars), as acceptor machines (transducers) and as algebraic structures (bimorphisms). They are investigated and compared at the level of tree transformation and as devices that define translations. The study centers on bimorphisms, with special emphasis on their applications to natural language processing. A complete and up-to-date overview of the classes of tree transformations defined by bimorphisms is also proposed, linking them with the known types of synchronous grammars and tree transducers. We prove or recall all the interesting properties that such classes possess, thereby improving previous mathematical knowledge.
In addition, the inclusion relations between the main classes of bimorphisms, both as translation devices and as tree transformation mechanisms, are presented through a Hasse diagram.
Syntax-based machine translation arose from the pressing need for systems that perform practical translation between natural languages. Such systems should, among other things, model tree transformations, reorder parts of sentences, be symmetric, and possess composability or forward and backward application. There are several formal ways to define tree transformations: synchronous grammars, tree transducers and tree bimorphisms. Synchronous grammars perform all kinds of rotations, but their mathematical properties are harder to prove. Tree transducers are operational and easy to implement, but closure under composition does not hold for the main types. Tree bimorphisms are difficult to implement, but they provide a natural tool for proving composability or symmetry. To improve the translation process, synchronous grammars were related to tree bimorphisms and tree transducers. Following this lead, we give a comprehensive study of the theory and properties of syntax-directed translation systems seen from these three very different perspectives, which complement each other perfectly: as generating devices (synchronous grammars), as acceptors (transducer machines) and as algebraic structures (bimorphisms). They are investigated and compared both as tree transformation devices and as translation-defining devices. The focus is on bimorphisms, which only recently returned to the spotlight, especially given their applications to natural language processing. Moreover, we propose a complete and up-to-date overview of the tree transformation classes defined by bimorphisms, linking them with well-known types of synchronous grammars and tree transducers. We prove or recall all the interesting properties such classes possess, thus improving the mathematical knowledge on synchronous grammars and/or tree transducers.
Also, inclusion relations between the main classes of bimorphisms both as translation devices and as tree transformation mechanisms are given for the first time through a Hasse diagram. Directions for future work are suggested by exhibiting how to extend previous results to more general classes of bimorphisms and synchronous grammars.
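As a rough illustration of the bimorphism view mentioned in the abstract above, a tree bimorphism can be read as a set of "center" trees together with two tree homomorphisms, one producing the input tree and one producing the output tree. The following is only a minimal sketch under that textbook reading, not the formalism or notation of the thesis; the tree encoding, the symbol names (`pair`, `adj`, `n`) and the toy English/Spanish lexicon are all assumptions made for the example.

```python
# A tree is (label, children). In a homomorphism template, an int i is a
# variable standing for the image of the i-th child (0-based).
# All symbols and words below are hypothetical illustration data.

def substitute(template, images):
    """Replace variables in a template by the already-computed child images."""
    if isinstance(template, int):
        return images[template]
    label, kids = template
    return (label, tuple(substitute(k, images) for k in kids))

def apply_hom(hom, tree):
    """Apply a tree homomorphism bottom-up to a center tree."""
    label, children = tree
    images = [apply_hom(hom, child) for child in children]
    return substitute(hom[label], images)

def yield_of(tree):
    """Read the leaves of a tree from left to right."""
    label, children = tree
    if not children:
        return [label]
    return [word for child in children for word in yield_of(child)]

# Input homomorphism keeps the child order; output homomorphism swaps it.
src_hom = {"pair": ("NP", (0, 1)), "adj": ("red", ()), "n": ("car", ())}
tgt_hom = {"pair": ("NP", (1, 0)), "adj": ("rojo", ()), "n": ("coche", ())}

center = ("pair", (("adj", ()), ("n", ())))

src_tree = apply_hom(src_hom, center)
tgt_tree = apply_hom(tgt_hom, center)
print(yield_of(src_tree), yield_of(tgt_tree))
```

The pair of trees obtained from the same center tree is one element of the tree transformation; reordering falls out of the different variable order in the two templates for `pair`.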
Al, Batineh Mohammed S. "Latent Semantic Analysis, Corpus stylistics and Machine Learning Stylometry for Translational and Authorial Style Analysis: The Case of Denys Johnson-Davies’ Translations into English." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1429300641.
Tebbifakhr, Amirhossein. "Machine Translation For Machines." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/320504.
Tiedemann, Jörg. "Recycling Translations : Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing." Doctoral thesis, Uppsala University, Department of Linguistics, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-3791.
The focus of this thesis is on re-using translations in natural language processing. It involves the collection of documents and their translations in an appropriate format, the automatic extraction of translation data, and the application of the extracted data to different tasks in natural language processing.
Five parallel corpora containing more than 35 million words in 60 languages have been collected within co-operative projects. All corpora are sentence aligned and parts of them have been analyzed automatically and annotated with linguistic markup.
Lexical data are extracted from the corpora by means of word alignment. Two automatic word alignment systems have been developed, the Uppsala Word Aligner (UWA) and the Clue Aligner. UWA implements an iterative "knowledge-poor" word alignment approach using association measures and alignment heuristics. The Clue Aligner provides an innovative framework for the combination of statistical and linguistic resources in aligning single words and multi-word units. Both aligners have been applied to several corpora. Detailed evaluations of the alignment results have been carried out for three of them using fine-grained evaluation techniques.
A corpus processing toolbox, Uplug, has been developed. It includes the implementation of UWA and is freely available for research purposes. A new version, Uplug II, includes the Clue Aligner. It can be used via an experimental web interface (UplugWeb).
Lexical data extracted by the word aligners have been applied to different tasks in computational lexicography and machine translation. The use of word alignment in monolingual lexicography has been investigated in two studies. In a third study, the feasibility of using the extracted data in interactive machine translation has been demonstrated. Finally, extracted lexical data have been used for enhancing the lexical components of two machine translation systems.
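To make the idea of "knowledge-poor" word alignment with association measures more concrete, here is a minimal sketch that scores word pairs in a sentence-aligned corpus with the Dice coefficient and links each source word to its best-scoring target word. The abstract does not say which association measures or heuristics UWA uses, so the choice of Dice, the function names and the toy English-German bitext are assumptions for illustration only.

```python
from collections import Counter
from itertools import product

def dice_scores(bitext):
    """Dice coefficient for every co-occurring (source, target) word pair.

    bitext: list of (source_sentence, target_sentence) strings.
    Counts are per sentence pair (word types, not tokens).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_types, t_types = set(src.split()), set(tgt.split())
        src_count.update(s_types)
        tgt_count.update(t_types)
        pair_count.update(product(s_types, t_types))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }

def best_links(bitext):
    """Greedy heuristic: keep the highest-scoring target word per source word."""
    best = {}
    for (s, t), score in dice_scores(bitext).items():
        if s not in best or score > best[s][1]:
            best[s] = (t, score)
    return best

# Hypothetical toy corpus, not data from the thesis.
bitext = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a house", "ein haus"),
]
links = best_links(bitext)
print(links["house"], links["the"])
```

On a corpus this small many pairs tie, which is one reason practical aligners combine several clues rather than relying on a single co-occurrence statistic.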
Joelsson, Jakob. "Translationese and Swedish-English Statistical Machine Translation." Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305199.
Karlbom, Hannes. "Hybrid Machine Translation : Choosing the best translation with Support Vector Machines." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-304257.
Ahmadniaye, Bosari Benyamin. "Reliable training scenarios for dealing with minimal parallel-resource language pairs in statistical machine translation." Doctoral thesis, Universitat Autònoma de Barcelona, 2017. http://hdl.handle.net/10803/461204.
This thesis, entitled “Reliable Training Scenarios for Dealing with Minimal Parallel-Resource Language Pairs in Statistical Machine Translation”, concerns high-quality Statistical Machine Translation (SMT) systems for minimal parallel-resource language pairs. The main challenge we target is parallel data scarcity, which we address in several solution scenarios. SMT is one of the preferred approaches to Machine Translation (MT), and its output quality has improved for a number of language pairs thanks to advances in computational power and the exploration of new methods and algorithms. For many language pairs, however, the major bottleneck in developing SMT systems is the lack of parallel training data. Because creating these corpora takes a great deal of time and effort, they are limited in quantity, genre, and language. SMT models learn how to translate by examining a bilingual parallel corpus whose sentences are aligned with their human-produced translations. The output quality of SMT systems therefore depends heavily on the availability of massive amounts of parallel text in the source and target languages, and parallel resources play an important role in improving it. We define minimal parallel-resource SMT settings as those that possess only small amounts of parallel data, a situation found in various language pairs. Current state-of-the-art minimal parallel-resource SMT achieves appreciable performance, but it usually relies on monolingual text and does not fundamentally address the shortage of parallel training text.
Enlarging the parallel training data without any guarantee on the quality of the newly generated bilingual sentence pairs also raises concerns. The limitations that emerge when training minimal parallel-resource SMT show that current systems are incapable of producing high-quality translation output. In this thesis, we propose the “direct-bridge combination” scenario and the “round-trip training” scenario for dealing with minimal parallel-resource SMT systems; the former is based on the bridge-language technique, the latter on a retraining approach. Our main aim in putting forward the direct-bridge combination scenario is to bring performance closer to the state of the art. This scenario maximizes the information gain by choosing the appropriate portions of the bridge-based translation system that do not interfere with the direct translation system, which is trusted more. The round-trip training scenario takes advantage of readily available generated bilingual sentence pairs to build a high-quality SMT system iteratively: it selects a high-quality subset of the generated sentence pairs on the target side, prepares suitable corresponding source sentences, and uses them together with the original sentence pairs to retrain the SMT system. The proposed methods are intrinsically evaluated and compared against baseline translation systems. We also conducted experiments in the proposed scenarios with minimal initial bilingual data. In each scenario, the proposed methods improve performance over the baseline while building high-quality SMT systems.
Davis, Paul C. "Stone Soup Translation: The Linked Automata Model." Connect to this title online, 2002. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1023806593.
Title from first page of PDF file. Document formatted into pages; contains xvi, 306 p.; includes graphics. Includes abstract and vita. Advisor: Chris Brew, Dept. of Linguistics. Includes indexes. Includes bibliographical references (p. 284-293).
Martínez, Garcia Eva. "Document-level machine translation : ensuring translational consistency of non-local phenomena." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/668473.
This thesis studies the machine translation of documents taking into account phenomena that occur across sentences. Typically, this document-level information is ignored by most Machine Translation (MT) systems, which translate texts by processing each of their sentences in isolation. Translating each sentence without looking at its surrounding context can lead to certain kinds of translation errors, such as inconsistent translations of the same word or of elements that appear in the same coreference chain. This work presents methods for attending to document-level phenomena in order to avoid such errors and thus produce translations that correctly convey the original meaning of the text. Our research begins by identifying the translation errors related to document-level phenomena that commonly appear in the output of Statistical Machine Translation (SMT) systems. For two of these errors, the inconsistent translation of words and disagreements in gender and number between words, we design simple yet effective post-processing techniques to treat and correct them. Because these techniques are applied a posteriori, they can access the entire documents, both the source and the generated translation, and are therefore able to perform a global analysis and improve the coherence and consistency of the translation. However, since a two-step translation strategy is not optimal in terms of efficiency, we also focus on introducing context awareness during the translation generation process itself. To this end, we extend a document-oriented SMT system with distributional semantic information in the form of bilingual and monolingual word embeddings.
In particular, these embeddings are used as a Semantic Space Language Model (SSLM) and as a new feature function of the system. The goal of the former is to promote word translations that are semantically close to their preceding context, whereas the latter aims to promote the lexical choice that is closest to its context for words that have different translations throughout a document. In both cases, the context taken into account goes beyond sentence boundaries. Recently, the MT community has transitioned to the neural paradigm. The final step of our research proposes an extension of the decoding process of a Neural Machine Translation (NMT) system, independent of the architecture of the translation model, that applies the Shallow Fusion technique to combine the information from the neural translation model with the contextual semantic information captured by the previously studied SSLM models. The motivation for this modification is to bring the benefits of context information into the decoding process of NMT systems as well, and to obtain additional validation for the techniques explored throughout this thesis. The automatic evaluation of our proposals does not show significant variations. This is expected behavior, since most automatic metrics are not designed to be sensitive to context or semantics, and the phenomena we address are sparse, leading to few modifications with respect to the baseline translations. On the other hand, manual evaluations demonstrate the positive impact of our proposals, since human evaluators tend to prefer the translations generated by our document-level systems.
The changes introduced by our extended systems are therefore important because they relate to the way humans perceive the translation quality of long texts.
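The Shallow Fusion step mentioned in the abstract above is commonly described as a log-linear combination, at each decoding step, of the translation model's token scores with the scores of a second model. The sketch below shows that general idea for a single step; it is not the thesis's implementation, and the function name, the interpolation weight, the floor probability for unseen tokens and the toy probabilities are all assumptions. The context model here is just a dictionary standing in for something like an SSLM.

```python
import math

def shallow_fusion_step(tm_logprobs, ctx_logprobs, lm_weight=0.5, floor=1e-9):
    """Combine per-token log-probabilities for one decoding step.

    tm_logprobs:  token -> log p(token) under the translation model.
    ctx_logprobs: token -> log p(token) under a context-aware model
                  (tokens it does not score get a small floor probability).
    Returns the best token and the fused score table.
    """
    fused = {
        tok: lp + lm_weight * ctx_logprobs.get(tok, math.log(floor))
        for tok, lp in tm_logprobs.items()
    }
    best = max(fused, key=fused.get)
    return best, fused

# Hypothetical scores: the translation model slightly prefers "bank",
# while the document context strongly favors "shore".
tm = {"bank": math.log(0.5), "shore": math.log(0.45)}
ctx = {"bank": math.log(0.1), "shore": math.log(0.8)}

print(shallow_fusion_step(tm, ctx, lm_weight=0.0)[0])
print(shallow_fusion_step(tm, ctx, lm_weight=0.5)[0])
```

Because the combination happens only at scoring time, the translation model itself is left untouched, which matches the abstract's point that the extension is independent of the model architecture.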
Кириченко, Олена Анатоліївна, Елена Анатольевна Кириченко, Olena Anatoliivna Kyrychenko, and Y. V. Kalashnyk. "Machine translation." Thesis, Видавництво СумДУ, 2011. http://essuir.sumdu.edu.ua/handle/123456789/12977.
Quernheim, Daniel. "Bimorphism Machine Translation." Doctoral thesis, Universitätsbibliothek Leipzig, 2017. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-223667.
Caglayan, Ozan. "Multimodal Machine Translation." Thesis, Le Mans, 2019. http://www.theses.fr/2019LEMA1016/document.
Machine translation aims at automatically translating documents from one language to another without human intervention. With the advent of deep neural networks (DNN), neural approaches to machine translation started to dominate the field, reaching state-of-the-art performance in many languages. Neural machine translation (NMT) also revived the interest in interlingual machine translation due to how it naturally fits the task into an encoder-decoder framework which produces a translation by decoding a latent source representation. Combined with the architectural flexibility of DNNs, this framework paved the way for further research in multimodality, with the objective of augmenting the latent representations with other modalities such as vision or speech. This thesis focuses on a multimodal machine translation (MMT) framework that integrates a secondary visual modality to achieve better and visually grounded language understanding. I specifically worked with a dataset containing images and their translated descriptions, where visual context can be useful for word sense disambiguation, missing word imputation, or gender marking when translating from a language with gender-neutral nouns to one with a grammatical gender system, as is the case with English to French. I propose two main approaches to integrate the visual modality: (i) a multimodal attention mechanism that learns to take into account both sentence and convolutional visual representations, (ii) a method that uses global visual feature vectors to prime the sentence encoders and the decoders. Through automatic and human evaluation conducted on multiple language pairs, the proposed approaches were demonstrated to be beneficial.
Finally, I further show that by systematically removing certain linguistic information from the input sentences, the true strength of both methods emerges, as they successfully impute missing nouns and colors and can even translate when parts of the source sentences are completely removed.
Wang, Long Qi. "Translation accuracy comparison between machine translation and context-free machine natural language grammar–based translation." Thesis, University of Macau, 2018. http://umaclib3.umac.mo/record=b3950657.
Sim, Smith Karin M. "Coherence in machine translation." Thesis, University of Sheffield, 2018. http://etheses.whiterose.ac.uk/20083/.
Sato, Satoshi. "Example-Based Machine Translation." Kyoto University, 1992. http://hdl.handle.net/2433/154652.
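Kyoto University (京都大学)
0048
New system, doctorate by dissertation
Doctor of Engineering
Otsu No. 7735
Ron-Ko-Haku No. 2539
新制||工||860 (University Library)
UT51-92-B162
(Chief examiner) Prof. Makoto Nagao, Prof. Shuji Doshita, Prof. Katsuo Ikeda
Conferred under Article 4, Paragraph 2 of the Degree Regulations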
García, Martínez Mercedes. "Factored neural machine translation." Thesis, Le Mans, 2018. http://www.theses.fr/2018LEMA1002/document.
Communication between people across the world is difficult due to the diversity of languages. Machine translation is a quick and cheap way to make translation accessible to everyone. Recently, Neural Machine Translation (NMT) has achieved impressive results. This thesis focuses on the Factored Neural Machine Translation (FNMT) approach, which is founded on the idea of using the morphological and grammatical decomposition of words (lemmas and linguistic factors) in the target language. This architecture addresses two well-known challenges in NMT. The first is the limitation on the target vocabulary size, a consequence of the computationally expensive softmax function at the output layer of the network, which leads to a high rate of unknown words. The second is data sparsity, which arises when we face a specific domain or a morphologically rich language. With FNMT, all the inflections of the words are supported and a larger vocabulary is modelled at similar computational cost. Moreover, new words not included in the training dataset can be generated. In this work, I developed different FNMT architectures using various dependencies between lemmas and factors. In addition, I enhanced the source language side with factors as well. The FNMT model is evaluated on various languages, including morphologically rich ones. State-of-the-art models, some using Byte Pair Encoding (BPE), are compared to the FNMT model using small and large training datasets. We found that factored models are more robust in low-resource conditions. FNMT has also been combined with BPE units, performing better than the pure FNMT model when trained with large datasets. We experimented with different domains, obtaining improvements with the FNMT models. Furthermore, the morphology of the translations is measured using a special test suite, showing the importance of explicitly modeling the target morphology. Our work shows the benefits of applying linguistic factors in NMT.
Fernández, Parra Maria Asunción. "Formulaic expressions in computer-assisted translation : a specialised translation approach." Thesis, Swansea University, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.579586.
Chen, Yuan Yuan. "A critical review of current E-to-C machine translation of academic abstracts." Thesis, University of Macau, 2012. http://umaclib3.umac.mo/record=b2586616.
Liu, Yan. "Translation hypotheses re-ranking for statistical machine translation." Thesis, University of Macau, 2017. http://umaclib3.umac.mo/record=b3691283.
Di, Gangi Mattia Antonino. "Neural Speech Translation: From Neural Machine Translation to Direct Speech Translation." Doctoral thesis, Università degli studi di Trento, 2020. http://hdl.handle.net/11572/259137.
Law, Mei In. "Assessing online translation systems using the BLEU score : Google Language Tools & SYSTRANBox." Thesis, University of Macau, 2011. http://umaclib3.umac.mo/record=b2525828.
Watanabe, Taro. "Example-Based Statistical Machine Translation." 京都大学 (Kyoto University), 2004. http://hdl.handle.net/2433/147584.
Naruedomkul, Kanlaya. "Generate and repair machine translation." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0016/NQ54676.pdf.
Levenberg, Abby D. "Stream-based statistical machine translation." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5760.
Hardmeier, Christian. "Discourse in Statistical Machine Translation." Doctoral thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-223798.
Pirrelli, Vito. "Morphology, analogy and machine translation." Thesis, University of Salford, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238781.
Yahyaei, Mohammad Sirvan. "Reordering in statistical machine translation." Thesis, Queen Mary, University of London, 2012. http://qmro.qmul.ac.uk/xmlui/handle/123456789/2517.
Dudnyk, Tamara. "Machine translation advantages and disadvantages." Thesis, Київський національний університет технологій та дизайну, 2020. https://er.knutd.edu.ua/handle/123456789/15236.
Beaven, John L. "Lexicalist unification-based machine translation." Thesis, University of Edinburgh, 1992. http://hdl.handle.net/1842/19993.
Sabtan, Yasser Muhammad Naguib mahmoud. "Lexical selection for machine translation." Thesis, University of Manchester, 2011. https://www.research.manchester.ac.uk/portal/en/theses/lexical-selection-for-machine-translation(28ea687c-5eaf-4412-992a-16fc88b977c8).html.
Lopez, Adam David. "Machine translation by pattern matching." College Park, Md.: University of Maryland, 2008. http://hdl.handle.net/1903/8110.
Thesis research directed by: Dept. of Linguistics and Institute for Advanced Computer Studies. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
Fomicheva, Marina. "The Role of human reference translation in machine translation evaluation." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404987.
Both manual and automatic methods for the evaluation of Machine Translation (MT) depend heavily on professional human translation. In manual evaluation, the human translation is often used in place of the source text to avoid the need for bilingual speakers, while most automatic evaluation techniques measure the similarity between the MT output and a human translation (commonly called the candidate translation and the reference translation), assuming that the closer they are, the higher the MT quality. Despite the fundamental role the reference translation plays in assessing MT quality, its characteristics have been largely ignored. An inherent property of professional translation is the adaptation of the source text to the reader's expectations. As a consequence, the human translation can differ considerably from the source text, which, as will be shown throughout this work, has a strong impact on MT evaluation results. The first goal of our research was to assess the effects of using human translation as the benchmark for MT evaluation. To achieve this goal, we began with a theoretical discussion of the relationship between original and translated texts. The presence of optional translation shifts was identified as one of the fundamental characteristics of human translation. The impact of these shifts on automatic and manual MT evaluation was analyzed, showing in both cases that evaluation is strongly biased by the reference provided. The second goal of our work was to improve the accuracy of automatic evaluation, measured in terms of correlation with human judgments.
Given the limitations of reference-based evaluation discussed in the first part of the work, instead of focusing on similarity, we concentrated on the impact of the differences between the MT output and the reference translation, seeking criteria that would make it possible to distinguish acceptable linguistic variation from deviations induced by MT errors. First, we explored the use of local syntactic context to validate matches between candidate and reference words. Second, to compensate for the lack of information about MT segments for which no relationship with the reference translation was found, we introduced features oriented toward MT fluency into reference-based evaluation. We implemented our approach as a family of automatic evaluation metrics that showed highly competitive performance in a series of well-known MT evaluation campaigns.
Mehay, Dennis Nolan. "Bean Soup Translation: Flexible, Linguistically-motivated Syntax for Machine Translation." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1345433807.
Moré, i. López Joaquim. "Machine Translationness: a Concept for Machine Translation Evaluation and Detection." Doctoral thesis, Universitat Oberta de Catalunya, 2015. http://hdl.handle.net/10803/305494.
Machine translationness is the linguistic phenomenon that makes machine translations sound machine-made. This thesis introduces the concept of machine translationness as an object of research and presents an evaluation method that consists of determining whether a translation is typical of a machine rather than determining its resemblance to a human translation, as current evaluation methods do. The method assesses the quality of a translation with a metric, the MTS (Machine Translationness Score). This metric is consistent with ordinary people's perception of machine translationness. The MTS correlates well with the quality ratings of human evaluators. In addition, our proposal allows low-cost evaluations because they do not require resources that are expensive to obtain (reference translations, training corpora, etc.). The machine translationness criterion has applications that go beyond the evaluation of machine translations (plagiarism detection, detection of unsupervised publications on the Internet, etc.).
Machine translationness (MTness) comprises the linguistic phenomena that make machine translations distinguishable from human translations. This thesis introduces MTness as a research object and presents an MT evaluation method based on determining whether a translation is machine-like, instead of determining how human-like it is as in current evaluation approaches. The method rates the MTness of a translation with a metric, the MTS (Machine Translationness Score). The MTS calculation is in accordance with the results of an experimental study on how ordinary people perceive machine translation. MTS proved to correlate well with human ratings of translation quality. Moreover, our approach allows inexpensive evaluations, since costly resources (e.g. reference translations, training corpora) are not needed. Machine translationness ratings can be applied to other uses beyond machine translation evaluation (detecting plagiarism and other forms of cheating, detecting unsupervised MT documents published on the Web, etc.).
Giménez, Linares Jesús Ángel. "Empirical machine translation and its evaluation." Doctoral thesis, Universitat Politècnica de Catalunya, 2008. http://hdl.handle.net/10803/6674.
On the one hand, we address the problem of automatic evaluation. We have analyzed the main deficiencies of current evaluation methods, which are due, in our view, to the shallow quality principles on which they are based. Instead of limiting ourselves to the lexical level, we propose a new direction toward more heterogeneous evaluations. Our approach is based on the design of a rich set of automatic measures intended to capture a wide range of quality aspects at different linguistic levels (lexical, syntactic and semantic). These linguistic measures have been evaluated in different scenarios. The most notable result has been the finding that metrics based on deeper linguistic knowledge (syntactic and semantic) produce more reliable system-level evaluations than metrics limited to the lexical dimension, especially when the systems evaluated belong to different translation paradigms. However, at the sentence level, the behavior of some of these linguistic metrics worsens slightly compared to that of lexical metrics. This is mainly attributable to errors made by the linguistic processors. In order to improve sentence-level evaluation, in addition to resorting to lexical similarity in the absence of linguistic analysis, we have studied the possibility of combining the scores given by metrics at different linguistic levels into a single quality measure. Two non-parametric metric combination strategies have been presented, their main advantage being that the relative contribution of each metric to the overall score need not be tuned. Moreover, our work shows how to use the set of heterogeneous metrics to obtain detailed error analysis reports automatically.
On the other hand, we have studied the problem of lexical selection in Statistical Machine Translation. To this end, we have built a Spanish-English phrase-based Statistical Machine Translation system and iterated over its development cycle, analyzing different ways of improving its quality by incorporating linguistic knowledge. First, we extended the system by combining translation models based on shallow parsing, obtaining a significant improvement. Second, we applied discriminative translation models based on Machine Learning techniques. These models allow a better representation of the translation context in which phrases occur, effectively leading to better lexical selection. However, heterogeneous automatic evaluations and manual evaluations show that improvements in lexical selection do not necessarily entail better syntactic or semantic structure. The incorporation of this kind of prediction into the statistical framework therefore requires further study.
As a complementary issue, we have studied one of the main criticisms of translation systems based on empirical methods, their strong domain dependence, and how its negative effects can be mitigated by suitably combining external knowledge sources. In this regard, we have successfully adapted an English-Spanish statistical translation system trained on the political domain to the domain of dictionary definitions.
The two parts of this thesis are closely related, since developing a real Machine Translation system has allowed us to experience first-hand the important role of evaluation methods in the development cycle of Machine Translation systems.
In this thesis we have exploited current Natural Language Processing technology for Empirical Machine Translation and its Evaluation.
On the one hand, we have studied the problem of automatic MT evaluation. We have analyzed the main deficiencies of current evaluation methods, which arise, in our opinion, from the shallow quality principles upon which they are based. Instead of relying on the lexical dimension alone, we suggest a novel path towards heterogeneous evaluations. Our approach is based on the design of a rich set of automatic metrics devoted to capturing a wide variety of translation quality aspects at different linguistic levels (lexical, syntactic and semantic). Linguistic metrics have been evaluated over different scenarios. The most notable finding is that metrics based on deeper linguistic information (syntactic/semantic) are able to produce more reliable system rankings than metrics which limit their scope to the lexical dimension, especially when the systems under evaluation are different in nature. However, at the sentence level, some of these metrics suffer a significant drop in performance, which is mainly attributable to parsing errors. In order to improve sentence-level evaluation, apart from backing off to lexical similarity in the absence of parsing, we have also studied the possibility of combining the scores conferred by metrics at different linguistic levels into a single measure of quality. Two valid non-parametric strategies for metric combination have been presented. These offer the important advantage of not having to adjust the relative contribution of each metric to the overall score. As a complementary issue, we show how to use the heterogeneous set of metrics to obtain automatic and detailed linguistic error analysis reports.
On the other hand, we have studied the problem of lexical selection in Statistical Machine Translation. For that purpose, we have constructed a Spanish-to-English baseline phrase-based Statistical Machine Translation system and iterated across its development cycle, analyzing how to improve its performance through the incorporation of linguistic knowledge. First, we have extended the system by combining shallow-syntactic translation models based on linguistic data views. A significant improvement is reported. This system is further enhanced using dedicated discriminative phrase translation models. These models allow for a better representation of the translation context in which phrases occur, effectively yielding an improved lexical choice. However, based on the proposed heterogeneous evaluation methods and the manual evaluations conducted, we have found that improvements in lexical selection do not necessarily imply an improved overall syntactic or semantic structure. The incorporation of dedicated predictions into the statistical framework therefore requires further study.
As a side question, we have studied one of the main criticisms against empirical MT systems, i.e., their strong domain dependence, and how its negative effects may be mitigated by properly combining external knowledge sources when porting a system into a new domain. We have successfully ported an English-to-Spanish phrase-based Statistical Machine Translation system trained on the political domain to the domain of dictionary definitions.
The two parts of this thesis are tightly connected, since the hands-on development of an actual MT system has allowed us to experience first-hand the role of the evaluation methodology in the development cycle of MT systems.
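The abstract above mentions combining metric scores from different linguistic levels without tuning each metric's relative contribution. A minimal sketch of one such non-parametric scheme is a uniform average of normalized scores. This is an illustrative assumption, not necessarily either of the two strategies presented in the thesis; the function names, metric names and scores below are all hypothetical.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale one metric's scores to [0, 1] across all candidate translations."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # The metric does not discriminate between candidates at all.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def uniform_combination(metric_scores):
    """Average the normalized scores, giving every metric the same weight.

    metric_scores maps a metric name to a list of scores, one per candidate.
    No per-metric weight is tuned, which is what makes the scheme non-parametric.
    """
    normalized = [min_max_normalize(values) for values in metric_scores.values()]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical scores for three candidate translations of one sentence.
scores = {
    "lexical": [0.30, 0.45, 0.40],
    "syntactic": [0.50, 0.70, 0.90],
    "semantic": [0.20, 0.60, 0.40],
}

combined = uniform_combination(scores)
best = max(range(len(combined)), key=combined.__getitem__)
print(f"combined scores: {combined}, best candidate index: {best}")
```

With these made-up numbers the second candidate wins: it is not the top-ranked candidate on the syntactic metric, but it scores consistently well across all three levels.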
Kauchak, David. "Contributions to research on machine translation." Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2006. http://wwwlib.umi.com/cr/ucsd/fullcit?p3237012.
Title from first page of PDF file (viewed December 8, 2006). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 87-92).
Shah, Kashif. "Model adaptation techniques in machine translation." PhD thesis, Université du Maine, 2012. http://tel.archives-ouvertes.fr/tel-00718226.
Nakazawa, Toshiaki. "Fully Syntactic Example-based Machine Translation." 京都大学 (Kyoto University), 2010. http://hdl.handle.net/2433/120373.
Yamashita, Naomi. "Supporting machine translation mediated collaborative work." 京都大学 (Kyoto University), 2006. http://hdl.handle.net/2433/135939.
Payvar, Bamdad. "Machine Translation, universal languages and Descartes." Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3643.
Song, Xingyi. "Training machine translation for human acceptability." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/14284/.
Ueffing, Nicola. "Word confidence measures for machine translation." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=97967669X.
Birch, Alexandra. "Reordering metrics for statistical machine translation." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5024.
Babych, Bogdan. "Information extraction technology in machine translation." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.416402.
Trujillo, Indalecio Arturo. "Lexicalist machine translation of spatial prepositions." Thesis, University of Cambridge, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.388507.
Bérard, Alexandre. "Neural machine translation architectures and applications." Thesis, Lille 1, 2018. http://www.theses.fr/2018LIL1I022/document.
This thesis is centered on two main objectives: adaptation of Neural Machine Translation techniques to new tasks and research replication. Our efforts towards research replication have led to the production of two resources: MultiVec, a framework that facilitates the use of several techniques related to word embeddings (Word2vec, Bivec and Paragraph Vector); and a framework for Neural Machine Translation that implements several architectures and can be used for regular MT, Automatic Post-Editing, and Speech Recognition or Translation. These two resources are publicly available and now extensively used by the research community. We extend our NMT framework to work on three related tasks: Machine Translation (MT), Automatic Speech Translation (AST) and Automatic Post-Editing (APE). For the machine translation task, we replicate pioneering neural-based work, and conduct a case study on TED talks where we advance the state of the art. Automatic speech translation consists of translating speech in one language into text in another language. In this thesis, we focus on the unexplored problem of end-to-end speech translation, which does not use an intermediate source-language text transcription. We propose the first model for end-to-end AST and apply it to two benchmarks: translation of audiobooks and of basic travel expressions. Our final task is automatic post-editing, which consists of automatically correcting the outputs of an MT system in a black-box scenario, by training on data that was produced by human post-editors. We replicate and extend published results on the WMT 2016 and 2017 tasks, and propose new neural architectures for low-resource automatic post-editing.
Logacheva, Varvara. "Human feedback in Statistical Machine Translation." Thesis, University of Sheffield, 2017. http://etheses.whiterose.ac.uk/18534/.
Tapkanova, Elmira. "Machine Translation and Text Simplification Evaluation." Scholarship @ Claremont, 2016. http://scholarship.claremont.edu/scripps_theses/790.