Tesi: "Bio-statistique"

1

Douib, Ameur. "Algorithmes bio-inspirés pour la traduction automatique statistique". Thesis, Université de Lorraine, 2019. http://www.theses.fr/2019LORR0005/document.

Testo completo

Abstract (sommario):

Différentes composantes des systèmes de traduction automatique statistique sont considérées comme des problèmes d'optimisations. En effet, l'apprentissage du modèle de traduction, le décodage et l'optimisation des poids de la fonction log-linéaire sont trois importants problèmes d'optimisation. Savoir définir les bons algorithmes pour les résoudre est l'une des tâches les plus importantes afin de mettre en place un système de traduction performant. Plusieurs algorithmes d'optimisation sont proposés pour traiter les problèmes d'optimisation du décodeur. Ils sont combinés pour résoudre, d'une part, le problème de décodage qui produit une traduction dans la langue cible d'une phrase source, d'autre part, le problème d'optimisation des poids des scores combinés dans la fonction log-linéaire pour d'évaluation des hypothèses de traduction au cours du décodage. Le système de traduction statistique de référence est basé sur un algorithme de recherche en faisceau pour le décodage, et un algorithme de recherche linéaire pour l'optimisation des poids associés aux scores. Nous proposons un nouveau système de traduction avec un décodeur entièrement basé sur les algorithmes génétiques. Les algorithmes génétiques sont des algorithmes d'optimisation bio-inspirés qui simulent le processus de l'évolution naturelle des espèces. Ils permettent de manipuler un ensemble de solutions à travers plusieurs itérations pour converger vers des solutions optimales. Ce travail, nous permet d'étudier l'efficacité des algorithmes génétiques pour la traduction automatique statistique. L'originalité de notre proposition est de proposer deux algorithmes : un algorithme génétique, appelé GAMaT, comme décodeur pour un système de traduction statistique à base de segments, et un algorithme génétique, appelé GAWO, pour l'optimisation des poids de la fonction log-linéaire afin de l'utiliser comme fonction fitness pour GAMaT. Nous proposons également, une approche neuronale pour définir une nouvelle fonction fitness pour GAMaT. Cette approche consiste à utiliser un réseau de neurones pour l'apprentissage d'une fonction qui combine plusieurs scores, évaluant différents aspects d'une hypothèse de traduction, combinés auparavant dans la fonction log-linéaire, et qui prédit le score BLEU de cette hypothèse de traduction. Ce travail, nous a permis de proposer un nouveau système de traduction automatique statistique ayant un décodeur entièrement basé sur des algorithmes génétiques
Different components of statistical machine translation systems are considered as optimization problems. Indeed, the learning of the translation model, the decoding and the optimization of the weights of the log-linear function are three important optimization problems. Knowing how to define the right algorithms to solve them is one of the most important tasks in order to build an efficient translation system. Several optimization algorithms are proposed to deal with decoder optimization problems. They are combined to solve, on the one hand, the decoding problem that produces a translation in the target language for each source sentence, on the other hand, to solve the problem of optimizing the weights of the combined scores in the log-linear function to fix the translation evaluation function during the decoding. The reference system in statistical translation is based on a beam-search algorithm for the decoding, and a line search algorithm for optimizing the weights associated to the scores. We propose a new statistical translation system with a decoder entirely based on genetic algorithms. Genetic algorithms are bio-inspired optimization algorithms that simulate the natural process of evolution of species. They allow to handle a set of solutions through several iterations to converge towards optimal solutions. This work allows us to study the efficiency of the genetic algorithms for machine translation. The originality of our work is the proposition of two algorithms: a genetic algorithm, called GAMaT, as a decoder for a phrase-based machine translation system, and a second genetic algorithm, called GAWO, for optimizing the weights of the log-linear function in order to use it as a fitness function for GAMaT. We propose also, a neuronal approach to define a new fitness function for GAMaT. This approach consists in using a neural network to learn a function that combines several scores, which evaluate different aspects of a translation hypothesis, previously combined in the log-linear function, and that predicts the BLEU score of this translation hypothesis. This work allowed us to propose a new machine translation system with a decoder entirely based on genetic algorithms