Dissertations on the topic "Réseau neuronal récurrent profond" (deep recurrent neural network)
Browse the top 24 dissertations for research on the topic "Réseau neuronal récurrent profond".
Cîrstea, Bogdan-Ionut. "Contribution à la reconnaissance de l'écriture manuscrite en utilisant des réseaux de neurones profonds et le calcul quantique." Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0059.
In this thesis, we provide several contributions from the fields of deep learning and quantum computation to handwriting recognition. We begin by integrating some of the more recent deep learning techniques (such as dropout, batch normalization and different activation functions) into convolutional neural networks and show improved performance on the well-known MNIST dataset. We then propose Tied Spatial Transformer Networks (TSTNs), a variant of Spatial Transformer Networks (STNs) with shared weights, as well as different training variants of the TSTN. We show improved performance on a distorted variant of the MNIST dataset. In another work, we compare the performance of Associative Long Short-Term Memory (ALSTM), a recently introduced recurrent neural network (RNN) architecture, against Long Short-Term Memory (LSTM) on the IFN-ENIT Arabic handwriting recognition dataset. Finally, we propose a neural network architecture, which we call a hybrid classical-quantum neural network, that can integrate and take advantage of quantum computing. While our simulations are performed using classical computation (on a GPU), our results on the Fashion-MNIST dataset suggest that exponential improvements in computational requirements might be achievable, especially for recurrent neural networks trained for sequence classification.
Dahmani, Sara. "Synthèse audiovisuelle de la parole expressive : modélisation des émotions par apprentissage profond." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0137.
The work of this thesis concerns the modeling of emotions for expressive audiovisual text-to-speech synthesis. Today, text-to-speech synthesis systems produce results of good quality; however, audiovisual synthesis remains an open issue, and expressive synthesis has been studied even less. As part of this thesis, we present an emotion modeling method which is malleable and flexible, and allows us to mix emotions as we mix shades on a palette of colors. In the first part, we present and study two expressive corpora that we have built. The recording strategy and the expressive content of these corpora are analyzed to validate their use for the purpose of audiovisual speech synthesis. In the second part, we present two neural architectures for speech synthesis. We used these two architectures to model three aspects of speech: 1) the duration of sounds, 2) the acoustic modality and 3) the visual modality. First, we use a fully connected architecture. This architecture allowed us to study the behavior of neural networks when dealing with different contextual and linguistic descriptors. We were also able to analyze, with objective measures, the network's ability to model emotions. The second neural architecture proposed is a variational auto-encoder. This architecture is able to learn a latent representation of emotions without using emotion labels. After analyzing the latent space of emotions, we present a procedure for structuring it in order to move from a discrete representation of emotions to a continuous one. Through perceptual experiments, we were able to validate the ability of our system to generate emotions, nuances of emotions and mixtures of emotions for expressive audiovisual text-to-speech synthesis.
Biasutto-Lervat, Théo. "Modélisation de la coarticulation multimodale : vers l'animation d'une tête parlante intelligible." Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0019.
This thesis deals with neural-network-based coarticulation modeling and aims to synchronize the facial animation of a 3D talking head with speech. Predicting articulatory movements is not a trivial task, as it is well known that the production of a phoneme is greatly affected by its phonetic context, a phenomenon called coarticulation. We propose in this work a coarticulation model, i.e. a model able to predict spatial trajectories of articulators from speech. We rely on sequential models, namely recurrent neural networks (RNNs) and more specifically Gated Recurrent Units (GRUs), which are able to treat articulation dynamics as a central component of the modeling. Unfortunately, the typical amount of data in articulatory and audiovisual databases seems to be quite low for a deep learning approach. To overcome this difficulty, we propose to integrate articulatory knowledge into the network during its initialization. The robustness of RNNs allows us to apply our coarticulation model to predict both face and tongue movements, in French and German for the face, and in English and German for the tongue. Evaluation has been conducted through objective measures of the trajectories and through experiments verifying that critical articulatory targets are fully reached. We also conducted a subjective evaluation to assess the perceptual quality of the predicted articulation once applied to our facial animation system. Finally, we analyzed the model after training to explore the phonetic knowledge it learned.
Haykal, Vanessa. "Modélisation des séries temporelles par apprentissage profond." Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4019.
Time series prediction is a problem that has been addressed for many years. In this thesis, we are interested in methods resulting from deep learning. It is well known that when the relationships in the data are temporal, accurate analysis and prediction are difficult because of non-linear trends and the presence of noise, particularly in financial and electrical series. In this context, we propose a new hybrid noise-reduction architecture that models the recursive error series to improve predictions. The learning process simultaneously fuses a convolutional neural network (CNN) and a recurrent long short-term memory network (LSTM). This model is distinguished by its ability to capture a variety of hybrid properties globally: it is able to extract local signal features, to learn long-term and non-linear dependencies, and to be highly resistant to noise. The second contribution concerns the limitations of global approaches caused by dynamic regime switching in the signal. We present a local unsupervised modification of our previous architecture that adjusts the results by adapting a Hidden Markov Model (HMM). Finally, we were also interested in multi-resolution techniques to improve the performance of the convolutional layers, notably by using the variational mode decomposition (VMD) method.
Etienne, Caroline. "Apprentissage profond appliqué à la reconnaissance des émotions dans la voix." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS517.
This thesis deals with the application of artificial intelligence to the automatic classification of audio sequences according to the emotional state of the customer during a commercial phone call. The goal is to improve on existing data preprocessing and machine learning models, and to propose a model that is as efficient as possible on the reference IEMOCAP audio dataset. We build on previous work on deep neural networks for automatic speech recognition and extend it to the speech emotion recognition task. We are therefore interested in end-to-end neural architectures that perform the classification task, including autonomous extraction of acoustic features from the audio signal. Traditionally, the audio signal is preprocessed using paralinguistic features, as part of an expert approach. We choose a naive approach to data preprocessing that does not rely on specialized paralinguistic knowledge, and compare it with the expert approach. In this approach, the raw audio signal is transformed into a time-frequency spectrogram using a short-time Fourier transform. In order to apply a neural network to a prediction task, a number of aspects need to be considered. On the one hand, the best possible hyperparameters must be identified. On the other hand, biases present in the database should be minimized (non-discrimination), for example by adding data and taking into account the characteristics of the chosen dataset. We study these aspects in order to develop an end-to-end neural architecture that combines convolutional layers specialized in modeling visual information with recurrent layers specialized in modeling temporal information. We propose a deep supervised learning model, competitive with the current state of the art when trained on the IEMOCAP dataset, which justifies its use for the rest of the experiments.
This classification model consists of a four-layer convolutional neural network and a bidirectional long short-term memory recurrent neural network (BLSTM). Our model is evaluated on two English audio databases proposed by the scientific community: IEMOCAP and MSP-IMPROV. A first contribution is to show that, with a deep neural network, we obtain high performance on IEMOCAP, and that the results are promising on MSP-IMPROV. Another contribution of this thesis is a comparative study of the output values of the layers of the convolutional module and the recurrent module according to the data preprocessing method used: spectrograms (naive approach) or paralinguistic indices (expert approach). We analyze the data according to their emotion class using the Euclidean distance, a deterministic proximity measure. We try to understand the characteristics of the emotional information extracted autonomously by the network. The idea is to contribute to research focused on understanding deep neural networks used in speech emotion recognition and to bring more transparency and explainability to these systems, whose decision-making mechanism is still poorly understood.
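The naive preprocessing step described in this abstract (raw audio turned into a time-frequency spectrogram by a short-time Fourier transform) can be illustrated with a small pure-Python sketch. It is not the thesis's pipeline: the periodic Hann window, the 64-sample frame, the 32-sample hop and the toy sinusoid are assumptions chosen for readability, and a real system would run an FFT library over speech audio.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of |X[k]| (k = 0..frame_len // 2) per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of bin k; an FFT would be used in practice.
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(seg))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy input: 512 samples of a sinusoid completing 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(512)]
spec = stft_magnitude(signal)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

Each of the 15 frames peaks in bin 8, because the tone completes exactly 8 cycles per frame; stacking such frames over time gives the time-frequency image that the convolutional layers consume.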
Szilas, Nicolas. "Apprentissage dans les réseaux récurrents pour la modélisation mécanique et étude de leurs interactions avec l'environnement." PhD thesis, Grenoble INPG, 1995. http://tel.archives-ouvertes.fr/tel-00345820.
Javid, Gelareh. "Contribution à l’estimation de charge et à la gestion optimisée d’une batterie Lithium-ion : application au véhicule électrique." Thesis, Mulhouse, 2021. https://www.learning-center.uha.fr/.
State of charge (SOC) estimation is a significant issue for the safe operation and lifespan of lithium-ion (Li-ion) batteries, which are used to power electric vehicles (EVs). In this thesis, the accuracy of SOC estimation is investigated using deep recurrent neural network (DRNN) algorithms. To do this, for a single-cell Li-ion battery, three new SOC estimators based on different DRNN algorithms are proposed: a bidirectional LSTM (BiLSTM) method, a robust long short-term memory (RoLSTM) algorithm, and a gated recurrent unit (GRU) technique. With these, one does not depend on precise battery models and can avoid complicated mathematical methods, especially in a battery pack. In addition, these models are able to estimate the SOC precisely at varying temperatures. Also, unlike the traditional recurrent neural network, whose content is rewritten at each time step, these networks can decide whether to preserve the current memory through their gates. In this way, they can easily carry information over long paths and maintain long-term dependencies. A comparison of the results indicates that the BiLSTM network performs better than the other two. Moreover, the BiLSTM model can process longer sequences in two directions, past and future, without the vanishing gradient problem. This feature makes it possible to choose a sequence length as long as the discharge period of one drive cycle, and thus to obtain a more accurate estimate. The model also behaves well when the initial SOC value is incorrect. Finally, a new BiLSTM method is introduced to estimate the SOC of a battery pack in an EV. The IPG CarMaker software was used to collect data and test the model in simulation. The results showed that the proposed algorithm can provide a good SOC estimate without using any filter in the battery management system (BMS).
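The gating idea in this abstract (at each step the network decides how much of its memory to keep) can be sketched with a single gated recurrent unit (GRU) cell in pure Python. This is a minimal, untrained illustration under stated assumptions, not one of the thesis's estimators: it uses the Cho et al. GRU formulation with random weights and hypothetical normalised [current, voltage, temperature] inputs, and it omits both the dense layer that would map the final state to an SOC value and the training loop.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def init_gru(n_in, n_h, seed=0, z_bias=0.0):
    """Random (untrained) GRU parameters; z_bias sets the update-gate bias."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    p = {}
    for g in ("z", "r", "n"):
        p["W_" + g] = mat(n_h, n_in)  # input-to-hidden weights
        p["U_" + g] = mat(n_h, n_h)   # hidden-to-hidden weights
        p["b_" + g] = [0.0] * n_h
    p["b_z"] = [z_bias] * n_h
    return p


def gru_step(x, h, p):
    """One GRU step in the Cho et al. (2014) form: h' = (1 - z) * n + z * h."""
    def layer(g, act, hv):
        return [act(a + b + c) for a, b, c in
                zip(matvec(p["W_" + g], x), matvec(p["U_" + g], hv), p["b_" + g])]

    z = layer("z", sigmoid, h)  # update gate: share of the old state to keep
    r = layer("r", sigmoid, h)  # reset gate: share of the old state used for the candidate
    n = layer("n", math.tanh, [ri * hi for ri, hi in zip(r, h)])  # candidate state
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def run_gru(seq, p, n_h):
    """Run the cell over a sequence and return the final hidden state."""
    h = [0.0] * n_h
    for x in seq:
        h = gru_step(x, h, p)
    return h


# Hypothetical normalised inputs per time step: [current, voltage, temperature].
drive_cycle = [[-0.5, 0.8, 0.2], [-0.7, 0.78, 0.21], [0.1, 0.79, 0.22]]
params = init_gru(n_in=3, n_h=4, seed=0)
h_final = run_gru(drive_cycle, params, n_h=4)
```

When the update gate saturates near 1 (for example with a large positive z_bias), the cell copies its previous state almost unchanged; this is the mechanism that lets gated networks carry information across long sequences instead of overwriting it at every step.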
Mehr, Éloi. "Unsupervised Learning of 3D Shape Spaces for 3D Modeling." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS566.
Even though 3D data is becoming increasingly popular, especially with the democratization of virtual and augmented reality experiences, it remains very difficult to manipulate a 3D shape, even for designers or experts. Given a database containing 3D instances of one or several categories of objects, we want to learn the manifold of plausible shapes in order to develop new intelligent 3D modeling and editing tools. However, this manifold is often much more complex than in the 2D domain. Indeed, 3D surfaces can be represented using various embeddings, and may also exhibit different alignments and topologies. In this thesis, we study the manifold of plausible shapes in light of the aforementioned challenges, from three different points of view. First, we consider the manifold as a quotient space, in order to learn the shapes' intrinsic geometry from a dataset where the 3D models are not co-aligned. Then, we assume that the manifold is disconnected, which leads to a new deep learning model that is able to automatically cluster and learn the shapes according to their typology. Finally, we study the conversion of an unstructured 3D input into an exact geometry, represented as a structured tree of continuous solid primitives.
Baylon, Fuentes Antonio. "Ring topology of an optical phase delayed nonlinear dynamics for neuromorphic photonic computing." Thesis, Besançon, 2016. http://www.theses.fr/2016BESA2047/document.
Nowadays, most computers are still based on concepts developed more than 60 years ago by Alan Turing and John von Neumann. However, these digital computers have already begun to reach certain physical limits of their implementation in silicon microelectronics technology (dissipation, speed, integration limits, energy consumption). Alternative approaches that are more powerful, more efficient and less energy-consuming have been a major scientific issue for several years. Many of these approaches naturally seek inspiration from the human brain, whose operating principles are still far from being understood. In this line of research, a surprising variant of the recurrent neural network (RNN), simpler and sometimes even more efficient for certain features or processing cases, appeared in the early 2000s; now known as Reservoir Computing (RC), it is currently emerging as a new brain-inspired computational paradigm. Its structure is quite similar to classical RNN computing concepts, generally exhibiting three parts: an input layer that injects the information into a nonlinear dynamical system (Write-In), a second layer where the input information is projected into a high-dimensional space called the dynamical reservoir, and an output layer from which the processed information is extracted through a so-called Read-Out function. In the RC approach, learning is performed in the output layer only, while the input and reservoir layers are randomly fixed; this is the main originality of RC compared with RNN methods. This feature provides greater efficiency, speed and learning convergence, as well as a practical route to experimental implementation. This PhD thesis is dedicated to one of the first photonic RC implementations using telecommunication devices. Our experimental implementation is based on a nonlinear delayed dynamical system, which relies on an electro-optic (EO) oscillator with differential phase modulation.
This EO oscillator was extensively studied in the context of optical chaos cryptography. The dynamics exhibited by such systems are indeed known to develop complex behaviors in an infinite-dimensional phase space, and analogies with space-time dynamics (of which neural network dynamics are one kind) are also found in the literature. Such peculiarities of delay systems supported the idea of replacing the traditional RNN (usually difficult to design technologically) with a nonlinear EO delay architecture. In order to evaluate the computational power of our RC approach, we implement two spoken digit recognition tests (classification tests) taken from the standard artificial intelligence databases TI-46 and AURORA-2, obtaining results very close to state-of-the-art performance and establishing the state of the art in classification speed. Our photonic RC approach allowed us to process around 1 million words per second, improving the information processing speed by a factor of about 3.
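The division of labour described in this abstract (fixed random input and reservoir layers, trained read-out only) can be sketched in software as an echo state network. This is an assumption-laden illustration, not the thesis's photonic system: the 20-node tanh reservoir, the row-sum scaling used to keep the dynamics contracting, the ridge penalty and the one-step-ahead sine prediction task are all chosen for the example.

```python
import math
import random


def make_reservoir(n, seed=0):
    """Random input and reservoir weights; they are generated once and never trained."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Rescale so the largest absolute row sum is 0.9. Because tanh is 1-Lipschitz,
    # the state update is then a contraction, a sufficient condition for the
    # echo state (fading memory) property.
    norm = max(sum(abs(v) for v in row) for row in w)
    w = [[0.9 * v / norm for v in row] for row in w]
    return w_in, w


def run_reservoir(inputs, w_in, w):
    """Drive the reservoir and collect the features [1, u(t), x(t)] at every step."""
    n = len(w_in)
    x = [0.0] * n
    feats = []
    for u in inputs:
        x = [math.tanh(w_in[i] * u + sum(w[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        feats.append([1.0, u] + x)
    return feats


def solve(a, b):
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = aug[r][m] - sum(aug[r][c] * sol[c] for c in range(r + 1, m))
        sol[r] = s / aug[r][r]
    return sol


def train_readout(feats, targets, ridge=1e-4):
    """The only learning step: ridge regression (X^T X + ridge*I) w = X^T y."""
    d = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(d)]
    return solve(xtx, xty)


# Toy task: one-step-ahead prediction of a sampled sine wave.
signal = [math.sin(0.2 * t) for t in range(301)]
inputs, targets = signal[:-1], signal[1:]
w_in, w = make_reservoir(20, seed=0)
feats = run_reservoir(inputs, w_in, w)
washout = 50  # discard the initial transient before fitting
readout = train_readout(feats[washout:], targets[washout:])
preds = [sum(a * b for a, b in zip(f, readout)) for f in feats[washout:]]
mse = sum((p - y) ** 2 for p, y in zip(preds, targets[washout:])) / len(preds)
```

Only train_readout fits any parameters; the reservoir is left exactly as generated. In a delay-based photonic reservoir the tanh network would be replaced by the physical nonlinear dynamics, while in RC generally the read-out remains a trained linear map.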
Mlynarski, Pawel. "Apprentissage profond pour la segmentation des tumeurs cérébrales et des organes à risque en radiothérapie." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4084.
Medical images play an important role in cancer diagnosis and treatment. Oncologists analyze images to determine the different characteristics of the cancer, to plan the therapy and to observe the evolution of the disease. The objective of this thesis is to propose efficient methods for the automatic segmentation of brain tumors and organs at risk in the context of radiotherapy planning, using Magnetic Resonance (MR) images. First, we focus on the segmentation of brain tumors using Convolutional Neural Networks (CNNs) trained on MRIs manually segmented by experts. We propose a segmentation model that has a large 3D receptive field while being efficient in terms of computational complexity, based on a combination of 2D and 3D CNNs. We also address problems related to the joint use of several MRI sequences (T1, T2, FLAIR). Second, we introduce a segmentation model that is trained using weakly-annotated images in addition to fully-annotated images (with voxelwise labels), which are usually available in very limited quantities due to their cost. We show that this mixed level of supervision considerably improves the segmentation accuracy when the number of fully-annotated images is limited. Finally, we propose a methodology for an anatomy-consistent segmentation of organs at risk in the context of radiotherapy of brain tumors. The segmentations produced by our system on a set of MRIs acquired at the Centre Antoine Lacassagne (Nice, France) are evaluated by an experienced radiotherapist.
Mealier, Anne-Laure. "Comment le langage impose-t-il la structure du sens : construal et narration." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSE1333.
This thesis takes place in the context of the European project WYSIWYD (What You Say is What You Did). The goal of this project is to provide transparency in human-robot interactions, including by means of language. The deployment of companion and service robots requires that humans and robots can understand each other and communicate. Humans have developed an advanced coding of their behavior that provides the basis of the transparency of most of their actions and their communication. Until now, robots have not shared this code of behavior and are not able to explain their own actions to humans. We know that in spoken language there is a direct mapping between language and meaning, allowing a listener to focus attention on a specific aspect of an event. This is particularly true in language production. Moreover, visual perception allows the extraction of the aspects of "who did what to whom" in the understanding of social events. However, in the context of human interaction, other important aspects cannot be determined from the visual image alone. The exchange of an object can be interpreted from the perspective of the giver or of the taker. This introduces the notion of construal, that is, how a person interprets the world and perceives a particular situation. Events are related in time, but there are causal and intentional connections that cannot be seen from a visual standpoint alone. An agent performs an action because he knows that this action satisfies the need of another person. This may not be directly visible in the visual scene. Language allows this characteristic to be specified: "He gave you the book because you like it." The first point that we demonstrate in this work is how language can be used to represent these construals. To this end, we have developed a system in which a mental model represents an action event.
This model is determined by the correspondence between two abstract vectors: the force vector exerted by the action and the result vector corresponding to the effect of the applied force. The application of an attentional process selects one of the two vectors, thus generating the construal of the event. The second point that we consider in this work is how the construction of narrative discourse can be learned with a narrative discourse model. This model builds on existing neural network models of sentence production and comprehension, which we enrich with additional structures to represent a discourse context. We also present how this model can be integrated into an overall cognitive system for understanding and generating new constructions of narrative discourse based on a similar structure but different arguments. For each of the works mentioned above, we show how these theoretical models are integrated into the development platform of the iCub humanoid robot. This thesis explores two main mechanisms for enriching the meaning of events through language. The work is situated between computational neuroscience, with the development of neural network models of the comprehension and production of narrative discourse, and cognitive linguistics, where understanding and explaining meaning in terms of joint attention is crucial.
Matteo, Lionel. "De l’image optique "multi-stéréo" à la topographie très haute résolution et la cartographie automatique des failles par apprentissage profond." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4099.
Seismogenic faults are the source of earthquakes. The study of their properties thus provides information on some of the properties of the large earthquakes they might produce. Faults are 3D features, forming complex networks that generally include one master fault and myriads of secondary faults and fractures that intensely dissect the rocks embedding the master fault. In my thesis, I aim to develop approaches that help study this intense secondary faulting and fracturing. To identify, map and measure the faults and fractures within dense fault networks, I have addressed two challenges:
1) Faults generally form steep topographic escarpments at the ground surface that enclose narrow, deep corridors or canyons, where topography, and hence fault traces, are difficult to measure using the available standard methods (such as stereo and tri-stereo optical satellite images). To address this challenge, I used multi-stereo acquisitions with different configurations, such as different roll and pitch angles, different acquisition dates and different acquisition modes (mono and tri-stereo). Our dataset, amounting to 37 Pléiades images from three different tectonic sites in the Western USA (Valley of Fire, Nevada; Granite Dells, Arizona; Bishop Tuff, California), allows us to test different acquisition configurations to calculate the topography with three different approaches. Using the free open-source software Micmac (IGN; Rupnik et al., 2017), I calculated the topography in the form of Digital Surface Models (DSMs): (i) with the combination of 2 to 17 Pléiades images, (ii) by stacking and merging DSMs built from individual stereo or tri-stereo acquisitions, avoiding the use of multi-date combinations, and (iii) by stacking and merging point clouds built from tri-stereo acquisitions following the multiview pipeline developed by Rupnik et al., 2018.
We used the recent multiview stereo pipeline CARS (CNES/CMLA), developed by Michel et al., 2020, as a last approach (iv), combining tri-stereo acquisitions. From the four different approaches, I calculated more than 200 DSMs, and my results suggest that combining two tri-stereo acquisitions, or one stereo and one tri-stereo acquisition, with opposite roll angles leads to the most accurate DSM (with the most complete and precise topographic surface).
2) Commonly, faults are mapped manually in the field or from optical images and topographic data through the recognition of the specific curvilinear traces they form at the ground surface. However, manual mapping is time-consuming, which limits our capacity to produce complete representations and measurements of fault networks. To overcome this problem, we adopted a machine learning approach, namely a U-Net Convolutional Neural Network (CNN), to automate the identification and mapping of fractures and faults in optical images and topographic data. We intentionally trained the CNN with a moderate amount of manually created fracture and fault maps of low resolution and basic quality, extracted from one type of optical image (standard camera photographs of the ground surface). Based on the results of a number of performance tests, we selected the best-performing model, MRef, and demonstrate its capacity to predict fractures and faults accurately in image data of various types and resolutions (ground photographs, drone and satellite images, and topographic data). The MRef predictions thus enable the statistical analysis of fault networks. MRef exhibits good generalization capacities, making it a viable tool for the fast and accurate extraction of fracture and fault networks from image and topographic data.
Kang, Chen. "Image Aesthetic Quality Assessment Based on Deep Neural Networks." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG004.
With the development of capture devices and the Internet, people have access to an increasing number of images. Assessing visual aesthetics has important applications in several domains, from image retrieval and recommendation to enhancement. Image aesthetic quality assessment aims at determining how beautiful an image looks to human observers. Many problems in this field have not been well studied, including the subjectivity of aesthetic quality assessment, the explanation of aesthetics, and the collection of human-annotated data. Conventional image aesthetic quality prediction aims at predicting the average score or aesthetic class of a picture. However, aesthetic prediction is intrinsically subjective, and images with similar mean aesthetic scores or classes might display very different levels of consensus among human raters. Recent work has dealt with aesthetic subjectivity by predicting the distribution of human scores, but a predicted distribution is not directly interpretable in terms of subjectivity, and might be sub-optimal compared with directly estimating subjectivity descriptors computed from ground-truth scores. Furthermore, labels in existing datasets are often noisy or incomplete, or they do not allow more sophisticated tasks such as understanding why an image looks beautiful or not to a human observer. In this thesis, we first propose several measures of subjectivity, ranging from simple statistical measures such as the standard deviation of the scores, to newly proposed descriptors inspired by information theory. We evaluate the prediction performance of these measures when they are computed from predicted score distributions and when they are directly learned from ground-truth data. We find that the latter strategy generally provides better results. We also use subjectivity to improve the prediction of aesthetic scores, showing that subjectivity measures inspired by information theory perform better than statistical measures.
Then, we propose an Explainable Visual Aesthetics (EVA) dataset, which contains 4070 images with at least 30 votes per image. EVA has been crowd-sourced using a more disciplined approach inspired by quality assessment best practices. It also offers additional features, such as the degree of difficulty in assessing the aesthetic score, ratings for 4 complementary aesthetic attributes, and the relative importance of each attribute in forming aesthetic opinions. The publicly available dataset is expected to contribute to future research on understanding and predicting visual aesthetic quality. Additionally, we studied the explainability of image aesthetic quality assessment. A statistical analysis of EVA demonstrates that the collected attributes and relative importance can be linearly combined to effectively explain the overall aesthetic mean opinion scores. We found that subjectivity has a limited correlation with average personal difficulty in aesthetic assessment, and that the subject's region, photographic level and age significantly affect the user's aesthetic assessment.
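The simplest subjectivity descriptors mentioned in this abstract can be computed directly from an image's list of votes. The sketch below rests on assumptions: the standard deviation is named in the abstract, but the Shannon entropy of the vote histogram is only a generic information-theoretic example and is not necessarily one of the descriptors proposed in the thesis; the integer 1 to 10 scale and the 30 votes per image (EVA's stated minimum) are illustrative.

```python
import math
from collections import Counter


def subjectivity(scores):
    """Mean score plus two simple consensus descriptors for one image's votes."""
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    counts = Counter(scores)
    # Shannon entropy (in bits) of the empirical distribution of votes.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "std": std, "entropy": entropy}


# Two hypothetical images, 30 votes each, with the same mean score.
consensus = subjectivity([6] * 30)            # every rater agrees
split = subjectivity([3] * 15 + [9] * 15)     # raters split into two camps
```

Both hypothetical images have a mean score of 6, yet their standard deviation (0 versus 3) and entropy (0 versus 1 bit) differ sharply, which is the kind of consensus information a mean score alone hides.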
Grégoire, Francis. "Extraction de phrases parallèles à partir d’un corpus comparable avec des réseaux de neurones récurrents bidirectionnels." Thesis, 2017. http://hdl.handle.net/1866/20191.
Dutil, Francis. "Prédiction et génération de données structurées à l'aide de réseaux de neurones et de décisions discrètes." Thesis, 2018. http://hdl.handle.net/1866/22124.
Chung, Junyoung. "On Deep Multiscale Recurrent Neural Networks." Thesis, 2018. http://hdl.handle.net/1866/21588.
Mesnil, Grégoire. "Apprentissage d'espaces sémantiques." Thesis, 2015. http://hdl.handle.net/1866/12338.
Laurent, César. "Advances in parameterisation, optimisation and pruning of neural networks." Thesis, 2020. http://hdl.handle.net/1866/25592.
Neural networks are a family of Machine Learning models able to learn complex tasks directly from data. Although they already produce impressive results in many areas such as speech recognition, computer vision and machine translation, there are still many challenges in both the training and the deployment of neural networks. In particular, training neural networks typically requires huge amounts of computational resources, and trained models are often too big or too computationally expensive to be deployed on resource-limited devices, such as smartphones or low-power chips. The articles presented in this thesis investigate solutions to these different issues. The first two articles focus on improving the training of Recurrent Neural Networks (RNNs), networks specially designed to process sequential data. RNNs are notoriously hard to train, so we propose to improve their parameterisation by upgrading them with Batch Normalisation (BN), a very effective parameterisation which was hitherto used only in feed-forward networks. In the first article, we apply BN to the input-to-hidden connections of the RNNs, thereby reducing internal covariate shift between layers. In the second article, we show how to apply it to both the input-to-hidden and hidden-to-hidden connections of the Long Short-Term Memory (LSTM), a popular RNN architecture, thus also reducing internal covariate shift between time steps. Our experiments show that these proposed parameterisations allow for faster and better training of RNNs on several benchmarks. In the third article, we propose a new optimiser to accelerate the training of neural networks. Traditional diagonal optimisers, such as RMSProp, operate in parameter coordinates, which is not optimal when several parameters are updated at the same time. Instead, we propose to apply such optimisers in a basis in which the diagonal approximation is likely to be more effective.
We leverage the same approximation used in Kronecker-factored Approximate Curvature (K-FAC) to efficiently build this Kronecker-factored Eigenbasis (KFE). Our experiments show improvements over K-FAC in training speed for several deep network architectures. The last article focuses on network pruning, i.e., removing parameters from a network in order to reduce its memory footprint and computational cost. Typical pruning methods rely on first- or second-order Taylor approximations of the loss landscape to identify which parameters can be discarded. We study the impact of the assumptions behind such approximations. Moreover, we systematically compare methods based on first- and second-order approximations with Magnitude Pruning (MP), showing how they perform both before and after a fine-tuning phase. Our experiments show that better preserving the original network function does not necessarily translate into better-performing networks after fine-tuning, suggesting that considering only the impact of pruning on the loss might not be a sufficient objective for designing good pruning criteria.
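A minimal sketch of the Magnitude Pruning (MP) baseline mentioned above, assuming unstructured pruning of a single weight tensor to a target sparsity. The thesis may prune per layer, globally across layers, or iteratively, and `magnitude_prune` is a hypothetical helper rather than code from the thesis. It simply removes the weights with the smallest absolute values, with no Taylor approximation of the loss involved:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Zero out the `sparsity` fraction of entries with the smallest |w|.

    Returns (pruned_weights, mask), where mask is True for kept weights.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = np.abs(weights).ravel()
    k = int(round(sparsity * magnitudes.size))  # number of weights to remove
    mask = np.ones(magnitudes.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest magnitudes (stable sort breaks ties by position).
        drop = np.argsort(magnitudes, kind="stable")[:k]
        mask[drop] = False
    mask = mask.reshape(weights.shape)
    # np.where avoids producing -0.0 for pruned negative weights.
    return np.where(mask, weights, 0.0), mask
```

For example, pruning `[[0.5, -0.1], [2.0, -0.05]]` at 50% sparsity keeps `0.5` and `2.0` and zeroes the two entries closest to zero. A fine-tuning phase would then continue training with the mask held fixed, which this sketch does not cover.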
Gulcehre, Caglar. "Learning and time : on using memory and curricula for language understanding." Thesis, 2018. http://hdl.handle.net/1866/21739.
Zhang, Saizheng. "Recurrent neural models and related problems in natural language processing." Thesis, 2019. http://hdl.handle.net/1866/22663.
Bhardwaj, Shivendra. "Open source quality control tool for translation memory using artificial intelligence." Thesis, 2020. http://hdl.handle.net/1866/24307.
Translation Memory (TM) plays a decisive role during translation and is the go-to database for most language professionals. However, TMs are highly prone to noise, and this noise does not stem from any one specific source. There have been many significant efforts to clean TMs, especially in order to train better Machine Translation systems. In this thesis, we also clean the TM, but with the broader goal of maintaining its overall quality and making it robust for internal use in institutions. We propose a two-step process: first, clean an almost-clean TM (i.e., noise removal), and then detect texts produced by neural machine translation systems. For the noise removal task, we propose an architecture involving five approaches based on heuristics, feature engineering, and deep learning, and evaluate it through both manual annotation and Machine Translation (MT). We report a notable gain of +1.08 BLEU over a state-of-the-art, off-the-shelf TM cleaning system. We also propose a web-based tool, “OSTI: An Open-Source Translation-memory Instrument”, that automatically annotates incorrect translations (including misaligned ones) so that institutions can maintain an error-free TM. Deep neural models have tremendously improved MT systems, which now translate an immense amount of text every day. This automatically translated text finds its way into TMs, and storing such translation units in a TM is not ideal. We propose a detection module under two settings: a monolingual task, in which the classifier only looks at the translation; and a bilingual task, in which the source text is also taken into consideration. Using deep-learning classifiers, we report a mean accuracy of around 85% in-domain and 75% out-of-domain on the bilingual task, and 81% in-domain and 63% out-of-domain on the monolingual task.
Serdyuk, Dmitriy. "Advances in deep learning methods for speech recognition and understanding." Thesis, 2020. http://hdl.handle.net/1866/24803.
This work presents several studies in the areas of speech recognition and understanding. Semantic speech understanding is an important sub-domain of the broader field of artificial intelligence. Speech processing has long attracted researchers' interest because language is one of the defining characteristics of human beings. With the development of neural networks, the domain has seen rapid progress both in terms of accuracy and human perception. Another important milestone was achieved with the development of end-to-end approaches. Such approaches allow co-adaptation of all the parts of the model, thus increasing performance as well as simplifying the training procedure. End-to-end models became feasible with the increasing amount of available data and computational resources, and, most importantly, with many novel architectural developments. Nevertheless, traditional (non-end-to-end) approaches are still relevant for speech processing because of challenging data in noisy environments, accented speech, and the high variety of dialects. In the first work, we explore hybrid speech recognition in noisy environments. We propose to treat recognition under unseen noise conditions as a domain adaptation task. For this, we use the then-novel technique of adversarial domain adaptation. In a nutshell, this prior work proposed to train features in such a way that they are discriminative for the primary task but non-discriminative for a secondary task, constructed to be domain recognition. Thus, the learned features are invariant to the domain at hand. In our work, we adopt this technique and modify it for noisy speech recognition. In the second work, we develop a general method for regularizing generative recurrent networks. Recurrent networks are known to frequently have difficulty staying on track when generating long outputs.
While bi-directional networks can be used for better sequence aggregation in feature learning, they are not applicable in the generative case. We developed a way to improve the consistency of long sequences generated by recurrent networks by constructing a model similar to a bi-directional network. The key insight is to use a soft L2 loss between the forward and the backward generative recurrent networks. We provide an experimental evaluation on a multitude of tasks and datasets, including speech recognition, image captioning, and language modeling. In the third paper, we investigate the possibility of developing an end-to-end intent recognizer for spoken language understanding. Semantic spoken language understanding is an important step towards developing human-like artificial intelligence. End-to-end approaches have shown high performance on tasks including machine translation and speech recognition, and we draw inspiration from these prior works to develop an end-to-end system for intent recognition.
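A rough sketch of the soft L2 matching term between forward and backward recurrent networks described above. It assumes plain tanh RNNs in NumPy, a hypothetical learned affine map (`A`, `b`) from forward to backward states, and that the two networks' states are compared at the same time step; all names are illustrative, and the thesis's exact formulation may differ:

```python
import numpy as np

def rnn_states(x, W_in, W_rec):
    """Run a plain tanh RNN over x of shape (T, d_in); return states (T, d_h)."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_in @ x_t + W_rec @ h)
        states.append(h)
    return np.stack(states)

def twin_penalty(x, fwd, bwd, A, b):
    """Mean squared L2 distance between A h_f[t] + b and h_b[t].

    fwd and bwd are (W_in, W_rec) pairs for the forward and backward networks.
    """
    h_f = rnn_states(x, *fwd)              # h_f[t] has read x[0], ..., x[t]
    h_b = rnn_states(x[::-1], *bwd)[::-1]  # h_b[t] has read x[T-1], ..., x[t]
    # During training the backward states would be treated as constants for
    # this term (no gradient into the backward network); NumPy has no
    # autograd, so this sketch only evaluates the penalty.
    diff = h_f @ A.T + b - h_b
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

Adding this penalty to the forward network's generation loss pushes each forward state to anticipate what the backward network has summarised about the rest of the sequence. Only the forward network is needed at generation time.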
Goyette, Kyle. "On two sequential problems : the load planning and sequencing problem and the non-normal recurrent neural network." Thesis, 2020. http://hdl.handle.net/1866/24314.
The work in this thesis is divided into two parts. The first part addresses the problem of planning and sequencing the loading of containers onto railcars, an operational problem encountered in many intermodal rail terminals. In this problem, containers must be assigned to a platform on which one or two containers will be loaded, and the loading order must be determined. These decisions are made with the goal of minimizing the costs associated with container handling, as well as the cost of containers left unloaded. The deterministic version of the problem can be formulated as a shortest-path problem on an ordered graph. This problem is difficult to solve because of the large size of the graph. We propose a two-stage heuristic based on the Iterative Deepening A* algorithm to compute solutions to the load planning and sequencing problem within a five-minute budget. We then also illustrate how a Deep Q-learning algorithm can be used to solve the same problem heuristically. The second part of this thesis examines sequential models in deep learning. A recent strategy for circumventing the exploding and vanishing gradient problem in recurrent neural networks (RNNs) is to impose orthogonal or unitary recurrent weight matrices. Although this ensures stable dynamics during training, it comes at the cost of reduced expressivity due to the limited variety of orthogonal transformations. We propose a parameterisation of RNNs, based on the Schur decomposition, that mitigates gradient problems while allowing non-orthogonal recurrent weight matrices in the model.
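A sketch of how a Schur-style factorisation W = P (Lam + T) P^T can yield a non-orthogonal recurrent matrix whose spectrum is still set directly. It assumes an even hidden size, an orthogonal factor `P`, 2x2 rotation-scaling blocks on the diagonal of `Lam`, and a factor `T` restricted to entries strictly above those blocks. The helper name and this exact construction are illustrative assumptions, not code from the thesis:

```python
import numpy as np

def schur_recurrent_matrix(P, radii, angles, T_raw):
    """Build W = P (Lam + T) P^T from real-Schur-style factors.

    P: (n, n) orthogonal; radii, angles: (n // 2,) block parameters;
    T_raw: (n, n) unconstrained, masked to the strictly upper block triangle.
    """
    n = P.shape[0]
    if n % 2 != 0 or len(radii) != n // 2 or len(angles) != n // 2:
        raise ValueError("sketch assumes even n and n // 2 radii/angles")
    lam = np.zeros((n, n))
    for i, (r, th) in enumerate(zip(radii, angles)):
        c, s = np.cos(th), np.sin(th)
        # 2x2 block with complex eigenvalues r * exp(+/- i * th)
        lam[2 * i:2 * i + 2, 2 * i:2 * i + 2] = r * np.array([[c, -s], [s, c]])
    # Keep only entries strictly above the 2x2 diagonal blocks.
    mask = np.triu(np.ones((n, n)), k=1)
    for i in range(n // 2):
        mask[2 * i, 2 * i + 1] = 0.0
    T = T_raw * mask
    return P @ (lam + T) @ P.T
```

Because `lam + T` is block upper triangular and `P` is orthogonal, the eigenvalues of `W` are exactly those of the diagonal blocks (moduli given by `radii`), while `T` contributes a non-normal, non-orthogonal component. With all `radii` equal to 1 and `T` zero, `W` reduces to an orthogonal matrix.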
Sankar, Chinnadhurai. "Neural approaches to dialog modeling." Thesis, 2020. http://hdl.handle.net/1866/24802.
This thesis by article consists of four articles which contribute to the field of deep learning, specifically to understanding and learning neural approaches to dialog systems. The first article takes a step towards understanding whether commonly used neural dialog architectures effectively capture the information present in the conversation history. Through a series of perturbation experiments on popular dialog datasets, we find that commonly used neural dialog architectures like recurrent and transformer-based seq2seq models are rarely sensitive to most input context perturbations, such as missing or reordered utterances, shuffled words, etc. The second article introduces a simple and cost-effective way to collect large-scale datasets for modeling task-oriented dialog systems. This approach avoids the need for a complex argument annotation schema. The initial release of the dataset includes 13,215 task-based dialogs comprising six domains and around 8k unique named entities, almost 8 times more than the popular MultiWOZ dataset. The third article proposes to improve response generation quality in open-domain dialog systems by jointly modeling the utterances with the dialog attributes of each utterance. The dialog attributes of an utterance are discrete features or aspects associated with it, such as dialog acts, sentiment, emotion, speaker identity, and speaker personality. The final article introduces an embedding-free method to compute word representations on the fly. This approach significantly reduces the memory footprint, which facilitates deployment on memory-constrained devices. Besides being independent of the vocabulary size, this approach is, we find, inherently resilient to common misspellings.
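The abstract does not specify how the embedding-free representations are computed. As an illustrative assumption only, the sketch below uses signed feature hashing of character trigrams into a fixed-size vector, one common way to obtain word vectors without a vocabulary-sized embedding table; `project_word`, `char_ngrams`, and their parameters are hypothetical:

```python
import hashlib
import numpy as np

def char_ngrams(word, n=3):
    """Character n-grams of the word with boundary markers, e.g. '#ab', 'ab#'."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def project_word(word, dim=4096, n=3):
    """Fixed-size word vector from hashed character n-grams (no embedding table)."""
    vec = np.zeros(dim)
    for gram in char_ngrams(word, n):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim  # bucket for this n-gram
        sign = 1.0 if digest[4] & 1 else -1.0  # random sign keeps dot products unbiased under collisions
        vec[index] += sign
    return vec

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Memory use depends only on `dim`, not on the vocabulary. Because a misspelling such as "recurent" shares most trigrams with "recurrent", their vectors stay close, which illustrates (but does not reproduce) the robustness claim above.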