Rozprawy doktorskie na temat „L'apprentissage profond”
Utwórz poprawne odniesienie w stylach APA, MLA, Chicago, Harvard i wielu innych
Sprawdź 24 najlepszych rozpraw doktorskich naukowych na temat „L'apprentissage profond”.
Przycisk „Dodaj do bibliografii” jest dostępny obok każdej pracy w bibliografii. Użyj go – a my automatycznie utworzymy odniesienie bibliograficzne do wybranej pracy w stylu cytowania, którego potrzebujesz: APA, MLA, Harvard, Chicago, Vancouver itp.
Możesz również pobrać pełny tekst publikacji naukowej w formacie „.pdf” i przeczytać adnotację do pracy online, jeśli odpowiednie parametry są dostępne w metadanych.
Przeglądaj rozprawy doktorskie z różnych dziedzin i twórz odpowiednie bibliografie.
Millan, Mégane. "L'apprentissage profond pour l'évaluation et le retour d'information lors de l'apprentissage de gestes". Thesis, Sorbonne université, 2020. http://www.theses.fr/2020SORUS057.
Pełny tekst źródłaLearning a new sport or manual work is complex. Indeed, many gestures have to be assimilated in order to reach a good level of skill. However, learning these gestures cannot be done alone. Indeed, it is necessary to see the gesture execution with an expert eye in order to indicate corrections for improvement. However, experts, whether in sports or in manual works, are not always available to analyze and evaluate a novice’s gesture. In order to help experts in this task of analysis, it is possible to develop virtual coaches. Depending on the field, the virtual coach will have more or less skills, but an evaluation according to precise criteria is always mandatory. Providing feedback on mistakes is also essential for the learning of a novice. In this thesis, different solutions for developing the most effective virtual coaches are proposed. First of all, and as mentioned above, it is necessary to evaluate the gestures. From this point of view, a first part consisted in understanding the stakes of automatic gesture analysis, in order to develop an automatic evaluation algorithm that is as efficient as possible. Subsequently, two algorithms for automatic quality evaluation are proposed. These two algorithms, based on deep learning, were then tested on two different gestures databases in order to evaluate their genericity. Once the evaluation has been carried out, it is necessary to provide relevant feedback to the learner on his errors. In order to maintain continuity in the work carried out, this feedback is also based on neural networks and deep learning. A method has been developed based on neural network explanability methods. It allows to go back to the moments of the gestures when errors were made according to the evaluation model. Finally, coupled with semantic segmentation, this method makes it possible to indicate to learners which part of the gesture was badly performed, and to provide them with statistics and a learning curve
Martinez, Coralie. "Classification précoce de séquences temporelles par de l'apprentissage par renforcement profond". Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAT123.
Pełny tekst źródłaEarly classification (EC) of time series is a recent research topic in the field of sequential data analysis. It consists in assigning a label to some data that is sequentially collected with new data points arriving over time, and the prediction of a label has to be made using as few data points as possible in the sequence. The EC problem is of paramount importance for supporting decision makers in many real-world applications, ranging from process control to fraud detection. It is particularly interesting for applications concerned with the costs induced by the acquisition of data points, or for applications which seek for rapid label prediction in order to take early actions. This is for example the case in the field of health, where it is necessary to provide a medical diagnosis as soon as possible from the sequence of medical observations collected over time. Another example is predictive maintenance with the objective to anticipate the breakdown of a machine from its sensor signals. In this doctoral work, we developed a new approach for this problem, based on the formulation of a sequential decision making problem, that is the EC model has to decide between classifying an incomplete sequence or delaying the prediction to collect additional data points. Specifically, we described this problem as a Partially Observable Markov Decision Process noted EC-POMDP. The approach consists in training an EC agent with Deep Reinforcement Learning (DRL) in an environment characterized by the EC-POMDP. The main motivation for this approach was to offer an end-to-end model for EC which is able to simultaneously learn optimal patterns in the sequences for classification and optimal strategic decisions for the time of prediction. Also, the method allows to set the importance of time against accuracy of the classification in the definition of rewards, according to the application and its willingness to make this compromise. In order to solve the EC-POMDP and model the policy of the EC agent, we applied an existing DRL algorithm, the Double Deep-Q-Network algorithm, whose general principle is to update the policy of the agent during training episodes, using a replay memory of past experiences. We showed that the application of the original algorithm to the EC problem lead to imbalanced memory issues which can weaken the training of the agent. Consequently, to cope with those issues and offer a more robust training of the agent, we adapted the algorithm to the EC-POMDP specificities and we introduced strategies of memory management and episode management. In experiments, we showed that these contributions improved the performance of the agent over the original algorithm, and that we were able to train an EC agent which compromised between speed and accuracy, on each sequence individually. We were also able to train EC agents on public datasets for which we have no expertise, showing that the method is applicable to various domains. Finally, we proposed some strategies to interpret the decisions of the agent, validate or reject them. In experiments, we showed how these solutions can help gain insight in the choice of action made by the agent
Lelong, Thibault. "Reconnaissance des documents avec de l'apprentissage profond pour la réalité augmentée". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS017.
Pełny tekst źródłaThis doctoral project focuses on issues related to the identification of images and documents in augmented reality applications using markers, particularly when using cameras. The research is set in a technological context where interaction through augmented reality is essential in several domains, including industry, which require reliable identification methodologies.In an initial phase, the project assesses various identification and image processing methodologies using a database specially designed to reflect the challenges of the industrial context. This research allows an in-depth analysis of existing methodologies, thus revealing their potentials and limitations in various application scenarios.Subsequently, the project proposes a document detection system aimed at enhancing existing solutions, optimized for environments such as web browsers. Then, an innovative image research methodology is introduced, relying on an analysis of the image in sub-parts to increase the accuracy of identification and avoid image confusions. This approach allows for more precise and adaptive identification, particularly with respect to variations in the layout of the target image.Finally, in the context of collaborative work with ARGO company, a real-time image tracking engine was developed, optimized for low-power devices and web environments. This ensures the deployment of augmented reality web applications and their operation on a wide range of devices, including those with limited processing capabilities.It is noteworthy that the works resulting from this doctoral project have been concretely applied and valorized by the Argo company for commercial purposes, thereby confirming the relevance and viability of the developed methodologies and solutions, and attesting to their significant contribution to the technological and industrial field of augmented reality
Moreau, Thomas. "Représentations Convolutives Parcimonieuses -- application aux signaux physiologiques et interpétabilité de l'apprentissage profond". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLN054/document.
Pełny tekst źródłaConvolutional representations extract recurrent patterns which lead to the discovery of local structures in a set of signals. They are well suited to analyze physiological signals which requires interpretable representations in order to understand the relevant information. Moreover, these representations can be linked to deep learning models, as a way to bring interpretability intheir internal representations. In this disserta tion, we describe recent advances on both computational and theoretical aspects of these models.First, we show that the Singular Spectrum Analysis can be used to compute convolutional representations. This representation is dense and we describe an automatized procedure to improve its interpretability. Also, we propose an asynchronous algorithm, called DICOD, based on greedy coordinate descent, to solve convolutional sparse coding for long signals. Our algorithm has super-linear acceleration.In a second part, we focus on the link between representations and neural networks. An extra training step for deep learning, called post-training, is introduced to boost the performances of the trained network by making sure the last layer is optimal. Then, we study the mechanisms which allow to accelerate sparse coding algorithms with neural networks. We show that it is linked to afactorization of the Gram matrix of the dictionary.Finally, we illustrate the relevance of convolutional representations for physiological signals. Convolutional dictionary learning is used to summarize human walk signals and Singular Spectrum Analysis is used to remove the gaze movement in young infant’s oculometric recordings
Phan, Thi Hai Hong. "Reconnaissance d'actions humaines dans des vidéos avec l'apprentissage automatique". Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1038.
Pełny tekst źródłaIn recent years, human action recognition (HAR) has attracted the research attention thanks to its various applications such as intelligent surveillance systems, video indexing, human activities analysis, human-computer interactions and so on. The typical issues that the researchers are envisaging can be listed as the complexity of human motions, the spatial and temporal variations, cluttering, occlusion and change of lighting condition. This thesis focuses on automatic recognizing of the ongoing human actions in a given video. We address this research problem by using both shallow learning and deep learning approaches.First, we began the research work with traditional shallow learning approaches based on hand-scrafted features by introducing a novel feature named Motion of Oriented Magnitudes Patterns (MOMP) descriptor. We then incorporated this discriminative descriptor into simple yet powerful representation techniques such as Bag of Visual Words, Vector of locally aggregated descriptors (VLAD) and Fisher Vector to better represent actions. Also, PCA (Principal Component Analysis) and feature selection (statistical dependency, mutual information) are applied to find out the best subset of features in order to improve the performance and decrease the computational expense. The proposed method obtained the state-of-the-art results on several common benchmarks.Recent deep learning approaches require an intensive computations and large memory usage. They are therefore difficult to be used and deployed on the systems with limited resources. In the second part of this thesis, we present a novel efficient algorithm to compress Convolutional Neural Network models in order to decrease both the computational cost and the run-time memory footprint. We measure the redundancy of parameters based on their relationship using the information theory based criteria, and we then prune the less important ones. The proposed method significantly reduces the model sizes of different networks such as AlexNet, ResNet up to 70% without performance loss on the large-scale image classification task.Traditional approach with the proposed descriptor achieved the great performance for human action recognition but only on small datasets. In order to improve the performance on the large-scale datasets, in the last part of this thesis, we therefore exploit deep learning techniques to classify actions. We introduce the concepts of MOMP Image as an input layer of CNNs as well as incorporate MOMP image into deep neural networks. We then apply our network compression algorithm to accelerate and improve the performance of system. The proposed method reduces the model size, decreases the over-fitting, and thus increases the overall performance of CNN on the large-scale action datasets.Throughout the thesis, we have showed that our algorithms obtain good performance in comparison to the state-of-the-art on challenging action datasets (Weizmann, KTH, UCF Sports, UCF-101 and HMDB51) with low resource required
Poirier, Jasmine. "Segmentation de neurones pour imagerie calcique du poisson zèbre : des méthodes classiques à l'apprentissage profond". Master's thesis, Université Laval, 2019. http://hdl.handle.net/20.500.11794/36452.
Pełny tekst źródłaThe experimental study of the resilience of a complex network lies on our capacity to reproduceits structural and functional organization. Having chosen the neuronal network of the larvalzebrafish as our animal model for its transparency, we can use techniques such as light-sheet microscopy combined with calcium imaging to image its whole brain more than twice every second, with a cellular spatial resolution. Having both those spatial and temporal resolutions, we have to process and segment a great quantity of data, which can’t be done manually. Wethus have to resort to numerical techniques to segment the neurons and extract their activity. Three segmentation techniques have been compared : adaptive threshold (AT), random deci-sion forests (ML), and a pretrained deep convolutional neural network. While the adaptive threshold technique allow rapid identification and with almost no error of the more active neurons, it generates many more false negatives than the two other methods. On the contrary, the deep convolutional neural network method identify more neurons, but generates more false positives which can be filtered later in the proces. Using the F1 score as our comparison metrics, the neural network (F1= 59,2%) out performs the adaptive threshold (F1= 25,4%) and random decision forests (F1= 48,8%). Even though the performances seem lower compared to results generally shown for deep neural network, we are competitive with the best technique known to this day for neurons segmentation, which is 3dCNN (F1= 65.9%), an algorithm presented in the neurofinder challenge.
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066056/document.
Pełny tekst źródłaThis thesis studies the use of deep neural networks to learn high level representations from raw inputs on robots, based on the "manifold hypothesis"
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome". Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066056.
Pełny tekst źródłaThis thesis studies the use of deep neural networks to learn high level representations from raw inputs on robots, based on the "manifold hypothesis"
Harbaoui, Nesrine. "Diagnostic adaptatif à l'environnement de navigation : apport de l'apprentissage profond pour une localisation sûre et précise". Electronic Thesis or Diss., Université de Lille (2022-....), 2022. http://www.theses.fr/2022ULILB041.
Pełny tekst źródłaFor an autonomous terrestrial transport system, the ability to determine its position is essential in order to allow other functions, such as control or perception, to be safely controlled or perceived. Thus, the criticality of these functions generates important requirements in terms of safety (integrity), availability, accuracy and precision. For land vehicles, meeting these requirements is related to various parameters such as vehicle dynamics, weather conditions, or the navigation context, which includes both the operational environment and the behavior of the host vehicle or user. All of these circumstances can be an obstacle to the reception of Global Navigation Satellite System (GNSS) signals since the environment determines the type and quality of electromagnetic signals available for positioning.Although many navigation and positioning techniques have been developed, none is capable of providing a reliable and accurate position in all contexts. Therefore, in order to deploy a localization function capable of operating in different contexts, based on low cost sensors, mainly GNSS and Inertial Navigation system (IMU), it is necessary, from the design phase, to develop strategies that solve both the antagonism of certain requirements and the adaptation to changing environment/dynamics. In this context, this thesis proposes a diagnostic layer that adapts by deep learning methods to changes in the context and adjusts the trade-off between functional requirements. This layer is integrated in a fault-tolerant data fusion framework through an informational divergence, the α-Rényi divergence, known by its generalization of other divergences such as the Kullback-Leibler divergence, the Bhattacharyya distance. In order to detect and isolate the divergence faults based on the generation of residuals, we offer the solution of selecting the appropriate residual for each situation by fixing the value of the parameter α using artificial intelligence technologies in order to increase the detectability of the defects. In order to increase the availability of the system while maintaining an acceptable level of operational safety, a context-sensitive threshold that adjusts the trade-off between the probability of false alarm and the probability of missed detection is proposed. To test and validate the proposed approaches, two types of data have been provided; real by the PRETIL platform of the CRIStAL laboratory and simulated by the Stella NGC simulator as a part of the ANR LOCSP project
Bourgeais, Victoria. "Interprétation de l'apprentissage profond pour la prédiction de phénotypes à partir de données d'expression de gènes". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG069.
Pełny tekst źródłaDeep learning has been a significant advance in artificial intelligence in recent years. Its main domains of interest are image analysis and natural language processing. One of the major future challenges of this approach is its application to precision medicine. This new form of medicine will make it possible to personalize each stage of a patient's care pathway according to his or her characteristics, in particular molecular characteristics such as gene expression data that inform about the cellular state of a patient. However, deep learning models are considered black boxes as their predictions are not accompanied by an explanation, limiting their use in clinics. The General Data Protection Regulation (GDPR), adopted recently by the European Union, imposes that the machine learning algorithms must be able to explain their decisions to the users. Thus, there is a real need to make neural networks more interpretable, and this is particularly true in the medical field for several reasons. Understanding why a phenotype has been predicted is necessary to ensure that the prediction is based on reliable representations of the patients rather than on irrelevant artifacts present in the training data. Regardless of the model's effectiveness, this will affect any end user's decisions and confidence in the model. Finally, a neural network performing well for the prediction of a certain phenotype may have identified a signature in the data that could open up new research avenues.In the current state of the art, two general approaches exist for interpreting these black-boxes: creating inherently interpretable models or using a third-party method dedicated to the interpretation of the trained neural network. Whatever approach is chosen, the explanation provided generally consists of identifying the important input variables and neurons for the prediction. However, in the context of phenotype prediction from gene expression, these approaches generally do not provide an understandable explanation, as these data are not directly comprehensible by humans. Therefore, we propose novel and original deep learning methods, interpretable by design. The architecture of these methods is defined from one or several knowledge databases. A neuron represents a biological object, and the connections between neurons correspond to the relations between biological objects. Three methods have been developed, listed below in chronological order.Deep GONet is based on a multilayer perceptron constrained by a biological knowledge database, the Gene Ontology (GO), through an adapted regularization term. The explanations of the predictions are provided by a posteriori interpretation method.GraphGONet takes advantage of both a multilayer perceptron and a graph neural network to deal with the semantic richness of GO knowledge. This model has the capacity to generate explanations automatically.BioHAN is only established on a graph neural network and can easily integrate different knowledge databases and their semantics. Interpretation is facilitated by the use of an attention mechanism, enabling the model to focus on the most informative neurons.These methods have been evaluated on diagnostic tasks using real gene expression datasets and have shown competitiveness with state-of-the-art machine learning methods. Our models provide intelligible explanations composed of the most contributive neurons and their associated biological concepts. This feature allows experts to use our tools in a medical setting
Resmerita, Diana. "Compression pour l'apprentissage en profondeur". Thesis, Université Côte d'Azur, 2022. http://www.theses.fr/2022COAZ4043.
Pełny tekst źródłaAutonomous cars are complex applications that need powerful hardware machines to be able to function properly. Tasks such as staying between the white lines, reading signs, or avoiding obstacles are solved by using convolutional neural networks (CNNs) to classify or detect objects. It is highly important that all the networks work in parallel in order to transmit all the necessary information and take a common decision. Nowadays, as the networks improve, they also have become bigger and more computational expensive. Deploying even one network becomes challenging. Compressing the networks can solve this issue. Therefore, the first objective of this thesis is to find deep compression methods in order to cope with the memory and computational power limitations present on embedded systems. The compression methods need to be adapted to a specific processor, Kalray's MPPA, for short term implementations. Our contributions mainly focus on compressing the network post-training for storage purposes, which means compressing the parameters of the network without retraining or changing the original architecture and the type of the computations. In the context of our work, we decided to focus on quantization. Our first contribution consists in comparing the performances of uniform quantization and non-uniform quantization, in order to identify which of the two has a better rate-distortion trade-off and could be quickly supported in the company. The company's interest is also directed towards finding new innovative methods for future MPPA generations. Therefore, our second contribution focuses on comparing standard floating-point representations (FP32, FP16) to recently proposed alternative arithmetical representations such as BFloat16, msfp8, Posit8. The results of this analysis were in favor for Posit8. This motivated the company Kalray to conceive a decompressor from FP16 to Posit8. Finally, since many compression methods already exist, we decided to move to an adjacent topic which aims to quantify theoretically the effects of quantization error on the network's accuracy. This is the second objective of the thesis. We notice that well-known distortion measures are not adapted to predict accuracy degradation in the case of inference for compressed neural networks. We define a new distortion measure with a closed form which looks like a signal-to-noise ratio. A set of experiments were done using simulated data and small networks, which show the potential of this distortion measure
Mainsant, Marion. "Apprentissage continu sous divers scénarios d'arrivée de données : vers des applications robustes et éthiques de l'apprentissage profond". Electronic Thesis or Diss., Université Grenoble Alpes, 2023. http://www.theses.fr/2023GRALS045.
Pełny tekst źródłaThe human brain continuously receives information from external stimuli. It then has the ability to adapt to new knowledge while retaining past events. Nowadays, more and more artificial intelligence algorithms aim to learn knowledge in the same way as a human being. They therefore have to be able to adapt to a large variety of data arriving sequentially and available over a limited period of time. However, when a deep learning algorithm learns new data, the knowledge contained in the neural network overlaps old one and the majority of the past information is lost, a phenomenon referred in the literature as catastrophic forgetting. Numerous methods have been proposed to overcome this issue, but as they were focused on providing the best performance, studies have moved away from real-life applications where algorithms need to adapt to changing environments and perform, no matter the type of data arrival. In addition, most of the best state of the art methods are replay methods which retain a small memory of the past and consequently do not preserve data privacy.In this thesis, we propose to explore data arrival scenarios existing in the literature, with the aim of applying them to facial emotion recognition, which is essential for human-robot interactions. To this end, we present Dream Net - Data-Free, a privacy preserving algorithm, able to adapt to a large number of data arrival scenarios without storing any past samples. After demonstrating the robustness of this algorithm compared to existing state-of-the-art methods on standard computer vision databases (Mnist, Cifar-10, Cifar-100 and Imagenet-100), we show that it can also adapt to more complex facial emotion recognition databases. We then propose to embed the algorithm on a Nvidia Jetson nano card creating a demonstrator able to learn and predict emotions in real-time. Finally, we discuss the relevance of our approach for bias mitigation in artificial intelligence, opening up perspectives towards a more ethical AI
Fourure, Damien. "Réseaux de neurones convolutifs pour la segmentation sémantique et l'apprentissage d'invariants de couleur". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSES056/document.
Pełny tekst źródłaComputer vision is an interdisciplinary field that investigates how computers can gain a high level of understanding from digital images or videos. In artificial intelligence, and more precisely in machine learning, the field in which this thesis is positioned,computer vision involves extracting characteristics from images and then generalizing concepts related to these characteristics. This field of research has become very popular in recent years, particularly thanks to the results of the convolutional neural networks that form the basis of so-called deep learning methods. Today, neural networks make it possible, among other things, to recognize different objects present in an image, to generate very realistic images or even to beat the champions at the Go game. Their performance is not limited to the image domain, since they are also used in other fields such as natural language processing (e. g. machine translation) or sound recognition. In this thesis, we study convolutional neural networks in order to develop specialized architectures and loss functions for low-level tasks (color constancy) as well as high-level tasks (semantic segmentation). Color constancy, is the ability of the human visual system to perceive constant colours for a surface despite changes in the spectrum of illumination (lighting change). In computer vision, the main approach consists in estimating the color of the illuminant and then suppressing its impact on the perceived color of objects. We approach the task of color constancy with the use of neural networks by developing a new architecture composed of a subsampling operator inspired by traditional methods. Our experience shows that our method makes it possible to obtain competitive performances with the state of the art. Nevertheless, our architecture requires a large amount of training data. In order to partially correct this problem and improve the training of neural networks, we present several techniques for artificial data augmentation. We are also making two contributions on a high-level issue : semantic segmentation. This task, which consists of assigning a semantic class to each pixel of an image, is a challenge in computer vision because of its complexity. On the one hand, it requires many examples of training that are costly to obtain. On the other hand, it requires the adaptation of traditional convolutional neural networks in order to obtain a so-called dense prediction, i. e., a prediction for each pixel present in the input image. To solve the difficulty of acquiring training data, we propose an approach that uses several databases annotated with different labels at the same time. To do this, we define a selective loss function that has the advantage of allowing the training of a convolutional neural network from data from multiple databases. We also developed self-context approach that captures the correlations between labels in different databases. Finally, we present our third contribution : a new convolutional neural network architecture called GridNet specialized for semantic segmentation. Unlike traditional networks, implemented with a single path from the input (image) to the output (prediction), our architecture is implemented as a 2D grid allowing several interconnected streams to operate at different resolutions. In order to exploit all the paths of the grid, we propose a technique inspired by dropout. In addition, we empirically demonstrate that our architecture generalize many of well-known stateof- the-art networks. We conclude with an analysis of the empirical results obtained with our architecture which, although trained from scratch, reveals very good performances, exceeding popular approaches often pre-trained
Blot, Michaël. "Étude de l'apprentissage et de la généralisation des réseaux profonds en classification d'images". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS412.
Pełny tekst źródłaArtificial intelligence is experiencing a resurgence in recent years. This is due to the growing ability to collect and store a considerable amount of digitized data. These huge databases allow machine learning algorithms to respond to certain tasks through supervised learning. Among the digitized data, images remain predominant in the modern environment. Huge datasets have been created. moreover, the image classification has allowed the development of previously neglected models, deep neural networks or deep learning. This family of algorithms demonstrates a great facility to learn perfectly datasets, even very large. Their ability to generalize remains largely misunderstood, but the networks of convolutions are today the undisputed state of the art. From a research and application point of view of deep learning, the demands will be more and more demanding, requiring to make an effort to bring the performances of the neuron networks to the maximum of their capacities. This is the purpose of our research, whose contributions are presented in this thesis. We first looked at the issue of training and considered accelerating it through distributed methods. We then studied the architectures in order to improve them without increasing their complexity. Finally, we particularly study the regularization of network training. We studied a regularization criterion based on information theory that we deployed in two different ways
Trullo, Ramirez Roger. "Approche basées sur l'apprentissage en profondeur pour la segmentation des organes à risques dans les tomodensitométries thoraciques". Thesis, Normandie, 2018. http://www.theses.fr/2018NORMR063.
Pełny tekst źródłaRadiotherapy is one of the options for treatment currently available for patients affected by cancer, one of the leading cause of deaths worldwide. Before radiotherapy, organs at risk (OAR) located near the target tumor, such as the heart, the lungs, the esophagus, etc. in thoracic cancer, must be outlined, in order to minimize the quantity of irradiation that they receive during treatment. Today, segmentation of the OAR is performed mainly manually by clinicians on Computed Tomography (CT) images, despite some partial software support. It is a tedious task, prone to intra and inter-observer variability. In this work, we present several frameworks using deep learning techniques to automatically segment the heart, trachea, aorta and esophagus. In particular, the esophagus is notably challenging to segment, due to the lack of surrounding contrast and shape variability across different patients. As deep networks and in particular fully convolutional networks offer now state of the art performance for semantic segmentation, we first show how a specific type of architecture based on skip connections can improve the accuracy of the results. As a second contribution, we demonstrate that context information can be of vital importance in the segmentation task, where we propose the use of two collaborative networks. Third, we propose a different, distance aware representation of the data, which is then used in junction with adversarial networks, as another way to constrain the anatomical context. All the proposed methods have been tested on 60 patients with 3D-CT scans, showing good performance compared with other methods
Arige, Abhaya Dhathri. "Simplification of 3D CAD models with deep learning for augmented reality". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS018.
Pełny tekst źródłaAs a part of Industry 4.0 the use of Augmented Reality (AR) devices like HoloLens has gained significant acceptance for training assembly line operators in various industries. When employing Computer-Aided Design (CAD) models to create assembly line instructions for training purposes, preserving all redundant information becomes unnecessary. Utilizing simplified CAD models leads to improved run-time performance of the applications in which they are employed. This specific research project is tasked with developing methods and techniques to streamline complex 3D CAD models, making them suitable for AR applications.In this research, we explain how 3D models play a significant role in augmented reality (AR) by enriching the virtual experience through the superimposition of computer-aided design (CAD models) onto the real world. The study goes on to offer detailed descriptions of numerous applications of AR in operator training. Furthermore, it elucidates how the integration of 3D CAD models contributes to a deeper understanding of instructions and procedures within these training scenarios.We conducted an in-depth literature review in the field of CAD model simplification to determine which simplification techniques are most suitable for integration into augmented reality (AR) scenarios. Our research revealed that mesh-based simplification techniques are particularly effective in preserving the essential features of CAD models while offering the advantages of precise control over the level of detail.Additionally, we have carried out four distinct types of assessments as part of our research. These assessments encompassed objective evaluations that applied mesh-based techniques from existing literature, subjective assessment involving a thorough examination of each simplified model to determine the level of simplification based on vertex ranges, real-world testing conducted with the assistance of the HoloLens2 that demonstrated framerate enhancements when employing simplified CAD models in place of their original versions. To conclude our evaluations, we conducted user assessments, as user experience holds utmost importance in our study. They demonstrated that the simplified models possess a high degree of capability in substituting the original counterparts. However, it was noted that more simplification is required, particularly for intricate CAD models.An innovative approach centered around segmentation and adaptive simplification through the utilization of deep learning methods is proposed as the main methodology. To illustrate this framework, we employed a specific feature called "continuous chains". We subsequently conducted a comparative analysis against established state-of-the-art techniques, demonstrating that our methodology outperforms existing approaches. In our future research, we intend to expand the scope of our framework to encompass multiple features in CAD model
Parekh, Jayneel. "A Flexible Framework for Interpretable Machine Learning : application to image and audio classification". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT032.
Pełny tekst źródłaMachine learning systems and specially neural networks, have rapidly grown in their ability to address complex learning problems. Consequently, they are being integrated into society with an ever-rising influence on all levels of human experience. This has resulted in a need to gain human-understandable insights in their decision making process to ensure the decisions are being made ethically and reliably. The study and development of methods which can generate such insightsbroadly constitutes the field of interpretable machine learning. This thesis aims to develop a novel framework that can tackle two major problem settings in this field, post-hoc and by-design interpretation. Posthoc interpretability devises methods to interpret decisionsof a pre-trained predictive model, while by-design interpretability targets to learn a single model capable of both prediction and interpretation. To this end, we extend the traditional supervised learning formulation to include interpretation as an additional task besides prediction,each addressed by separate but related models, a predictor and an interpreter. Crucially, the interpreter is dependent on the predictor through its hidden layers and utilizes a dictionary of concepts as its representation for interpretation with the capacity to generate local and globalinterpretations. The framework is separately instantiated to address interpretability problems in the context of image and audio classification. Both systems are extensively evaluated for their interpretations on multiple publicly available datasets. We demonstrate high predictiveperformance and fidelity of interpretations in both cases. Despite adhering to the same underlying structure the two systems are designed differently for interpretations.The image interpretability system advances the pipeline for discovering learnt concepts for improvedunderstandability that is qualitatively evaluated. The audio interpretability system instead is designed with a novel representation based on non-negative matrix factorization to facilitate listenable interpretations whilst modeling audio objects composing a scene
Strub, Florian. "Développement de modèles multimodaux interactifs pour l'apprentissage du langage dans des environnements visuels". Thesis, Lille 1, 2020. http://www.theses.fr/2020LIL1I030.
Pełny tekst źródłaWhile our representation of the world is shaped by our perceptions, our languages, and our interactions, they have traditionally been distinct fields of study in machine learning. Fortunately, this partitioning started opening up with the recent advents of deep learning methods, which standardized raw feature extraction across communities. However, multimodal neural architectures are still at their beginning, and deep reinforcement learning is often limited to constrained environments. Yet, we ideally aim to develop large-scale multimodal and interactive models towards correctly apprehending the complexity of the world. As a first milestone, this thesis focuses on visually grounded language learning for three reasons (i) they are both well-studied modalities across different scientific fields (ii) it builds upon deep learning breakthroughs in natural language processing and computer vision (ii) the interplay between language and vision has been acknowledged in cognitive science. More precisely, we first designed the GuessWhat?! game for assessing visually grounded language understanding of the models: two players collaborate to locate a hidden object in an image by asking a sequence of questions. We then introduce modulation as a novel deep multimodal mechanism, and we show that it successfully fuses visual and linguistic representations by taking advantage of the hierarchical structure of neural networks. Finally, we investigate how reinforcement learning can support visually grounded language learning and cement the underlying multimodal representation. We show that such interactive learning leads to consistent language strategies but gives raise to new research issues
Firmo, Drumond Thalita. "Apports croisées de l'apprentissage hiérarchique et la modélisation du système visuel : catégorisation d'images sur des petits corpus de données". Thesis, Bordeaux, 2020. https://tel.archives-ouvertes.fr/tel-03129189.
Pełny tekst źródłaDeep convolutional neural networks (DCNN) have recently protagonized a revolution in large-scale object recognition. They have changed the usual computer vision practices of hand-engineered features, with their ability to hierarchically learn representative features from data with a pertinent classifier. Together with hardware advances, they have made it possible to effectively exploit the ever-growing amounts of image data gathered online. However, in specific domains like healthcare and industrial applications, data is much less abundant, and expert labeling costs higher than those of general purpose image datasets. This scarcity scenario leads to this thesis' core question: can these limited-data domains profit from the advantages of DCNNs for image classification? This question has been addressed throughout this work, based on an extensive study of literature, divided in two main parts, followed by proposal of original models and mechanisms.The first part reviews object recognition from an interdisciplinary double-viewpoint. First, it resorts to understanding the function of vision from a biological stance, comparing and contrasting to DCNN models in terms of structure, function and capabilities. Second, a state-of-the-art review is established aiming to identify the main architectural categories and innovations in modern day DCNNs. This interdisciplinary basis fosters the identification of potential mechanisms - inspired both from biological and artificial structures — that could improve image recognition under difficult situations. Recurrent processing is a clear example: while not completely absent from the "deep vision" literature, it has mostly been applied to videos — due to their inherently sequential nature. From biology however it is clear such processing plays a role in refining our perception of a still scene. This theme is further explored through a dedicated literature review focused on recurrent convolutional architectures used in image classification.The second part carries on in the spirit of improving DCNNs, this time focusing more specifically on our central question: deep learning over small datasets. First, the work proposes a more detailed and precise discussion of the small sample problem and its relation to learning hierarchical features with deep models. This discussion is followed up by a structured view of the field, organizing and discussing the different possible paths towards adapting deep models to limited data settings. Rather than a raw listing, this review work aims to make sense out of the myriad of approaches in the field, grouping methods with similar intent or mechanism of action, in order to guide the development of custom solutions for small-data applications. Second, this study is complemented by an experimental analysis, exploring small data learning with the proposition of original models and mechanisms (previously published as a journal paper).In conclusion, it is possible to apply deep learning to small datasets and obtain good results, if done in a thoughtful fashion. On the data path, one shall try gather more information from additional related data sources if available. On the complexity path, architecture and training methods can be calibrated in order to profit the most from any available domain-specific side-information. Proposals concerning both of these paths get discussed in detail throughout this document. Overall, while there are multiple ways of reducing the complexity of deep learning with small data samples, there is no universal solution. Each method has its own drawbacks and practical difficulties and needs to be tailored specifically to the target perceptual task at hand
Maignant, Elodie. "Plongements barycentriques pour l'apprentissage géométrique de variétés : application aux formes et graphes". Electronic Thesis or Diss., Université Côte d'Azur, 2023. http://www.theses.fr/2023COAZ4096.
Pełny tekst źródłaAn MRI image has over 60,000 pixels. The largest known human protein consists of around 30,000 amino acids. We call such data high-dimensional. In practice, most high-dimensional data is high-dimensional only artificially. For example, of all the images that could be randomly generated by coloring 256 x 256 pixels, only a very small subset would resemble an MRI image of a human brain. This is known as the intrinsic dimension of such data. Therefore, learning high-dimensional data is often synonymous with dimensionality reduction. There are numerous methods for reducing the dimension of a dataset, the most recent of which can be classified according to two approaches.A first approach known as manifold learning or non-linear dimensionality reduction is based on the observation that some of the physical laws behind the data we observe are non-linear. In this case, trying to explain the intrinsic dimension of a dataset with a linear model is sometimes unrealistic. Instead, manifold learning methods assume a locally linear model.Moreover, with the emergence of statistical shape analysis, there has been a growing awareness that many types of data are naturally invariant to certain symmetries (rotations, reparametrizations, permutations...). Such properties are directly mirrored in the intrinsic dimension of such data. These invariances cannot be faithfully transcribed by Euclidean geometry. There is therefore a growing interest in modeling such data using finer structures such as Riemannian manifolds. A second recent approach to dimension reduction consists then in generalizing existing methods to non-Euclidean data. This is known as geometric learning.In order to combine both geometric learning and manifold learning, we investigated the method called locally linear embedding, which has the specificity of being based on the notion of barycenter, a notion a priori defined in Euclidean spaces but which generalizes to Riemannian manifolds. In fact, the method called barycentric subspace analysis, which is one of those generalizing principal component analysis to Riemannian manifolds, is based on this notion as well. Here we rephrase both methods under the new notion of barycentric embeddings. Essentially, barycentric embeddings inherit the structure of most linear and non-linear dimension reduction methods, but rely on a (locally) barycentric -- affine -- model rather than a linear one.The core of our work lies in the analysis of these methods, both on a theoretical and practical level. In particular, we address the application of barycentric embeddings to two important examples in geometric learning: shapes and graphs. In addition to practical implementation issues, each of these examples raises its own theoretical questions, mostly related to the geometry of quotient spaces. In particular, we highlight that compared to standard dimension reduction methods in graph analysis, barycentric embeddings stand out for their better interpretability. In parallel with these examples, we characterize the geometry of locally barycentric embeddings, which generalize the projection computed by locally linear embedding. Finally, algorithms for geometric manifold learning, novel in their approach, complete this work
Pageaud, Simon. "SmartGov : architecture générique pour la co-construction de politiques urbaines basée sur l'apprentissage par renforcement multi-agent". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1128.
Pełny tekst źródłaIn this thesis, we propose the SmartGov model, coupling multi-agent simulation and multi-agent deep reinforcement learning, to help co-construct urban policies and integrate all stakeholders in the decision process. Smart Cities provide sensor data from the urban areas to increase realism of the simulation in SmartGov.Our first contribution is a generic architecture for multi-agent simulation of the city to study global behavior emergence with realistic agents reacting to political decisions. With a multi-level modeling and a coupling of different dynamics, our tool learns environment specificities and suggests relevant policies. Our second contribution improves autonomy and adaptation of the decision function with multi-agent, multi-level reinforcement learning. A set of clustered agents is distributed over the studied area to learn local specificities without any prior knowledge on the environment. Trust score assignment and individual rewards help reduce non-stationary impact on experience replay in deep reinforcement learning.These contributions bring forth a complete system to co-construct urban policies in the Smart City. We compare our model with different approaches from the literature on a parking fee policy to display the benefits and limits of our contributions
Balikas, Georgios. "Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM054/document.
Pełny tekst źródłaText is one of the most pervasive and persistent sources of information. Content analysis of text in its broad sense refers to methods for studying and retrieving information from documents. Nowadays, with the ever increasing amounts of text becoming available online is several languages and different styles, content analysis of text is of tremendous importance as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools.The goal of this dissertation is to study and address challengingproblems in this area, focusing on both the design of novel text miningalgorithms and tools, as well as on studying how these tools can be applied to text collections written in a single or several languages.In the first part of the thesis we focus on topic models and more precisely on how to incorporate prior information of text structure to such models.Topic models are built on the premise of bag-of-words, and therefore words are exchangeable. While this assumption benefits the calculations of the conditional probabilities it results in loss of information.To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure to them. We assume that the documents are partitioned in thematically coherent text segments. The first mechanism assigns the same topic to the words of a segment. The second, capitalizes on the properties of copulas, a tool mainly used in the fields of economics and risk management that is used to model the joint probability density distributions of random variables while having access only to their marginals.The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models is in the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. Unless translations, the documents of a pair are similar to some extent only. Meanwhile, representative topic models assume that the documents have identical topic distributions, which is a strong and limiting assumption. To overcome it we propose novel bilingual topic models that incorporate the notion of cross-lingual similarity of the documents that constitute the pairs in their generative and inference processes. Calculating this cross-lingual document similarity is a task on itself, which we propose to address using cross-lingual word embeddings.The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification where we argue that translations of a document can be used to enrich its representation. Using an auto-encoder to obtain these robust document representations we demonstrate improvements in the task of multi-class document classification. Second, we explore multi-task sentiment classification of tweets arguing that by jointly training classification systems using correlated tasks can improve the obtained performance. To this end we show how can achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval. Given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that by adapting the transportation problem for the task of estimating document distances one can achieve important improvements
Klokov, Roman. "Deep learning pour la modélisation de formes 3D". Electronic Thesis or Diss., Université Grenoble Alpes, 2021. http://www.theses.fr/2021GRALM060.
Pełny tekst źródłaApplication of deep learning to geometric 3D data poses various challenges for researchers. The complex nature of geometric 3D data allows to represent it in different forms: occupancy grids, point clouds, meshes, implicit functions, etc. Each of those representations has already spawned streams of deep neural network models, capable of processing and predicting according data samples for further use in various data recognition, generation, and modification tasks.Modern deep learning models force researchers to make various design choices, associated with their architectures, learning algorithms and other specific aspects of the chosen applications. Often, these choices are made with the help of various heuristics and best practice methods discovered through numerous costly experimental evaluations. Probabilistic modeling provides an alternative to these methods that allows to formalize machine learning tasks in a meaningful manner and develop probability-based training objectives. This thesis explores combinations of deep learning based methods and probabilistic modeling in application to geometric 3D data.The first contribution explores how probabilistic modeling could be applied in the context of single-view 3D shape inference task. We propose a family of probabilistic models, Probabilistic Reconstruction Networks (PRNs),which treats the task as image conditioned generation and introduces a global latent variable, encoding shape geometry information. We explore different image conditioning options, and two different training objectives based on Monte Carlo and variational approximations of the model likelihood. Parameters of every distribution are predicted by multi-layered convolutional and fully-connected neural networks from the input images. All the options in the family of models are evaluated in the single-view 3D occupancy grid inference task on synthetic shapes and according image renderings from randomized viewpoints. We show that conditioning the latent variable prior on the input images is sufficient to achieve competitive and state-of-the-art single-view 3D shape inference performance for point cloud based and voxel based metrics, respectively. We additionally demonstrate that probabilistic objective based on variational approximation of the likelihood allows the model to obtain better results compared to Monte Carlo based approximation.The second contribution proposes a probabilistic model for 3D point cloud generation. It treats point clouds as distributions over exchangeable variables and use de Finetti’s representation theorem to define a global latent variable model with conditionally independent distributions for coordinates of each point. To model these point distributions a novel type of conditional normalizing flows is proposed, based on discrete coupling of point coordinate dimensions. These flows update the coordinates of each point sample multiple times by dividing them in two groups and inferring the updates for one group of coordinates from another group and, additionally, global latent variable sample by the means of multi-layered fully-connected neural networks with parameters shared for all the points. We also extend our Discrete Point Flow Networks (DPFNs) from generation to single-view inference task by conditioning the global latent variable prior in a manner similar to PRNs from the first contribution. Resulting generative performance demonstrates that DPFNs produce sets of samples of similar quality and diversity compared to state of the art based on continuous normalizing flows, but are approximately 30 times faster both in training and sampling. Results in autoencoding and single-view inference tasks show competitive and state-of-the-art performance for Chamfer distance, F-score and earth mover’s distance similarity metrics for point clouds
Matcha, Wyao. "Identification des composants prioritaires pour les tests unitaires dans les systèmes OO : une approche basée sur l'apprentissage profond". Thèse, 2020. http://depot-e.uqtr.ca/id/eprint/9420/1/eprint9420.pdf.
Pełny tekst źródła