Theses on the topic "Réseau de croyance profond"
Consult the top 50 theses for your research on the topic "Réseau de croyance profond".
Kaabi, Rabeb. "Apprentissage profond et traitement d'images pour la détection de fumée". Electronic Thesis or Diss., Toulon, 2020. http://www.theses.fr/2020TOUL0017.
This thesis deals with the problem of forest fire detection using image processing and machine learning tools. A forest fire is a fire that spreads over a wooded area. It can be of natural origin (due to lightning or a volcanic eruption) or human. Around the world, the impact of forest fires on the entire ecosystem and on many aspects of our daily lives is becoming more and more apparent. Many methods have been shown to be effective in detecting forest fires. The originality of the present work lies in the early detection of fires through the detection of forest smoke and the classification of smoky and non-smoky regions using deep learning and image processing tools. A set of pre-processing techniques helped us build a large database, which we then used to test the robustness of the proposed model, based on a deep belief network, and to evaluate its performance with the following metrics: IoU, accuracy, recall and F1 score. Finally, the proposed algorithm is tested on several images in order to validate its efficiency. The results of our algorithm have been compared with those of state-of-the-art methods (Deep CNN, SVM...) and are very good. The proposed methods gave an average classification accuracy of about 96.5% for the early detection of smoke.
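The four metrics named in this abstract can be illustrated with a short sketch. It assumes binary smoke masks flattened into lists of 0/1 values; the function name `segmentation_metrics` and the sample masks are hypothetical and are not taken from the thesis.

```python
def segmentation_metrics(pred, truth):
    """Compute IoU, accuracy, recall and F1 for two flat binary masks (1 = smoke)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Confusion-matrix counts for the positive (smoke) class.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical six-pixel example: 2 true positives, 1 false positive,
# 1 false negative, 2 true negatives.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

IoU ignores true negatives, so on images where smoke covers few pixels it is usually a stricter score than accuracy.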
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains". Thesis, Paris, ENST, 2017. http://www.theses.fr/2017ENST0071/document.
The recent progress in artificial neural networks (rebranded as deep learning) has significantly boosted the state of the art in numerous domains of computer vision. In this PhD study, we explore how deep learning techniques can help in the analysis of gender and age from a human face. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. Firstly, we conduct a comprehensive study which results in an empirical formulation of a set of principles for optimal design and training of gender recognition and age estimation Convolutional Neural Networks (CNNs). As a result, we obtain state-of-the-art CNNs for gender/age prediction according to the three most popular benchmarks, and win an international competition on apparent age estimation. On a very challenging internal dataset, our best models reach a gender classification accuracy of 98.7% and an average age estimation error of 4.26 years. In order to address the problem of synthesis and editing of human faces, we design and train GA-cGAN, the first Generative Adversarial Network (GAN) which can generate synthetic faces of high visual fidelity within required gender and age categories. Moreover, we propose a novel method which allows employing GA-cGAN for gender swapping and aging/rejuvenation without losing the original identity in synthetic faces. Finally, in order to show the practical interest of the designed face editing method, we apply it to improve the accuracy of an off-the-shelf face verification software in a cross-age evaluation scenario.
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains". Electronic Thesis or Diss., Paris, ENST, 2017. http://www.theses.fr/2017ENST0071.
Katranji, Mehdi. "Apprentissage profond de la mobilité des personnes". Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCA024.
Knowledge of mobility is a major challenge for mobility-organising authorities and urban planning. Due to the lack of a formal definition of human mobility, the term "people's mobility" will be used in this work. The topic is introduced by a description of the ecosystem, considering its actors and applications. The creation of a learning model has prerequisites: an understanding of the typologies of the available data sets, their strengths and weaknesses. This state of the art in mobility knowledge is based on the four-step model that has existed and been used since 1970, and ends with the renewal of methodologies in recent years. Our models of people's mobility are then presented. Their common point is the emphasis on the individual, unlike traditional approaches that take the locality as a reference. The models we propose are based on the fact that individuals make decisions according to their perception of the environment. The work ends with a study of deep learning methods based on restricted Boltzmann machines. After a state of the art of this family of models, we look for strategies to make these models viable in real-world applications. This last chapter is our main theoretical contribution: it improves the robustness and performance of these models.
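For readers unfamiliar with restricted Boltzmann machines, the sketch below shows the standard conditional activation of hidden units given a binary visible vector, p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij). It is a generic textbook illustration, not the models developed in the thesis; the weights, biases and function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for RBM unit activations."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """Return p(h_j = 1 | v) for every hidden unit j.

    visible:     list of 0/1 visible states, length n_visible
    weights:     n_visible x n_hidden matrix as nested lists
    hidden_bias: list of hidden biases, length n_hidden
    """
    return [
        sigmoid(hidden_bias[j] + sum(v * weights[i][j] for i, v in enumerate(visible)))
        for j in range(len(hidden_bias))
    ]


# Hypothetical 3-visible, 2-hidden machine.
visible = [1, 0, 1]
weights = [[0.5, -1.0], [0.3, 0.2], [-0.5, 1.0]]
hidden_bias = [0.0, 1.0]
probs = hidden_probabilities(visible, weights, hidden_bias)
print(probs)
```

Because an RBM has no connections within a layer, the hidden units are conditionally independent given the visible units, which is why each probability can be computed separately.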
Cheung-Mon-Chan, Pascal. "Réseaux bayésiens et filtres particulaires pour l'égalisation adaptative et le décodage conjoints". Phd thesis, Télécom ParisTech, 2003. http://pastel.archives-ouvertes.fr/pastel-00000732.
Le, Cornec Kergann. "Apprentissage Few Shot et méthode d'élagage pour la détection d'émotions sur bases de données restreintes". Thesis, Université Clermont Auvergne (2017-2020), 2020. http://www.theses.fr/2020CLFAC034.
Emotion detection plays a major part in human interactions, as a good understanding of the speaker's emotional state leads to a better understanding of their speech. The same holds for human-machine interactions. In the area of emotion detection using computers, deep learning has emerged as the state of the art. However, classical deep learning techniques perform poorly when training sets are small. This thesis explores two possible ways of tackling this issue: pruning and few-shot learning. Many pruning methods exist, but they focus on maximising pruning without losing too much accuracy. We propose a new pruning method that improves the choice of the weights to remove. This method is based on the rivalry of two networks, the original network and a network we name the rival. The idea is to share weights between both models in order to maximise the accuracy. During training, weights that negatively impact the accuracy are removed, thus optimising the architecture while improving accuracy. This technique is tested on different networks as well as different databases and achieves state-of-the-art results, improving accuracy while pruning a significant percentage of weights. The second area of this thesis is the exploration of matching networks (both siamese and triplet) as an answer to learning on small datasets. Sounds and images were merged to learn their main features, in order to detect emotions. We show that, while restricting ourselves to 200 training instances for each class, the triplet network matches the state of the art (trained on hundreds of thousands of instances) on some databases. We also show that, in the area of emotion detection, triplet networks provide a better vector embedding of the emotions than siamese networks, and thus deliver better results. A new loss function based on triplet loss is also introduced, facilitating the training process of the triplet and siamese networks.
To allow a better comparison of our model, different methods are used to provide elements of validation, especially on the vector embedding. In the long term, both methods can be combined to propose lighter and optimised networks. As the number of parameters is lowered by pruning, the triplet network should learn more easily and could achieve better performance.
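As background for the triplet networks discussed in this abstract, here is a minimal sketch of the standard triplet loss, max(0, d(a, p) - d(a, n) + margin), with Euclidean distance. It is the classical formulation and not the new loss function introduced in the thesis; the embeddings and the margin value are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: the positive must be closer to the anchor
    than the negative by at least `margin`, otherwise a penalty is paid."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings: same-emotion sample at distance 1,
# different-emotion sample at distance 5.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

loss_satisfied = triplet_loss(anchor, positive, negative)  # margin already respected
loss_violated = triplet_loss(anchor, negative, positive)   # roles swapped on purpose
print(loss_satisfied, loss_violated)
```

A loss of zero means the triplet no longer contributes a gradient, which is why triplet training usually depends on selecting triplets that still violate the margin.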
Azaza, Lobna. "Une approche pour estimer l'influence dans les réseaux complexes : application au réseau social Twitter". Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCK009/document.
Influence in complex networks, and in particular on Twitter, has recently become a hot research topic. Detecting the most influential users makes it possible to reach a large-scale information diffusion area at low cost, which is very useful in marketing or political campaigns. In this thesis, we propose a new approach that considers the several relations between users in order to assess influence in complex networks such as Twitter. We model Twitter as a multiplex heterogeneous network where users, tweets and objects are represented by nodes, and links model the different relations between them (e.g., retweets, mentions, and replies). The multiplex PageRank is applied to data from two datasets in the political field to rank candidates according to their influence. Even though the candidates' ranking reflects reality, the multiplex PageRank scores are difficult to interpret because they are very close to each other. Thus, we want to go beyond a quantitative measure: we explore what relations between nodes in the network could reveal about influence and propose TwitBelief, an approach to assess the weighted influence of a given node. It is based on the conjunctive combination rule from belief function theory, which allows different types of relations to be combined while expressing uncertainty about their importance weights. We test TwitBelief on a large amount of data gathered from Twitter during the 2014 European elections and the 2017 French elections and deduce the top influential candidates. The results show that our model is flexible enough to consider combinations of multiple interactions according to social scientists' needs or requirements and that the numerical results of the belief theory are accurate. We also evaluate the approach on the CLEF RepLab 2014 data set and show that it leads to quite interesting results. We also propose two extensions of TwitBelief in order to consider the content of tweets.
The first is the estimation of polarized influence in the Twitter network. In this extension, sentiment analysis of the tweets with a forest of decision trees makes it possible to determine the influence polarity. The second extension is the categorization of communication styles on Twitter: it determines whether the communication style of Twitter users is informative, interactive or balanced.
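The conjunctive combination rule mentioned in this abstract can be sketched as follows. Mass functions are represented as dictionaries mapping focal sets (frozensets) to masses. The two-element frame, the relation names and the mass values are hypothetical and are not taken from TwitBelief.

```python
from collections import defaultdict


def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive rule: each product m1(A) * m2(B) is
    transferred to the intersection of A and B. Mass assigned to the
    empty set measures the conflict between the two sources."""
    combined = defaultdict(float)
    for focal_a, mass_a in m1.items():
        for focal_b, mass_b in m2.items():
            combined[focal_a & focal_b] += mass_a * mass_b
    return dict(combined)


# Hypothetical evidence about one user from two kinds of relation.
frame = frozenset({"influential", "not_influential"})
influential = frozenset({"influential"})
not_influential = frozenset({"not_influential"})

retweets = {influential: 0.6, frame: 0.4}
mentions = {influential: 0.5, not_influential: 0.2, frame: 0.3}

fused = conjunctive_combination(retweets, mentions)
for focal, mass in fused.items():
    print(sorted(focal), round(mass, 2))
```

Mass left on the whole frame expresses ignorance, which is how uncertainty about a relation's importance can be kept explicit instead of being forced into a single score.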
El, Zoghby Nicole. "Fusion distribuée de données échangées dans un réseau de véhicules". Phd thesis, Université de Technologie de Compiègne, 2014. http://tel.archives-ouvertes.fr/tel-01070896.
Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond". Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It involves a semantic and structural characterization of the image, on the one hand to describe its content and, on the other hand, to understand its geometry. However, while real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during image formation, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented in this pixel. The automatic estimation of a distance map of the scene from an image is a critical algorithmic building block in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although estimating depth from a single image is a difficult and inherently ill-posed problem, we know that humans can judge distances with one eye. This capacity is not innate but acquired, and is made possible mostly by the identification of cues reflecting prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
Groueix, Thibault. "Learning 3D Generation and Matching". Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC1024.
The goal of this thesis is to develop deep learning approaches to model and analyse 3D shapes. Progress in this field could democratize artistic creation of 3D assets, which currently requires time and expert skills with technical software. We focus on the design of deep learning solutions for two particular tasks, key to many 3D modeling applications: single-view reconstruction and shape matching. A single-view reconstruction (SVR) method takes as input a single image and predicts the physical world which produced that image. SVR dates back to the early days of computer vision. In particular, in the 1960s, Lawrence G. Roberts proposed to align simple 3D primitives to the input image under the assumption that the physical world is made of cuboids. Another approach, proposed by Berthold Horn in the 1970s, is to decompose the input image into intrinsic images and use those to predict the depth of every input pixel. Since several configurations of shapes, texture and illumination can explain the same image, both approaches need to form assumptions on the distribution of images and 3D shapes to resolve the ambiguity. In this thesis, we learn these assumptions from large-scale datasets instead of manually designing them. Learning allows us to perform complete object reconstruction, including parts which are not visible in the input image. Shape matching aims at finding correspondences between 3D objects. Solving this task requires both a local and global understanding of 3D shapes, which is hard to achieve explicitly.
Instead we train neural networks on large-scale datasets to solve this task and capture this knowledge implicitly through their internal parameters. Shape matching supports many 3D modeling applications such as attribute transfer, automatic rigging for animation, or mesh editing. The first technical contribution of this thesis is a new parametric representation of 3D surfaces modeled by neural networks. The choice of data representation is a critical aspect of any 3D reconstruction algorithm. Until recently, most of the approaches in deep 3D model generation were predicting volumetric voxel grids or point clouds, which are discrete representations. Instead, we present an alternative approach that predicts a parametric surface deformation, i.e., a mapping from a template to a target geometry. To demonstrate the benefits of such a representation, we train a deep encoder-decoder for single-view reconstruction using our new representation. Our approach, dubbed AtlasNet, is the first deep single-view reconstruction approach able to reconstruct meshes from images without relying on independent post-processing, and can do so at arbitrary resolution without memory issues. A more detailed analysis of AtlasNet reveals that it also generalizes better to categories it has not been trained on than other deep 3D generation approaches. Our second main contribution is a novel shape matching approach purely based on reconstruction via deformations. We show that the quality of the shape reconstructions is critical to obtain good correspondences, and therefore introduce a test-time optimization scheme to refine the learned deformations. For humans and other deformable shape categories deviating by a near-isometry, our approach can leverage a shape template and isometric regularization of the surface deformations.
As categories exhibiting non-isometric variations, such as chairs, do not have a clear template, we learn how to deform any shape into any other and leverage cycle-consistency constraints to learn meaningful correspondences. Our reconstruction-for-matching strategy operates directly on point clouds, is robust to many types of perturbations, and outperforms the state of the art by 15% on dense matching of real human scans.
Martinez, Coralie. "Classification précoce de séquences temporelles par de l'apprentissage par renforcement profond". Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAT123.
Early classification (EC) of time series is a recent research topic in the field of sequential data analysis. It consists in assigning a label to data that is collected sequentially, with new data points arriving over time, and the label has to be predicted using as few data points as possible in the sequence. The EC problem is of paramount importance for supporting decision makers in many real-world applications, ranging from process control to fraud detection. It is particularly interesting for applications concerned with the costs induced by the acquisition of data points, or for applications which seek rapid label prediction in order to take early actions. This is for example the case in the field of health, where it is necessary to provide a medical diagnosis as soon as possible from the sequence of medical observations collected over time. Another example is predictive maintenance, with the objective of anticipating the breakdown of a machine from its sensor signals. In this doctoral work, we developed a new approach to this problem, based on its formulation as a sequential decision-making problem: the EC model has to decide between classifying an incomplete sequence and delaying the prediction to collect additional data points. Specifically, we described this problem as a Partially Observable Markov Decision Process, denoted EC-POMDP. The approach consists in training an EC agent with Deep Reinforcement Learning (DRL) in an environment characterized by the EC-POMDP. The main motivation for this approach was to offer an end-to-end model for EC which is able to simultaneously learn optimal patterns in the sequences for classification and optimal strategic decisions for the time of prediction. Also, the method allows the importance of time relative to classification accuracy to be set in the definition of rewards, according to the application and its willingness to make this compromise.
In order to solve the EC-POMDP and model the policy of the EC agent, we applied an existing DRL algorithm, the Double Deep-Q-Network algorithm, whose general principle is to update the policy of the agent during training episodes, using a replay memory of past experiences. We showed that applying the original algorithm to the EC problem led to imbalanced memory issues which can weaken the training of the agent. Consequently, to cope with those issues and offer a more robust training of the agent, we adapted the algorithm to the specificities of the EC-POMDP and introduced strategies for memory management and episode management. In experiments, we showed that these contributions improved the performance of the agent over the original algorithm, and that we were able to train an EC agent which traded off speed against accuracy on each sequence individually. We were also able to train EC agents on public datasets for which we have no expertise, showing that the method is applicable to various domains. Finally, we proposed some strategies to interpret the decisions of the agent and to validate or reject them. In experiments, we showed how these solutions can help gain insight into the choice of action made by the agent.
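The core of the Double Deep-Q-Network update named in this abstract is its bootstrapped target: the online network selects the next action and the target network evaluates it. The sketch below shows only that computation on plain lists of Q-values. The action layout (index 0 meaning "wait", other indices meaning "classify as class k"), the reward values and the discount factor are assumptions made for illustration, not the settings of the thesis.

```python
def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target for one transition taken from the replay memory.

    next_q_online: Q-values of the next state from the online network
    next_q_target: Q-values of the next state from the target network
    """
    if done:
        # Terminal transition (for example, a label was emitted): no bootstrapping.
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... while action evaluation uses the target network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical "wait" step with a small time penalty as reward.
target_wait = double_dqn_target(
    reward=-0.1,
    done=False,
    next_q_online=[0.2, 0.9, 0.1],
    next_q_target=[0.5, 0.3, 0.8],
    gamma=0.9,
)

# Hypothetical terminal step: a correct classification rewarded with 1.0.
target_terminal = double_dqn_target(1.0, True, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
print(target_wait, target_terminal)
```

Separating selection from evaluation reduces the overestimation that arises when a single network both picks and scores the maximizing action.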
Tong, Zheng. "Evidential deep neural network in the framework of Dempster-Shafer theory". Thesis, Compiègne, 2022. http://www.theses.fr/2022COMP2661.
Deep neural networks (DNNs) have achieved remarkable success in many real-world applications (e.g., pattern recognition and semantic segmentation) but still face the problem of managing uncertainty. Dempster-Shafer theory (DST) provides a well-founded and elegant framework to represent and reason with uncertain information. In this thesis, we have proposed a new framework using DST and DNNs to solve the problems of uncertainty. In the proposed framework, we first hybridize DST and DNNs by plugging a DST-based neural-network layer followed by a utility layer at the output of a convolutional neural network for set-valued classification. We also extend the idea to semantic segmentation by combining fully convolutional networks and DST. The proposed approach enhances the performance of DNN models by assigning ambiguous patterns with high uncertainty, as well as outliers, to multi-class sets. The learning strategy using soft labels further improves the performance of the DNNs by converting imprecise and unreliable label data into belief functions. We have also proposed a modular fusion strategy using this proposed framework, in which a fusion module aggregates the belief-function outputs of evidential DNNs by Dempster's rule. We use this strategy to combine DNNs trained on heterogeneous datasets with different sets of classes while keeping performance at least as good as that of the individual networks on their respective datasets. Further, we apply the strategy to combine several shallow networks and achieve performance similar to that of an advanced DNN on a complicated task.
Ganaye, Pierre-Antoine. "A priori et apprentissage profond pour la segmentation en imagerie cérébrale". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI100.
Medical imaging is a vast field guided by advances in instrumentation, acquisition techniques and image processing. Advances in these major disciplines all contribute to a better understanding of both physiological and pathological phenomena. In parallel, access to broader imaging databases, combined with the growth of computing power, has fostered the development of machine learning methodologies for automatic image processing, including approaches based on deep neural networks. Among the applications where deep neural networks provide solutions, we find image segmentation, which consists in locating and delimiting, in an image, regions with specific properties that will be associated with the same structure. Despite many recent studies in deep-learning-based segmentation, learning the parameters of a neural network is still guided by quantitative performance measures that do not include high-level knowledge of anatomy. The objective of this thesis is to develop methods to integrate prior knowledge into deep neural networks, targeting the segmentation of brain structures in MRI. Our first contribution proposes a strategy for integrating the spatial position of the patch to be classified, to improve the discriminating power of the segmentation model. This first work considerably reduces segmentation errors that are far from anatomical reality, and also improves the overall quality of the results. Our second contribution focuses on a methodology to constrain adjacency relationships between anatomical structures directly while learning network parameters, in order to reinforce the realism of the produced segmentations. Our experiments conclude that the proposed constraint corrects inadmissible adjacencies, thus improving the anatomical consistency of the segmentations produced by the neural network.
Bou, Farah Mira. "Méthodes utilisant des fonctions de croyance pour la gestion des informations imparfaites dans les réseaux de véhicules". Thesis, Artois, 2014. http://www.theses.fr/2014ARTO0208/document.
The popularization of vehicles has created safety and environmental problems. Projects have been launched worldwide to improve road safety, reduce traffic congestion and bring more comfort to drivers. The vehicle network environment is dynamic and complex, sources are often heterogeneous, and therefore the exchanged information may be imperfect. The theory of belief functions offers flexibility in uncertainty modeling and provides rich tools for managing different types of imperfection. It is used to represent uncertainty and to manage and fuse the various pieces of acquired information. We focus on the management of imperfect information exchanged between vehicles concerning events on the road. The work carried out distinguishes local events and spatial events, which do not have the same characteristics. In an environment without infrastructure, where each vehicle is a fusion center and creates its own vision, the goal is to provide each driver with a synthesis of the situation on the road that is as close as possible to reality. Different models using belief functions are proposed. Different strategies are considered: discounting or reinforcing towards the absence of the event to take message ageing into account, keeping either the original messages or just the fusion result in the vehicle database, considering world updates, and managing the spatiality of traffic jam events by taking the neighborhood into account. Many perspectives remain; some are developed in the manuscript, such as the generalization of the proposed methods to all spatial events such as fog blankets.
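The discounting strategy for message ageing can be illustrated with the classical discounting operation on a mass function: every mass is scaled by (1 - alpha) and the removed mass alpha is transferred to the whole frame, that is, to ignorance. The linear ageing rate, the message lifetime and the mass values below are hypothetical. The sketch shows plain discounting only, not the reinforcement towards the absence of the event that the abstract also mentions.

```python
def discount(mass, frame, alpha):
    """Classical discounting of a mass function with rate alpha in [0, 1].

    mass:  dict mapping frozenset focal elements to masses
    frame: frozenset representing the whole frame of discernment
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    discounted = {
        focal: (1.0 - alpha) * value
        for focal, value in mass.items()
        if focal != frame
    }
    # The mass removed from the focal elements goes to total ignorance.
    discounted[frame] = (1.0 - alpha) * mass.get(frame, 0.0) + alpha
    return discounted


def ageing_rate(age_seconds, lifetime_seconds):
    """Hypothetical linear ageing: a message is fully discounted at its lifetime."""
    return min(1.0, max(0.0, age_seconds / lifetime_seconds))


frame = frozenset({"event", "no_event"})
event = frozenset({"event"})

# Hypothetical message received 150 s ago, with an assumed lifetime of 600 s.
message = {event: 0.8, frame: 0.2}
aged = discount(message, frame, ageing_rate(150.0, 600.0))
print(aged)
```

With alpha equal to 1 the message carries no information any more, so an old report of an accident stops influencing the fused view of the road.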
Zhang, Jian. "Modèles de Mobilité de Véhicules par Apprentissage Profond dans les Systèmes de Tranport Intelligents". Thesis, Ecole centrale de Lille, 2018. http://www.theses.fr/2018ECLI0015/document.
Intelligent transportation systems have attracted great research interest in recent years. Although realistic traffic simulation plays an important role, it has not received enough attention. This thesis is devoted to studying traffic simulation at the microscopic level, and proposes corresponding vehicular mobility models. Using deep learning methods, these mobility models have been shown to represent real-world vehicles with promising credibility. Firstly, a data-driven, neural-network-based mobility model is proposed. This model is learned from real-world trajectory data and makes it possible to mimic local vehicle behaviors. By analyzing the performance of this basic learning-based mobility model, we show that an improvement is possible and we specify it. An HMM is then introduced. This integration requires preparation, which includes an examination of traditional dynamics-based mobility models and a method for adapting "classical" models to our situation. Finally, the enhanced model is presented, and a sophisticated scenario simulation is built with it to validate the theoretical results. The performance of our mobility model is promising, and implementation issues are also discussed.
Dahmani, Sara. "Synthèse audiovisuelle de la parole expressive : modélisation des émotions par apprentissage profond". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0137.
The work of this thesis concerns the modeling of emotions for expressive audiovisual text-to-speech synthesis. Today, the results of text-to-speech synthesis systems are of good quality; however, audiovisual synthesis remains an open issue and expressive synthesis is even less studied. As part of this thesis, we present an emotion modeling method which is malleable and flexible, and allows us to mix emotions as we mix shades on a palette of colors. In the first part, we present and study two expressive corpora that we have built. The recording strategy and the expressive content of these corpora are analyzed to validate their use for audiovisual speech synthesis. In the second part, we present two neural architectures for speech synthesis. We used these two architectures to model three aspects of speech: 1) the duration of sounds, 2) the acoustic modality and 3) the visual modality. First, we use a fully connected architecture. This architecture allowed us to study the behavior of neural networks when dealing with different contextual and linguistic descriptors. We were also able to analyze, with objective measures, the network's ability to model emotions. The second neural architecture proposed is a variational auto-encoder. This architecture is able to learn a latent representation of emotions without using emotion labels. After analyzing the latent space of emotions, we presented a procedure for structuring it in order to move from a discrete representation of emotions to a continuous one. We were able to validate, through perceptual experiments, the ability of our system to generate emotions, nuances of emotions and mixtures of emotions for expressive audiovisual text-to-speech synthesis.
Chen, Yifu. "Deep learning for visual semantic segmentation". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS200.
In this thesis, we are interested in Visual Semantic Segmentation, one of the high-level tasks that pave the way towards complete scene understanding. Specifically, it requires a semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are being tackled using deep architectures. In the first part, we focus on the construction of a more appropriate loss function for semantic segmentation. More precisely, we define a novel loss function by employing a semantic edge detection network. This loss forces pixel-level predictions to be consistent with the ground-truth semantic edge information, and thus leads to better-shaped segmentation results. In the second part, we address another important issue, namely, alleviating the need for training segmentation models with large amounts of fully annotated data. We propose a novel attribution method that identifies the most significant regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation framework. The semantic segmentation models can thus be trained with only image-level labeled data, which can be easily collected in large quantities. All models proposed in this thesis are thoroughly evaluated experimentally on multiple datasets, and the results are competitive with the literature.
Mlynarski, Pawel. "Apprentissage profond pour la segmentation des tumeurs cérébrales et des organes à risque en radiothérapie". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4084.
Full text: Medical images play an important role in cancer diagnosis and treatment. Oncologists analyze images to determine the different characteristics of the cancer, to plan the therapy and to observe the evolution of the disease. The objective of this thesis is to propose efficient methods for automatic segmentation of brain tumors and organs at risk in the context of radiotherapy planning, using Magnetic Resonance (MR) images. First, we focus on segmentation of brain tumors using Convolutional Neural Networks (CNN) trained on MRIs manually segmented by experts. We propose a segmentation model that has a large 3D receptive field while remaining efficient in terms of computational complexity, based on a combination of 2D and 3D CNNs. We also address problems related to the joint use of several MRI sequences (T1, T2, FLAIR). Second, we introduce a segmentation model which is trained using weakly annotated images in addition to fully annotated images (with voxelwise labels), which are usually available in very limited quantities due to their cost. We show that this mixed level of supervision considerably improves the segmentation accuracy when the number of fully annotated images is limited. Finally, we propose a methodology for an anatomy-consistent segmentation of organs at risk in the context of radiotherapy of brain tumors. The segmentations produced by our system on a set of MRIs acquired at the Centre Antoine Lacassagne (Nice, France) are evaluated by an experienced radiotherapist.
Donon, Balthazar. "Deep statistical solvers & power systems applications". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG016.
Full text: Faced with the growing integration of intermittent renewable energies and disruptive market mechanisms, power systems are experiencing profound changes. To cope with this increasing complexity, RTE, the French Transmission System Operator, is investigating the use of methods arising from the Deep Learning literature. Topological changes (which affect the way power lines are interconnected) occur multiple times a day, and should thus be taken into account by the neural network architecture under consideration, which is made possible by Graph Neural Networks (GNNs). After having proven the ability of GNNs to imitate a power grid simulator, this PhD thesis develops an approach that aims at "learning to optimize" in an unsupervised fashion. A GNN is thus trained by directly minimizing the violation of physical laws, and not by imitation. This work is complemented by a theoretical analysis, and then extended to a bilevel optimization problem which requires the use of two distinct GNN models, one of them playing the role of an operator, while the other emulates physics.
Michelet, Jordan. "Extraction du fouillis de mer dans des images radar marin cohérent : modèles de champ de phases, méthodes de Boltzmann sur réseau, apprentissage". Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS048.
Full text: We focus on the problem of sea clutter extraction in marine radar images. The aim is to develop image processing methods that allow us to avoid assumptions about the nature of the sea clutter and the signal of interest. On the one hand, we propose an original algorithm based on a variational approach: a multiphase model with diffuse interface. The results obtained show that the algorithm is efficient when the signal of interest has a sufficiently large signal-to-clutter ratio. On the other hand, we focus on the implementation of lattice Boltzmann schemes for convection-diffusion problems with non-constant advection velocity and non-zero source term. We describe the computation of the consistency obtained by asymptotic analysis at the acoustic scale and with a multiple relaxation time collision operator, and study the stability of these schemes in a particular case. The results obtained show that the proposed schemes make it possible to remove the residual noise and enhance the signal of interest in the image obtained with the first method. Finally, we propose a learning method that allows us to avoid assumptions on the nature of the signal of interest. Indeed, in addition to the variational approach, we propose an algorithm based on pulse-Doppler processing for the case where the signal of interest is exo-clutter and has a low signal-to-clutter ratio. The results obtained from the proposed double auto-encoder, being comparable to the results provided by each of the two methods, validate this approach.
Zagoruyko, Sergey. "Weight parameterizations in deep neural networks". Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.
Full text: Multilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations have been explored since. Recently, graphics processing units enabled very efficient neural network training, and allowed training much larger networks on larger datasets, dramatically improving performance on various supervised learning tasks. However, generalization is still far from human level, and it is difficult to understand what the decisions made are based on. To improve generalization and understanding, we revisit the problems of weight parameterizations in deep neural networks. We identify what are, to our mind, the most important problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of manually hand-crafted descriptors. Then, we address the task of object detection, where a network should simultaneously learn to predict both the class of the object and its location. In both tasks we find that the number of parameters in the network is the major factor determining its performance, and explore this phenomenon in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with fewer layers can be as effective as deeper ones with the same number of parameters. Overall, we present an extensive study on architectures and weight parameterizations, and ways of transferring knowledge between them.
Estienne, Théo. "Deep learning-based methods for 3D medical image registration". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG055.
Full text: This thesis focuses on new deep learning approaches to find the best displacement between two different medical images. This research area, called image registration, has many applications in the clinical pipeline, including the fusion of different imaging types or the temporal follow-up of a patient. This field has been studied for many years with various methods, such as diffeomorphic, graph-based or physical-based methods. Recently, deep learning-based methods were proposed using convolutional neural networks. These methods obtained similar results to non-deep learning methods while greatly reducing the computation time and enabling real-time prediction. This improvement comes from the use of graphics processing units (GPU) and a prediction phase where no optimisation is required. However, deep learning-based registration has several limitations, such as the need for large databases to train the network or the tuning of regularisation hyperparameters to prevent overly noisy transformations. In this manuscript, we investigate diverse modifications to deep learning algorithms, working on various imaging types and body parts. We first study the combination of segmentation and registration tasks, proposing a new joint architecture. We apply it to brain MRI datasets, exploring different cases: brains without and with tumours. Our architecture comprises one encoder and two decoders, and the coupling is reinforced by the introduction of a supplementary loss. In the presence of a tumour, the similarity loss is modified so that the registration focuses only on the healthy part, ignoring the tumour. Then, we shift to abdominal CT, a more challenging localisation, as there is natural organ movement and deformation. We improve registration performance thanks to the use of pre-training and pseudo segmentations, the addition of new losses to provide a better regularisation and a multi-step strategy.
Finally, we analyse the explainability of registration networks using a linear decomposition, applied to lung and hippocampus MR. Thanks to our late fusion strategy, we project images to the latent space and calculate a new basis. This basis corresponds to elementary transformations, which we study qualitatively.
Matteo, Lionel. "De l’image optique "multi-stéréo" à la topographie très haute résolution et la cartographie automatique des failles par apprentissage profond". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4099.
Full text: Seismogenic faults are the source of earthquakes. The study of their properties thus provides information on some of the properties of the large earthquakes they might produce. Faults are 3D features, forming complex networks generally including one master fault and myriads of secondary faults and fractures that intensely dissect the rocks embedding the master fault. In my thesis, I aim to develop approaches to help study this intense secondary faulting/fracturing. To identify, map and measure the faults and fractures within dense fault networks, I have addressed two challenges: 1) Faults generally form steep topographic escarpments at the ground surface that enclose narrow, deep corridors or canyons, where topography, and hence fault traces, are difficult to measure using the available standard methods (such as stereo and tri-stereo of optical satellite images). To address this challenge, I have used multi-stereo acquisitions with different configurations, such as different roll and pitch angles, different acquisition dates and different acquisition modes (mono and tri-stereo). Our dataset, amounting to 37 Pléiades images at three different tectonic sites in the Western USA (Valley of Fire, Nevada; Granite Dells, Arizona; Bishop Tuff, California), allows us to test different acquisition configurations and to calculate the topography with three different approaches. Using the free open-source software Micmac (IGN; Rupnik et al., 2017), I have calculated the topography in the form of Digital Surface Models (DSM): (i) with the combination of 2 to 17 Pléiades images, (ii) stacking and merging DSM built from individual stereo or tri-stereo acquisitions, avoiding the use of multi-date combinations, (iii) stacking and merging point clouds built from tri-stereo acquisitions following the multiview pipeline developed by Rupnik et al., 2018.
We used the recent multiview stereo pipeline CARS (CNES/CMLA) developed by Michel et al., 2020 as a last approach (iv), combining tri-stereo acquisitions. From the four different approaches, I have thus calculated more than 200 DSM, and my results suggest that combining two tri-stereo acquisitions, or one stereo and one tri-stereo acquisition, with opposite roll angles leads to the most accurate DSM (with the most complete and precise topography surface). 2) Commonly, faults are mapped manually in the field or from optical images and topographic data through the recognition of the specific curvilinear traces they form at the ground surface. However, manual mapping is time-consuming, which limits our capacity to produce complete representations and measurements of the fault networks. To overcome this problem, we have adopted a machine learning approach, namely a U-Net Convolutional Neural Network, to automate the identification and mapping of fractures and faults in optical images and topographic data. Intentionally, we trained the CNN with a moderate amount of manually created fracture and fault maps of low resolution and basic quality, extracted from one type of optical image (standard camera photographs of the ground surface). Based on the results of a number of performance tests, we select the best-performing model, MRef, and demonstrate its capacity to predict fractures and faults accurately in image data of various types and resolutions (ground photographs, drone and satellite images and topographic data). The MRef predictions thus enable the statistical analysis of the fault networks. MRef exhibits good generalization capacities, making it a viable tool for fast and accurate extraction of fracture and fault networks from image and topographic data.
Cîrstea, Bogdan-Ionut. "Contribution à la reconnaissance de l'écriture manuscrite en utilisant des réseaux de neurones profonds et le calcul quantique". Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0059.
Full text: In this thesis, we provide several contributions from the fields of deep learning and quantum computation to handwriting recognition. We begin by integrating some of the more recent deep learning techniques (such as dropout, batch normalization and different activation functions) into convolutional neural networks and show improved performance on the well-known MNIST dataset. We then propose Tied Spatial Transformer Networks (TSTNs), a variant of Spatial Transformer Networks (STNs) with shared weights, as well as different training variants of the TSTN. We show improved performance on a distorted variant of the MNIST dataset. In another work, we compare the performance of Associative Long Short-Term Memory (ALSTM), a recently introduced recurrent neural network (RNN) architecture, against Long Short-Term Memory (LSTM), on the Arabic handwriting recognition IFN-ENIT dataset. Finally, we propose a neural network architecture, which we name a hybrid classical-quantum neural network, which can integrate and take advantage of quantum computing. While our simulations are performed using classical computation (on a GPU), our results on the Fashion-MNIST dataset suggest that exponential improvements in computational requirements might be achievable, especially for recurrent neural networks trained for sequence classification.
Zeghidour, Neil. "Learning representations of speech from the raw waveform". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE004/document.
Full text: While deep neural networks are now used in almost every component of a speech recognition system, from acoustic to language modeling, the input to such systems is still fixed, handcrafted, spectral features such as mel-filterbanks. This contrasts with computer vision, in which a deep neural network is now trained on raw pixels. Mel-filterbanks contain valuable and documented prior knowledge from human auditory perception as well as signal processing, and are the input to state-of-the-art speech recognition systems that are now on par with human performance in certain conditions. However, mel-filterbanks, like any fixed representation, are inherently limited by the fact that they are not fine-tuned for the task at hand. We hypothesize that learning the low-level representation of speech with the rest of the model, rather than using fixed features, could push the state of the art even further. We first explore a weakly supervised setting and show that a single neural network can learn to separate phonetic information and speaker identity from mel-filterbanks or the raw waveform, and that these representations are robust across languages. Moreover, learning from the raw waveform provides significantly better speaker embeddings than learning from mel-filterbanks. These encouraging results lead us to develop a learnable alternative to mel-filterbanks that can be used as a direct replacement for these features. In the second part of this thesis we introduce Time-Domain filterbanks, a lightweight neural network that takes the waveform as input, can be initialized as an approximation of mel-filterbanks, and is then learned with the rest of the neural architecture. Across extensive and systematic experiments, we show that Time-Domain filterbanks consistently outperform mel-filterbanks and can be integrated into a new state-of-the-art speech recognition system, trained directly from the raw audio signal.
Since fixed speech features are also used for non-linguistic classification tasks, for which they are even less optimal, we perform dysarthria detection from the waveform with Time-Domain filterbanks and show that it significantly improves over mel-filterbanks or low-level descriptors. Finally, we discuss how our contributions fall within a broader shift towards fully learnable audio understanding systems.
Pourchot, Aloïs. "Improving Radiographic Diagnosis with Deep Learning in Clinical Settings". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS421.
Full text: The impressive successes of deep learning over the course of the past decade have reinforced its establishment as the standard modus operandi to solve difficult machine learning problems, and have enabled its swift spread to manifold domains of application. One such domain, which is at the heart of this PhD, is medical imaging. Deep learning has made the thrilling prospect of relieving medical experts of a fraction of their burden through automated diagnosis a reality. Over the course of this thesis, we were led to consider two medical problems: the task of fracture detection, and the task of bone age assessment. For both of them, we strove to explore possibilities to improve deep learning tools aimed at facilitating their diagnosis. With this objective in mind, we have explored two different strategies. The first one, ambitious yet arrogant, has led us to investigate the paradigm of neural architecture search, a logical successor to deep learning which aims at learning the very structure of the neural network model used to solve a task. In a second, bleaker but wiser strategy, we have tried to improve a model through the meticulous analysis of the data sources at hand. In both scenarios, particular care was given to the clinical relevance of our different results and contributions, as we believed that the practical anchoring of our different contrivances was just as important as their theoretical design.
Yin, Yuan. "Physics-Aware Deep Learning and Dynamical Systems : Hybrid Modeling and Generalization". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS161.
Full text: Deep learning has made significant progress in various fields and has emerged as a promising tool for modeling physical dynamical phenomena that exhibit highly nonlinear relationships. However, existing approaches are limited in their ability to make physically sound predictions, due to the lack of prior knowledge, and to handle real-world scenarios where data comes from multiple dynamics or is irregularly distributed in time and space. This thesis aims to overcome these limitations in the following directions: improving neural network-based dynamics modeling by leveraging physical models through hybrid modeling; extending the generalization power of dynamics models by learning commonalities from data of different dynamics to extrapolate to unseen systems; and handling free-form data and continuously predicting phenomena in time and space through continuous modeling. We highlight the versatility of deep learning techniques, and the proposed directions show promise for improving their accuracy and generalization power, paving the way for future research in new applications.
Boutiba, Karim. "On enforcing Network Slicing in the new generation of Radio Access Networks". Electronic Thesis or Diss., Sorbonne université, 2024. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2024SORUS003.pdf.
Full text: The emerging 5G networks and beyond promise to support novel use cases such as immersive holographic communication, Internet of Skills, and 4D interactive mapping. These use cases impose stringent requirements in terms of Quality of Service (QoS), such as low latency, high Downlink (DL)/Uplink (UL) throughput and low energy consumption. The 3rd Generation Partnership Project (3GPP) specifications introduced many features in 5G New Radio (NR) to improve the physical efficiency of 5G to meet the stringent and heterogeneous requirements of beyond-5G services. Among the key 5G NR features, we can mention the numerology, BandWidth Part (BWP), dynamic Time Division Duplex (TDD) and Connected-mode Discontinuous Reception (C-DRX). However, the specifications do not specify how to configure the next Generation Node B (gNB)/User Equipment (UE) in order to optimize the usage of the 5G NR features. We enforce the 5G NR features by applying Machine Learning (ML), particularly Deep Reinforcement Learning (DRL), to fill this gap. Indeed, Artificial Intelligence (AI)/ML is playing a vital role in communications and networking [1] thanks to its ability to provide a self-configuring and self-optimizing network. In this thesis, different solutions are proposed to enable intelligent configuration of the Radio Access Network (RAN). We divided the solutions into three different parts. The first part concerns RAN slicing leveraging numerology and BWPs. The second part tackles dynamic TDD, and the last part goes through different RAN optimizations to support Ultra-Reliable and Low-Latency Communication (URLLC) services. In the first part, we propose two contributions. First, we introduce NRflex, a RAN slicing framework aligned with the Open RAN (O-RAN) architecture. NRflex dynamically assigns BWPs to the running slices and their associated User Equipment (UE) to fulfill the slices' required QoS.
Then, we model the RAN slicing problem as a Mixed-Integer Linear Programming (MILP) problem. To the best of our knowledge, this is the first MILP modeling of radio resource management featuring network slicing that takes into account (i) mixed numerology, (ii) both latency and throughput requirements, (iii) multiple slice attachments per UE and (iv) Inter-Numerology Interference (INI). After showing that solving the problem takes exponential time, we consider a new approach that runs in polynomial time, which is essential when scheduling radio resources. The new approach consists of formalizing this problem using a DRL-based solver. In the second part of this thesis, we propose a DRL-based solution to enable dynamic TDD in a single 5G NR cell. The solution is implemented in OAI and tested using real UEs. Then, we extend the solution by leveraging Multi-Agent Deep Reinforcement Learning (MADRL) to support multiple cells, considering cross-link interference between cells. In the last part, we propose three solutions to optimize the RAN to support URLLC services. First, we propose a two-step ML-based solution to predict Radio Link Failure (RLF). We combine Long Short-Term Memory (LSTM) and Support Vector Machine (SVM) to find the correlation between radio measurements and RLF. The RLF prediction model was trained with real data obtained from a 5G testbed. In the second contribution, we propose a DRL-based solution to reduce UL latency. Our solution dynamically allocates the future UL grant by learning from the dynamic traffic pattern. In the last contribution, we introduce a DRL-based solution to balance latency and energy consumption by jointly deriving the C-DRX parameters and the BWP configuration.
Foroughmand, Aarabi Hadrien. "Towards global tempo estimation and rhythm-oriented genre classification based on harmonic characteristics of rhythm". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS018.
Full text: Automatic detection of the rhythmic structure within music is one of the challenges of the "Music Information Retrieval" research area. The advent of technology dedicated to the arts has allowed the emergence of new musical trends generally described by the term "Electronic/Dance Music" (EDM), which encompasses a plethora of sub-genres. This type of music, often dedicated to dance, is characterized by its rhythmic structure. We propose a rhythmic analysis of what defines certain musical genres, including those of EDM. To do so, we perform an automatic global tempo estimation task and a genre classification task based on rhythm. Tempo and genre are two intertwined aspects, since genres are often associated with rhythmic patterns that are played in specific tempo ranges. Some so-called "handcrafted" tempo estimation systems have been shown to be effective based on the extraction of rhythm-related characteristics. Recently, with the appearance of annotated databases, so-called "data-driven" systems and deep learning approaches have shown progress on these tasks. In this thesis, we propose methods at the crossroads between "handcrafted" and "data-driven" systems. The development of a new representation of rhythm combined with deep learning by convolutional neural network is the basis of all our work. We present our Deep Rhythm method in detail in this thesis, along with several extensions based on musical intuitions that allow us to improve our results.
Shahid, Mustafizur Rahman. "Deep learning for Internet of Things (IoT) network security". Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAS003.
Full text: The growing Internet of Things (IoT) introduces new security challenges for network activity monitoring. Most IoT devices are vulnerable because of a lack of security awareness from device manufacturers and end users. As a consequence, they have become prime targets for malware developers who want to turn them into bots. Contrary to general-purpose devices, an IoT device is designed to perform very specific tasks. Hence, its networking behavior is very stable and predictable, making it well suited for data analysis techniques. Therefore, the first part of this thesis focuses on leveraging recent advances in the field of deep learning to develop network monitoring tools for the IoT. Two types of network monitoring tools are explored: IoT device type recognition systems and IoT network Intrusion Detection Systems (NIDS). For IoT device type recognition, supervised machine learning algorithms are trained to perform network traffic classification and determine what IoT device the traffic belongs to. The IoT NIDS consists of a set of autoencoders, each trained for a different IoT device type. The autoencoders learn the legitimate networking behavior profile and detect any deviation from it. Experiments using network traffic data produced by a smart home show that the proposed models achieve high performance. Despite yielding promising results, training and testing machine learning based network monitoring systems requires a tremendous amount of IoT network traffic data. However, very few IoT network traffic datasets are publicly available. Physically operating thousands of real IoT devices can be very costly and can raise privacy concerns. In the second part of this thesis, we propose to leverage Generative Adversarial Networks (GAN) to generate bidirectional flows that look like they were produced by a real IoT device. A bidirectional flow consists of the sequence of the sizes of individual packets along with a duration.
Hence, in addition to generating packet-level features, which are the sizes of individual packets, our generator implicitly learns to comply with flow-level characteristics, such as the total number of packets and bytes in a bidirectional flow or the total duration of the flow. Experimental results using data produced by a smart speaker show that our method allows us to generate high-quality, realistic-looking synthetic bidirectional flows.
Loiseau, Romain. "Real-World 3D Data Analysis : Toward Efficiency and Interpretability". Electronic Thesis or Diss., Marne-la-vallée, ENPC, 2023. http://www.theses.fr/2023ENPC0028.
Full text: This thesis explores new deep-learning approaches for modeling and analyzing real-world 3D data. 3D data processing is helpful for numerous high-impact applications such as autonomous driving, territory management, industry facilities monitoring, forest inventory, and biomass measurement. However, annotating and analyzing 3D data can be demanding. Specifically, meeting constraints on computing resources or annotation efficiency is often challenging. The difficulty of interpreting and understanding the inner workings of deep learning models can also limit their adoption. The computer vision community has made significant efforts to design methods to analyze 3D data and to perform tasks such as shape classification, scene segmentation, and scene decomposition. Early automated analysis relied on hand-crafted descriptors and incorporated prior knowledge about real-world acquisitions. Modern deep learning techniques demonstrate the best performance but are often computationally expensive, rely on large annotated datasets, and have low interpretability. In this thesis, we propose contributions that address these limitations. The first contribution of this thesis is an efficient deep-learning architecture for analyzing LiDAR sequences in real time. Our approach explicitly considers the acquisition geometry of rotating LiDAR sensors, which many autonomous driving perception pipelines use. Compared to previous work, which considers complete LiDAR rotations individually, our model processes the acquisition in smaller increments. Our proposed architecture achieves accuracy on par with the best methods while reducing processing time by more than five times and model size by more than fifty times. The second contribution is a deep learning method to summarize extensive 3D shape collections with a small set of 3D template shapes. We learn end-to-end a small number of 3D prototypical shapes that are aligned and deformed to reconstruct input point clouds.
The main advantage of our approach is that its representations are in the 3D space and can be viewed and manipulated. They constitute a compact and interpretable representation of 3D shape collections and facilitate annotation, leading to state-of-the-art results for few-shot semantic segmentation. The third contribution further expands unsupervised analysis for parsing large real-world 3D scans into interpretable parts. We introduce a probabilistic reconstruction model to decompose an input 3D point cloud using a small set of learned prototypical shapes. Our network determines the number of prototypes to use to reconstruct each scene. We outperform state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. We offer significant advantages over existing approaches, as our model does not require manual annotations. This thesis also introduces two open-access annotated real-world datasets, HelixNet and the Earth Parser Dataset, acquired with terrestrial and aerial LiDARs, respectively. HelixNet is the largest LiDAR autonomous driving dataset with dense annotations and provides point-level sensor metadata crucial for precisely measuring the latency of semantic segmentation methods. The Earth Parser Dataset consists of seven aerial LiDAR scenes, which can be used to evaluate the performance of 3D processing techniques in diverse environments. We hope that these datasets and reliable methods considering the specificities of real-world acquisitions will encourage further research toward more efficient and interpretable models.
Mehr, Éloi. "Unsupervised Learning of 3D Shape Spaces for 3D Modeling". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS566.
Even though 3D data is becoming increasingly popular, especially with the democratization of virtual and augmented experiences, it remains very difficult to manipulate a 3D shape, even for designers or experts. Given a database containing 3D instances of one or several categories of objects, we want to learn the manifold of plausible shapes in order to develop new intelligent 3D modeling and editing tools. However, this manifold is often much more complex than in the 2D domain. Indeed, 3D surfaces can be represented using various embeddings, and may also exhibit different alignments and topologies. In this thesis we study the manifold of plausible shapes in the light of the aforementioned challenges, by exploring three different points of view. First of all, we consider the manifold as a quotient space, in order to learn the shapes' intrinsic geometry from a dataset where the 3D models are not co-aligned. Then, we assume that the manifold is disconnected, which leads to a new deep learning model that is able to automatically cluster and learn the shapes according to their typology. Finally, we study the conversion of an unstructured 3D input to an exact geometry, represented as a structured tree of continuous solid primitives.
Gal, Viviane. "Vers une nouvelle Interaction Homme Environnement dans les jeux vidéo et pervasifs : rétroaction biologique et états émotionnels : apprentissage profond non supervisé au service de l'affectique". Electronic Thesis or Diss., Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1269.
Living exceptional moments, experiencing thrills, well-being, and fulfilment are often part of our dreams or aspirations. We choose various ways to get there, such as games. Whether players are looking for originality, challenges, discovery, a story, or other goals, emotional states are the purpose of their quest. They keep playing as long as the game gives them pleasure and sensations. How can we bring them there? We are developing a new human-environment interaction that takes emotions into account and adapts to them. We address video games, pervasive games, and other applications. In pursuing this goal, players should not be bothered by interfaces or by the invasiveness of biosensors. This work raises two questions: (1) Can we discover emotional states based on physiological measurements from contact biosensors? (2) If so, can these sensors be replaced by remote, non-invasive devices and produce the same results? The models we have developed propose solutions based on unsupervised machine learning methods. We also present remote measurement techniques and explain our future work in a new field we call affectics.
Messaoud, Kaouther. "Deep learning based trajectory prediction for autonomous vehicles". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS048.
Predicting the trajectories of the agents neighboring an autonomous vehicle is essential for efficient trajectory planning in autonomous driving. In this thesis, we tackle the problem of predicting the trajectory of a target vehicle in two different environments: a highway and an urban area (intersection, roundabout, etc.). To this end, we develop solutions based on deep machine learning by modeling the interactions between the target vehicle and the static and dynamic elements of the scene. In addition, in order to take into account the uncertainty of the future, we generate multiple plausible trajectories and the probability of occurrence of each. We also make sure that the predicted trajectories are realistic and conform to the structure of the scene. The solutions developed are evaluated using real driving datasets.
Bayerlein, Harald. "Machine Learning Methods for UAV-aided Wireless Networks". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS154.
Autonomous unmanned aerial vehicles (UAVs), spurred by rapid innovation in drone hardware and regulatory frameworks during the last decade, are envisioned for a multitude of applications in service of the society of the future. From the perspective of next-generation wireless networks, UAVs are not only anticipated in the role of passive cellular-connected users, but also as active enablers of connectivity as part of UAV-aided networks. The defining advantage of UAVs in all potential application scenarios is their mobility. To take full advantage of their capabilities, flexible and efficient path planning methods are necessary. This thesis focuses on exploring machine learning (ML), specifically reinforcement learning (RL), as a promising class of solutions to UAV mobility management challenges. Deep RL is one of the few frameworks that allows us to tackle the complex task of UAV control and deployment in communication scenarios directly, given that these are generally NP-hard optimization problems and badly affected by non-convexity. Furthermore, deep RL offers the possibility to balance multiple objectives of UAV-aided networks in a straightforward way; it is very flexible in terms of the availability of prior or model information, and deep RL inference is computationally efficient. This thesis also explores the challenges of severely limited flying time, cooperation between multiple UAVs, and reducing the training data demand of deep RL methods. Finally, the thesis explores the connection between drone-assisted networks and robotics, two generally disjoint research communities.
Zhao, Zhou. "Heart Segmentation and Evaluation of Fibrosis". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS003.
Atrial fibrillation is the most common heart rhythm disease. Because the underlying atrial structures are poorly understood, current treatments are still not satisfactory. Recently, with the popularity of deep learning, many deep-learning-based segmentation methods have been proposed to analyze atrial structures, especially from late gadolinium-enhanced magnetic resonance imaging. However, two problems remain: 1) segmentation results include the atrial-like background; 2) boundaries are very hard to segment. Most segmentation approaches design a specific network that mainly focuses on the regions, to the detriment of the boundaries. Therefore, in this dissertation, we propose two different methods to segment the heart: a two-stage method and an end-to-end trainable method. Then, to evaluate the degree of fibrosis, we also propose two methods: one combines deep learning with morphology, and the other uses deep learning directly. Finally, the efficiency of the proposed approaches is verified on several public datasets.
Wu, Dawen. "Solving Some Nonlinear Optimization Problems with Deep Learning". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG083.
This thesis considers four types of nonlinear optimization problems, namely bimatrix games, nonlinear projection equations (NPEs), nonsmooth convex optimization problems (NCOPs), and chance-constrained games (CCGs). These four classes of nonlinear optimization problems find extensive applications in various domains such as engineering, computer science, economics, and finance. We aim to introduce deep learning-based algorithms to efficiently compute the optimal solutions for these nonlinear optimization problems. For bimatrix games, we use Convolutional Neural Networks (CNNs) to compute Nash equilibria. Specifically, we design a CNN architecture where the input is a bimatrix game and the output is the predicted Nash equilibrium for the game. We generate a set of bimatrix games from a given probability distribution and use the Lemke-Howson algorithm to find their true Nash equilibria, thereby constructing a training dataset. The proposed CNN is trained on this dataset to improve its accuracy.
Upon completion of training, the CNN is capable of predicting Nash equilibria for unseen bimatrix games. Experimental results demonstrate the exceptional computational efficiency of our CNN-based approach, at the cost of sacrificing some accuracy. NPEs, NCOPs, and CCGs are more complex optimization problems and cannot be fed directly into neural networks. Therefore, we resort to advanced tools, namely neurodynamic optimization and Physics-Informed Neural Networks (PINNs), for solving these problems. Specifically, we first use a neurodynamic approach to model a nonlinear optimization problem as a system of Ordinary Differential Equations (ODEs). Then, we utilize a PINN-based model to solve the resulting ODE system, where the end state of the model represents the predicted solution to the original optimization problem. The neural network is trained toward solving the ODE system, thereby solving the original optimization problem. A key contribution of our proposed method lies in transforming a nonlinear optimization problem into a neural network training problem. As a result, we can now solve nonlinear optimization problems using only PyTorch, without relying on classical convex optimization solvers such as CVXPY, CPLEX, or Gurobi.
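The neurodynamic idea in this abstract, recasting a constrained optimization problem as an ODE whose end state is the optimum, can be illustrated with a minimal sketch. It assumes a toy one-dimensional problem (minimize (x - 3)^2 over the interval [0, 2]) and a standard projection dynamical system dx/dt = -x + P(x - f'(x)). It integrates the ODE with plain forward Euler rather than the PINN used in the thesis, and all function names are hypothetical.

```python
def project(x, lo=0.0, hi=2.0):
    """Project x onto the feasible interval [lo, hi]."""
    return max(lo, min(hi, x))


def grad_f(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def solve_by_neurodynamics(x0=0.0, step=0.1, n_steps=200):
    """Integrate dx/dt = -x + P(x - grad_f(x)) with forward Euler.

    The end state of the trajectory is taken as the predicted optimum.
    """
    x = x0
    for _ in range(n_steps):
        x += step * (-x + project(x - grad_f(x)))
    return x


# The unconstrained minimum (x = 3) is infeasible, so the trajectory
# should settle on the boundary of the feasible interval.
x_star = solve_by_neurodynamics()
print(round(x_star, 4))
```

In this toy setting the trajectory settles at the upper bound of the interval, which is the constrained minimizer. A PINN would replace the Euler loop with a network trained to satisfy the same ODE.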
Dolz, Jose. "Vers la segmentation automatique des organes à risque dans le contexte de la prise en charge des tumeurs cérébrales par l’application des technologies de classification de deep learning". Thesis, Lille 2, 2016. http://www.theses.fr/2016LIL2S059/document.
Brain cancer is a leading cause of death and disability worldwide, accounting for 14.1 million new cancer cases and 8.2 million deaths in 2012 alone. Radiotherapy and radiosurgery are among the arsenal of available techniques to treat it. Because both techniques involve the delivery of a very high dose of radiation, the tumor as well as the surrounding healthy tissues must be precisely delineated. In practice, delineation is performed manually by experts, or with very little machine assistance. Thus, it is a highly time-consuming process with significant variation between labels produced by different experts. Radiation oncologists, radiology technologists, and other medical specialists therefore spend a substantial portion of their time on medical image segmentation. If automating this process makes it possible to achieve a more repeatable set of contours that can be agreed upon by the majority of oncologists, this would improve the quality of treatment. Additionally, any method that can reduce the time taken to perform this step will increase patient throughput and make more effective use of the skills of the oncologist. Nowadays, automatic segmentation techniques are rarely employed in clinical routine. When they are, they typically rely on registration approaches. In these techniques, anatomical information is exploited by means of images already annotated by experts, referred to as atlases, which are deformed and matched to the patient under examination. The quality of the deformed contours directly depends on the quality of the deformation. Nevertheless, registration techniques encompass regularization models of the deformation field, whose parameters are complex to adjust and whose quality is difficult to evaluate.
Integration of tools that assist in the segmentation task is therefore highly anticipated in clinical practice. The main objective of this thesis is therefore to provide radio-oncology specialists with automatic tools to delineate the organs at risk of patients undergoing brain radiotherapy or stereotactic radiosurgery. To achieve this goal, the main contributions of this thesis are presented along two major axes. First, we consider the use of one of the latest hot topics in artificial intelligence, i.e. deep learning, to tackle the segmentation problem. This set of techniques presents some advantages with respect to classical machine learning methods, which will be exploited throughout this thesis. The second axis is dedicated to the proposed image features, mainly associated with texture and contextual information of MR images. These features, which are not present in classical machine learning based methods to segment brain structures, led to improvements in segmentation performance. We therefore propose the inclusion of these features into a deep network. We demonstrate in this work the feasibility of using such a deep learning based classification scheme for this particular problem. We show that the proposed method leads to high performance, both in accuracy and efficiency. We also show that the automatic segmentations provided by our method lie within the variability of the experts. Results demonstrate that our method not only outperforms a state-of-the-art classifier, but also provides results that would be usable in radiation treatment planning.
Yang, Lixuan. "Structuring of image databases for the suggestion of products for online advertising". Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1102/document.
The topic of the thesis is the extraction and segmentation of clothing items from still images using techniques from computer vision, machine learning and image description, with a view to non-intrusively suggesting to users similar items from a database of retail products. We first propose a dedicated object extractor for dress segmentation that combines local information with prior learning. A person detector is applied to localize sites in the image that are likely to contain the object. Then, an intra-image two-stage learning process is developed to roughly separate foreground pixels from the background. Finally, the object is finely segmented by employing an active contour algorithm that takes into account the previous segmentation and injects specific knowledge about local curvature into the energy function. We then propose a new framework for extracting general deformable clothing items by using a three-stage global-local fitting procedure. A set of templates initiates an object extraction process by a global alignment of the model, followed by a local search minimizing a measure of the misfit with respect to the potential boundaries in the neighborhood. The results provided by each template are aggregated, with a global fitting criterion, to obtain the final segmentation. In our latest work, we extend the output of a Fully Convolutional Neural Network to infer context from local units (superpixels). To achieve this, we optimize an energy function that combines the large-scale structure of the image with the local low-level visual descriptions of superpixels, over the space of all possible pixel labellings. In addition, we introduce a novel dataset called RichPicture, consisting of 1000 images for clothing extraction from fashion images. The methods are validated on the public database and compare favorably to the other methods according to all the performance measures considered.
Cárdenas, Chapellín Julio José. "Inversion of geophysical data by deep learning". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS185.
This thesis presents the characterization of magnetic anomalies using convolutional neural networks, and the application of visualization tools to understand and validate their predictions. The developed approach allows the localization of magnetic dipoles, including counting the number of dipoles, their geographical position, and the prediction of their parameters (magnetic moment, depth, and declination). Our results suggest that the combination of two deep learning models, "YOLO" and "DenseNet", performs best in achieving our classification and regression goals. Additionally, we applied visualization tools to understand our model's predictions and its working principle. We found that the Grad-CAM tool improved prediction performance by identifying several layers that had no influence on the prediction, and the t-SNE tool confirmed the good ability of our model to differentiate among different parameter combinations. Then, we tested our model with real data to establish its limitations and application domain. Results demonstrate that our model detects dipolar anomalies in a real magnetic map even after learning from a synthetic database with a lower complexity, which indicates a significant generalization capability. We also noticed that it is not able to identify dipole anomalies of shapes and sizes different from those considered for the creation of the synthetic database. Our current work consists in creating new databases by combining synthetic and real data to compare their potential influence in improving predictions. Finally, the perspectives of this work consist in validating the operational relevance and adaptability of our model under realistic conditions and in testing other applications with alternative geophysical methods.
Esteves, José Jurandir Alves. "Optimization of network slice placement in distributed large-scale infrastructures : from heuristics to controlled deep reinforcement learning". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS325.
This PhD thesis investigates how to optimize Network Slice Placement in distributed large-scale infrastructures, focusing on online heuristic and Deep Reinforcement Learning (DRL) based approaches. First, we rely on Integer Linear Programming (ILP) to propose a data model for enabling on-Edge and on-Network Slice Placement. Contrary to most studies related to placement in the NFV context, the proposed ILP model considers complex Network Slice topologies and pays special attention to the geographic location of Network Slice Users and its impact on the End-to-End (E2E) latency. Extensive numerical experiments show the relevance of taking into account the user location constraints. Then, we rely on an approach called the "Power of Two Choices" (P2C) to propose an online heuristic algorithm for the problem, which is adapted to support placement on large-scale distributed infrastructures while integrating Edge-specific constraints. The evaluation results show the good performance of the heuristic, which solves the problem in a few seconds under a large-scale scenario. The heuristic also improves the acceptance ratio of Network Slice Placement Requests when compared against a deterministic online ILP-based solution. Finally, we investigate the use of ML methods, more specifically DRL, for increasing the scalability and automation of Network Slice Placement, considering a multi-objective optimization approach to the problem. We first propose a DRL algorithm for Network Slice Placement which relies on the Advantage Actor Critic algorithm for fast learning, and on Graph Convolutional Networks for feature extraction automation. Then, we propose an approach we call Heuristically Assisted Deep Reinforcement Learning (HA-DRL), which uses heuristics to control the learning and execution of the DRL agent. We evaluate this solution through simulations under stationary, cycle-stationary and non-stationary network load conditions.
The evaluation results show that heuristic control is an efficient way of speeding up the learning process of DRL: it achieves a substantial gain in resource utilization, reduces performance degradation, and is more reliable under unpredictable changes in network load than non-controlled DRL algorithms.
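The "Power of Two Choices" principle named in this abstract can be sketched generically: sample two candidate nodes at random and place the request on the less loaded one. The sketch below is an illustration of that principle only, not the thesis's heuristic, which also handles slice topologies, latency, and Edge-specific constraints. Unit-size requests, a uniform node capacity, and all function names are assumptions.

```python
import random


def power_of_two_choices(loads, capacity, rng):
    """Pick two distinct nodes at random and return the less loaded one.

    Returns None when neither sampled node has spare capacity.
    """
    a, b = rng.sample(range(len(loads)), 2)
    best = a if loads[a] <= loads[b] else b
    return best if loads[best] < capacity else None


def place_requests(n_nodes, n_requests, capacity, seed=0):
    """Place unit-size requests one by one and count the accepted ones."""
    rng = random.Random(seed)
    loads = [0] * n_nodes
    accepted = 0
    for _ in range(n_requests):
        node = power_of_two_choices(loads, capacity, rng)
        if node is not None:
            loads[node] += 1
            accepted += 1
    return loads, accepted


loads, accepted = place_requests(n_nodes=10, n_requests=50, capacity=8)
print(accepted, max(loads))
```

Sampling only two candidates keeps each placement decision cheap regardless of infrastructure size, which is the property that makes the principle attractive for large-scale online settings.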
Feng, Yuting. "Diffusion-Aware Recommendation in Social Media". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG009.
With the increasing popularity of social media as pathways to information, making recommendations in specific social scenarios, where information diffusion patterns and influence mechanisms are exploited, deserves attention. We strive in our work to develop models and algorithms for serving information to users in social media, either in a direct user-based (personalized) way or in an indirect audience-based way, with the former pertaining to news recommendation and the latter referring to fairness in influence maximization. News recommendation systems are generally based on the semantic content of news items and user profiles, whereas the underlying recommendation scenario is ignored. We consider in our PhD work a diffusion and influence-aware perspective on the news recommendation problem, and we first propose a lightweight deep learning approach for it, called DSN. This approach targets news recommendation in micro-blogging platforms, such as Twitter or Weibo, whose extreme data velocity demands a satisfactory trade-off between the model's complexity and its effectiveness. We use graph embeddings (node representations that are indicative of news diffusion patterns), which provide valuable social-related information for recommendations. To merge the semantic and social-related representations of news, a specially designed convolutional neural network for joint feature representation (SCNN) is used as the news encoder, while an attention model automatically aggregates the different interests of users. To further exploit the time dimension, with a sequential recommendation perspective on news recommendation in the micro-blogging scenario, we then propose an alternative deep-learning based recommendation model, which is also diffusion and influence-aware, called Influence-Graph News Recommender (IGNiteR).
It is a content-based deep recommendation model that jointly exploits all the data facets that may impact adoption decisions, namely semantics, diffusion-related features pertaining to local and global influence among users, temporal attractiveness, and timeliness, as well as dynamic user preferences. We perform extensive experiments on the same real-world datasets, showing that IGNiteR outperforms the state-of-the-art deep-learning based news recommendation methods. For the indirect and audience-based recommendation setting, we focus on influence maximization with fairness, which aims to select k influential nodes to maximize the spread of information in a network, while ensuring that selected sensitive user attributes (e.g., gender, location, origin, race, etc.) are fairly affected, i.e., are proportionally similar between the original network and the affected users. We propose two data-driven approaches: (a) fairness-based participant sampling (FPS) and (b) fairness as context (FAC). Both are based on learning node representations (embeddings) to extract spread-related user features from diffusion cascade information instead of social connectivity, which allows us to deal with very large graphs. The extracted features are then used in selecting influencers that maximize the influence spread, while also being fair with respect to the chosen sensitive attributes. In FPS, fairness and cascade length information are considered independently in the decision-making process, while FAC considers these information facets jointly and takes into account correlations between them. The proposed algorithms are generic and represent the first policy-driven solutions that can be applied to arbitrary sets of sensitive attributes at scale.
Kang, Chen. "Image Aesthetic Quality Assessment Based on Deep Neural Networks". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG004.
With the development of capture devices and the Internet, people have access to an increasing number of images. Assessing visual aesthetics has important applications in several domains, from image retrieval and recommendation to enhancement. Image aesthetic quality assessment aims at determining how beautiful an image looks to human observers. Many problems in this field are not well studied, including the subjectivity of aesthetic quality assessment, the explanation of aesthetics, and the collection of human-annotated data. Conventional image aesthetic quality prediction aims at predicting the average score or aesthetic class of a picture. However, aesthetic prediction is intrinsically subjective, and images with similar mean aesthetic scores or classes might display very different levels of consensus among human raters. Recent work has dealt with aesthetic subjectivity by predicting the distribution of human scores, but the predicted distribution is not directly interpretable in terms of subjectivity, and might be sub-optimal compared to directly estimating subjectivity descriptors computed from ground-truth scores. Furthermore, labels in existing datasets are often noisy or incomplete, or they do not allow more sophisticated tasks such as understanding why an image looks beautiful or not to a human observer. In this thesis, we first propose several measures of subjectivity, ranging from simple statistical measures such as the standard deviation of the scores, to newly proposed descriptors inspired by information theory. We evaluate the prediction performance of these measures when they are computed from predicted score distributions and when they are directly learned from ground-truth data. We find that the latter strategy generally provides better results. We also use subjectivity to improve the prediction of aesthetic scores, showing that information-theory-inspired subjectivity measures perform better than statistical measures.
Then, we propose an Explainable Visual Aesthetics (EVA) dataset, which contains 4070 images with at least 30 votes per image. EVA has been crowd-sourced using a more disciplined approach inspired by quality assessment best practices. It also offers additional features, such as the degree of difficulty in assessing the aesthetic score, ratings for 4 complementary aesthetic attributes, as well as the relative importance of each attribute in forming aesthetic opinions. The publicly available dataset is expected to contribute to future research on understanding and predicting visual quality aesthetics. Additionally, we studied the explainability of image aesthetic quality assessment. A statistical analysis on EVA demonstrates that the collected attributes and their relative importance can be linearly combined to effectively explain the overall aesthetic mean opinion scores. We found that subjectivity has a limited correlation with average personal difficulty in aesthetic assessment, and that the subject's region, photographic level and age significantly affect the user's aesthetic assessment.
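The two families of subjectivity measures mentioned in this abstract can be illustrated with a minimal sketch computed from the raw votes of one image. The standard deviation is named in the abstract. The Shannon entropy of the vote distribution is only an assumed stand-in for the thesis's information-theoretic descriptors, which the abstract does not specify, and the function names are hypothetical.

```python
import math
from collections import Counter


def score_std(votes):
    """Population standard deviation of the individual ratings."""
    mean = sum(votes) / len(votes)
    return math.sqrt(sum((v - mean) ** 2 for v in votes) / len(votes))


def score_entropy(votes):
    """Shannon entropy (in bits) of the empirical rating distribution."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


consensus = [7, 7, 7, 7]      # every rater agrees
polarized = [1, 1, 10, 10]    # raters split into two camps
print(score_std(consensus), score_entropy(consensus))
print(score_std(polarized), score_entropy(polarized))
```

Both measures are zero for the unanimous image and grow as raters disagree, which is the behavior a subjectivity descriptor needs even when two images share a similar mean score.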
Mercadier, Yves. "Classification automatique de textes par réseaux de neurones profonds : application au domaine de la santé". Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS068.
This PhD thesis focuses on the analysis of textual data in the health domain, and in particular on the supervised multi-class classification of data from biomedical literature and social media. One of the major difficulties when exploring such data with supervised learning methods is to have enough data for model training. Indeed, it is generally necessary to label the data manually before performing the learning step. The large size of the data sets makes this labeling task very expensive, a cost that semi-automatic systems should reduce. In this context, active learning, in which an oracle is called on to label well-chosen examples, is promising. The intuition is as follows: by choosing the examples smartly rather than randomly, the models should improve with less effort for the oracle and therefore at lower cost (i.e. with fewer annotated examples). In this thesis, we evaluate different active learning approaches combined with recent deep learning models. In addition, when only a small annotated data set is available, one possible improvement is to artificially increase the quantity of data during the training phase, by automatically creating new data from existing data. More precisely, we inject knowledge by taking into account the invariance properties of the data with respect to certain transformations. The augmented data can thus cover an unexplored input space, avoid overfitting and improve the generalization of the model. In this thesis, we propose and evaluate a new approach for textual data augmentation. These two contributions are evaluated on different textual datasets in the medical domain.
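The intuition behind active learning described in this abstract, choosing examples smartly rather than randomly, can be sketched with entropy-based uncertainty sampling. This is one common selection strategy, assumed here for illustration; the abstract does not say which strategies the thesis evaluates. The predicted probabilities and function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return indices of the `budget` most uncertain unlabeled examples."""
    ranked = sorted(range(len(predictions)),
                    key=lambda i: entropy(predictions[i]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical class probabilities predicted for four unlabeled texts.
predictions = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]
print(select_for_labeling(predictions, budget=2))
```

The examples sent to the oracle are those the current model is least sure about, so each manual label is spent where it is expected to be most informative.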
Zhao, Xi. "3D face analysis : landmarking, expression recognition and beyond". Phd thesis, Ecole Centrale de Lyon, 2010. http://tel.archives-ouvertes.fr/tel-00599660.
Sivasankaran, Sunit. "Séparation de la parole guidée par la localisation". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0078.
Voice-based personal assistants are part of our daily lives. Their performance suffers in the presence of signal distortions, such as noise, reverberation, and competing speakers. This thesis addresses the problem of extracting the signal of interest in such challenging conditions by first localizing the target speaker and then using the location to extract the target speech. In a first stage, we consider the common situation in which the target speaker utters a known word or sentence, such as the wake-up word of a distant-microphone voice command system. A method that exploits this text information to improve the speaker localization performance in the presence of competing speakers is proposed. The proposed solution uses a speech recognition system to align the wake-up word to the corrupted speech signal. A model spectrum representing the aligned phones is used to compute an identifier, which is then used by a deep neural network to localize the target speaker. Results on simulated data show that the proposed method reduces the localization error rate compared to the classical GCC-PHAT method. Similar improvements are observed on real data. Given the estimated location of the target speaker, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer is used to enhance the signal impinging from that location. In the second stage, the enhanced signal is used by a neural network to estimate a time-frequency mask corresponding to the localized speaker. In the third stage, this mask is used to compute the second-order statistics and to derive an adaptive beamformer. A multichannel, multispeaker, reverberated, noisy dataset, inspired by the famous WSJ0-2mix dataset, was generated, and the performance of the proposed pipeline was investigated in terms of the word error rate (WER).
To make the system robust to localization errors, an approach based on Speaker LOcalization Guided Deflation (SLOGD), which estimates the sources iteratively, is proposed. At each iteration, the location of one speaker is estimated and used to estimate a mask corresponding to that speaker. The estimated source is removed from the mixture before estimating the location and mask of the next source. The proposed method is shown to outperform Conv-TasNet. Finally, we consider the problem of explaining the robustness of neural networks used to compute time-frequency masks to mismatched noise conditions. We employ the so-called SHAP method to quantify the contribution of every time-frequency bin in the input signal to the estimated time-frequency mask. We define a metric that summarizes the SHAP values and show that it correlates with the WER achieved on separated speech. To the best of our knowledge, this is the first study on neural network explainability in the context of speech separation.
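As an illustration of the delay-and-sum beamformer mentioned in this abstract, the following is a minimal sketch and not the thesis implementation. It assumes that the per-microphone arrival delays are already known as non-negative integer sample counts (in practice they would be derived from the estimated speaker location and are often fractional). The function name `delay_and_sum` and the toy signals are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the result.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   per-microphone arrival delay in samples (non-negative integers),
              assumed known here; this is an illustrative simplification.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for t in range(n - d):
            # Advance the channel by d samples so all copies line up in time.
            out[t] += signal[t + d]
    return [v / len(channels) for v in out]


# Toy data: the same pulse reaches three microphones at different times.
pulse = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
mics = [
    pulse,                            # arrives with delay 0
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],   # arrives with delay 1
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],   # arrives with delay 3
]
enhanced = delay_and_sum(mics, [0, 1, 3])   # correct steering delays
unsteered = delay_and_sum(mics, [0, 0, 0])  # no alignment at all
```

With the correct delays the three copies add coherently and the pulse is recovered at full amplitude, whereas without alignment its energy stays spread over several samples. This is why an accurate location estimate matters for the first stage of the pipeline.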
Sahin, Serdar. "Advanced receivers for distributed cooperation in mobile ad hoc networks". Thesis, Toulouse, INPT, 2019. http://www.theses.fr/2019INPT0089.
Mobile ad hoc networks (MANETs) are rapidly deployable wireless communications systems, operating with minimal coordination in order to avoid spectral efficiency losses caused by overhead. Cooperative transmission schemes are attractive for MANETs, but the distributed nature of such protocols comes with an increased level of interference, whose impact is further amplified by the need to push the limits of energy and spectral efficiency. Hence, the impact of interference has to be mitigated through the use of PHY-layer signal processing algorithms with reasonable computational complexity. Recent advances in iterative digital receiver design techniques exploit approximate Bayesian inference and derivative message passing techniques to improve the capabilities of well-established turbo detectors. In particular, expectation propagation (EP) is a flexible technique which offers attractive complexity-performance trade-offs in situations where conventional belief propagation is limited by computational complexity. Moreover, thanks to emerging techniques in deep learning, such iterative structures are cast into deep detection networks, where learning the algorithmic hyper-parameters further improves receiver performance. In this thesis, EP-based finite-impulse response decision feedback equalizers are designed, and they achieve significant improvements, especially in high spectral efficiency applications, over more conventional turbo-equalization techniques, while having the advantage of being asymptotically predictable. A framework for designing frequency-domain EP-based receivers is proposed, in order to obtain detection architectures with low computational complexity. This framework is theoretically and numerically analysed with a focus on channel equalization, and it is then extended to handle detection for time-varying channels and multiple-antenna systems.
The design of multiple-user detectors and the impact of channel estimation are also explored to understand the capabilities and limits of this framework. Finally, a finite-length performance prediction method is presented for carrying out link abstraction for the EP-based frequency domain equalizer. The impact of accurate physical layer modelling is evaluated in the context of cooperative broadcasting in tactical MANETs, using a flexible MAC-level simulator.
Wei, Wen. "Apprentissage automatique des altérations cérébrales causées par la sclérose en plaques en neuro-imagerie multimodale". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4021.
Multiple Sclerosis (MS) is the most common progressive neurological disease of young adults worldwide and thus represents a major public health issue, with about 90,000 patients in France and more than 500,000 people affected by MS in Europe. In order to optimize treatments, it is essential to be able to measure and track brain alterations in MS patients. In fact, MS is a multi-faceted disease which involves different types of alterations, such as myelin damage and repair. Given this observation, multimodal neuroimaging is needed to fully characterize the disease. Magnetic resonance imaging (MRI) has emerged as a fundamental imaging biomarker for multiple sclerosis because of its high sensitivity in revealing macroscopic tissue abnormalities in patients with MS. Conventional MR scanning provides a direct way to detect MS lesions and their changes, and plays a dominant role in the diagnostic criteria of MS. Moreover, positron emission tomography (PET) imaging, an alternative imaging modality, can provide functional information and detect target tissue changes at the cellular and molecular level by using various radiotracers. For example, by using the radiotracer [11C]PIB, PET allows a direct pathological measure of myelin alteration. However, in clinical settings, not all the modalities are available, for various reasons. In this thesis, we therefore focus on learning and predicting missing-modality-derived brain alterations in MS from multimodal neuroimaging data.
Ghrissi, Amina. "Ablation par catheter de fibrillation atriale persistante guidée par dispersion spatiotemporelle d’électrogrammes : Identification automatique basée sur l’apprentissage statistique". Thesis, Université Côte d'Azur, 2021. http://www.theses.fr/2021COAZ4026.
Catheter ablation is increasingly used to treat atrial fibrillation (AF), the most common sustained cardiac arrhythmia encountered in clinical practice. A recent patient-tailored AF ablation therapy, with a 95% procedural success rate, is based on the use of a multipolar mapping catheter called PentaRay. It targets areas of spatiotemporal dispersion (STD) in the atria as potential AF drivers. STD stands for a delay of the cardiac activation observed in intracardiac electrograms (EGMs) across contiguous leads. In practice, interventional cardiologists localize STD sites visually using the PentaRay multipolar mapping catheter. This thesis aims to automatically characterize and identify ablation sites in STD-based ablation of persistent AF using machine learning (ML), including deep learning (DL) techniques. In the first part, EGM recordings are classified into STD vs. non-STD groups. However, the highly imbalanced dataset hampers the classification performance. We tackle this issue by using adapted data augmentation techniques that help achieve good classification. The overall performance is high, with values of accuracy and AUC around 90%. First, two approaches are benchmarked: feature engineering and automatic feature extraction from a time series called maximal voltage absolute values at any of the bipoles (VAVp). Statistical features are extracted and fed to ML classifiers, but no important dissimilarity is obtained between STD and non-STD categories. Results show that the supervised classification of the raw VAVp time series itself into the same categories is promising, with values of accuracy, AUC, sensitivity and specificity around 90%. Second, the classification of raw multichannel EGM recordings is performed. Shallow convolutional arithmetic circuits are investigated for their promising theoretical interest, but experimental results on synthetic data are unsuccessful. Then, we move forward to more conventional supervised ML tools.
We design a selection of data representations adapted to different ML and DL models, and benchmark their performance in terms of classification and computational cost. Transfer learning is also assessed. The best performance is achieved with a convolutional neural network (CNN) model for classifying raw EGM matrices. The average performance over cross-validation reaches 94% accuracy and AUC, with an F1-score of 60%. In the second part, EGM recordings acquired during mapping are labeled ablated vs. non-ablated according to their proximity to the ablation sites, then classified into the same categories. STD labels, previously defined by interventional cardiologists during the ablation procedure, are also aggregated as a prior probability in the classification task. Classification results on the test set show that a shallow CNN gives the best performance, with an F1-score of 76%. Aggregating the STD labels does not help improve the model’s performance. Overall, this work is among the first attempts at the application of statistical analysis and ML tools to automatically identify successful ablation areas in STD-based ablation. By providing interventional cardiologists with a real-time objective measure of STD, the proposed solution offers the potential to improve the efficiency and effectiveness of this fully patient-tailored catheter ablation approach for treating persistent AF.
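The gap this abstract reports between accuracy (94%) and F1-score (60%) is typical of imbalanced classification. The sketch below illustrates the effect with invented counts and is not based on the thesis data: 100 samples, of which only 5 are positive. The function name `accuracy_and_f1` and the label vectors are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Invented example: 5 positives among 100 samples.
y_true = [1] * 5 + [0] * 95
# The classifier finds 3 of the 5 positives and raises 1 false alarm.
y_pred = [1, 1, 1, 0, 0] + [1] + [0] * 94

accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

In this toy case only 3 of 100 predictions are wrong, so accuracy is high, yet missing 2 of the 5 positives pulls recall, and with it the F1-score, down sharply. Accuracy is dominated by the majority class, whereas F1 depends only on how the minority class is handled.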