A selection of scholarly literature on the topic "Apprentissage métrique profond"
Dissertations on the topic "Apprentissage métrique profond"
Bhattarai, Binod. "Développement de méthodes de rapprochement physionomique par apprentissage machine." Caen, 2016. https://hal.archives-ouvertes.fr/tel-01467985.
The work presented in this PhD thesis takes place in the general context of face matching. More precisely, our goal is to design and develop novel algorithms to learn compact, discriminative, domain-invariant or de-identifying representations of faces. Searching and indexing faces opens the door to many interesting applications. However, this is becoming more challenging day after day due to the rapid growth of the volume of faces to analyse. Representing faces by compact and discriminative features is consequently essential to deal with such very large datasets. Moreover, this volume is increasing without any apparent limit; this is why it is also relevant to propose solutions to organise faces in meaningful ways, in order to reduce the search space and improve the efficiency of retrieval. Although the volume of faces available on the internet is increasing, it is still difficult to find annotated examples to train models for each possible use case, e.g. for different races, sexes, etc., for every specific task. A model learned with training examples from one group of people can fail to predict well in another group due to the uneven rate of change of biometric dimensions, e.g. ageing, among them. Similarly, a model learned from one type of feature can fail to make good predictions when tested with another type of feature. It would be ideal to have models producing face representations that are invariant to these discrepancies. Learning common representations ultimately helps to reduce the domain-specific parameters and, more importantly, allows training examples from well-represented domains to be used for other domains. Hence, there is a need for designing algorithms to map the features from different domains to a common subspace, bringing faces bearing the same properties closer. On the other hand, as automatic face matching tools are getting smarter and smarter, there is an increasing threat to privacy.
The popularity of photo sharing on social networks has exacerbated this risk. In such a context, altering the representations of faces so that the faces cannot be identified by automatic face matchers, while the faces look as similar as before, has become an interesting perspective for privacy protection. It allows users to limit the risk of sharing their photos on social networks. In all these scenarios, we explored how Metric Learning methods, as well as those of Deep Learning, can help us to learn compact and discriminative representations of faces. We build on these tools, proposing compact, discriminative, domain-invariant and de-identifying representations of faces. We applied the proposed methods to a wide range of facial analysis applications. These applications include large-scale face retrieval, age estimation, attribute prediction and identity de-identification. We have evaluated our algorithms on standard and challenging public datasets such as LFW, CelebA and MORPH II. Moreover, we appended 1M faces crawled from Flickr.com to LFW and generated a novel and more challenging dataset to evaluate our algorithms at large scale. Our experiments show that the proposed methods are more accurate and more efficient than the competitive baselines and existing state-of-the-art methods they were compared with, and attain new state-of-the-art performance.
Habib, Yassine. "Monocular SLAM densification for 3D mapping and autonomous drone navigation." Electronic Thesis or Diss., Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2024. http://www.theses.fr/2024IMTA0390.
Aerial drones are essential in search and rescue missions as they provide fast reconnaissance of the mission area, such as a collapsed building. Creating a dense and metric 3D map in real time is crucial to capture the structure of the environment and enable autonomous navigation. The recommended approach for this task is to use Simultaneous Localization and Mapping (SLAM) from a monocular camera synchronized with an Inertial Measurement Unit (IMU). Current state-of-the-art algorithms maximize efficiency by triangulating a minimum number of points, resulting in a sparse 3D point cloud. Few works address monocular SLAM densification, typically by using deep neural networks to predict a dense depth map from a single image. Most are not metric or are too complex for use in embedded applications. In this thesis, we identify and evaluate a state-of-the-art monocular SLAM baseline under challenging drone conditions. We present a practical pipeline for densifying monocular SLAM by applying monocular depth prediction to construct a dense and metric 3D voxel map. Using voxels allows the efficient construction and maintenance of the map through raycasting, and allows for volumetric multi-view fusion. Finally, we propose a scale recovery procedure that uses the sparse and metric depth estimates of SLAM to refine the predicted dense depth maps. Our approach has been evaluated on conventional benchmarks and shows promising results for practical applications.
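The abstract does not say how the scale is recovered. As a minimal sketch only, assuming the common median-ratio heuristic (not necessarily the procedure used in the thesis), a predicted depth map could be aligned to the sparse metric SLAM depths as follows. The function names and the data layout are hypothetical.

```python
from statistics import median


def recover_scale(pred_depth, sparse_metric):
    """Estimate one global scale factor (assumed median-ratio heuristic).

    pred_depth: 2D list of predicted depths, in an arbitrary scale.
    sparse_metric: dict {(row, col): metric depth} from the sparse SLAM points.
    """
    ratios = [
        d / pred_depth[r][c]
        for (r, c), d in sparse_metric.items()
        if d > 0 and pred_depth[r][c] > 0  # skip invalid depths
    ]
    if not ratios:
        raise ValueError("no valid sparse depth samples")
    # The median is robust to a few badly triangulated points.
    return median(ratios)


def rescale(pred_depth, scale):
    """Apply the recovered scale to every pixel of the dense prediction."""
    return [[scale * v for v in row] for row in pred_depth]


pred = [[1.0, 2.0], [4.0, 8.0]]
sparse = {(0, 0): 2.0, (0, 1): 4.4, (1, 1): 16.0}
scale = recover_scale(pred, sparse)
metric_depth = rescale(pred, scale)
```

A single global factor is the simplest choice; a per-region or affine correction would be a natural extension if the predicted depth is biased as well as mis-scaled.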
Venkataramanan, Shashanka. "Metric learning for instance and category-level visual representation." Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS022.
The primary goal in computer vision is to enable machines to extract meaningful information from visual data, such as images and videos, and leverage this information to perform a wide range of tasks. To this end, substantial research has focused on developing deep learning models capable of encoding comprehensive and robust visual representations. A prominent strategy in this context involves pretraining models on large-scale datasets, such as ImageNet, to learn representations that can exhibit cross-task applicability and facilitate the successful handling of diverse downstream tasks with minimal effort. To facilitate learning on these large-scale datasets and encode good representations, complex data augmentation strategies have been used. However, these augmentations can be limited in their scope, either being hand-crafted and lacking diversity, or generating images that appear unnatural. Moreover, the focus of these augmentation techniques has primarily been on the ImageNet dataset and its downstream tasks, limiting their applicability to a broader range of computer vision problems. In this thesis, we aim to tackle these limitations by exploring different approaches to enhance the efficiency and effectiveness of representation learning. The common thread across the works presented is the use of interpolation-based techniques, such as mixup, to generate diverse and informative training examples beyond the original dataset. In the first work, we are motivated by the idea of deformation as a natural way of interpolating images rather than using a convex combination. We show that geometrically aligning the two images in the feature space allows for more natural interpolation that retains the geometry of one image and the texture of the other, connecting it to style transfer. Drawing from these observations, we explore the combination of mixup and deep metric learning.
We develop a generalized formulation that accommodates mixup in metric learning, leading to improved representations that explore areas of the embedding space beyond the training classes. Building on these insights, we revisit the original motivation of mixup and generate a larger number of interpolated examples beyond the mini-batch size by interpolating in the embedding space. This approach allows us to sample on the entire convex hull of the mini-batch, rather than just along linear segments between pairs of examples. Finally, we investigate the potential of using natural augmentations of objects from videos. We introduce a "Walking Tours" dataset of first-person egocentric videos, which capture a diverse range of objects and actions in natural scene transitions. We then propose a novel self-supervised pretraining method called DoRA, which detects and tracks objects in video frames, deriving multiple views from the tracks and using them in a self-supervised manner.
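To illustrate what sampling on the convex hull of a mini-batch can look like, here is a minimal sketch. It assumes Dirichlet-distributed mixing weights and soft class targets; both are illustrative assumptions, and the helper names are hypothetical rather than taken from the thesis.

```python
import random


def dirichlet(n, alpha, rng):
    """Draw n non-negative weights that sum to 1 (symmetric Dirichlet)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_batch(embeddings, labels, num_classes, alpha=1.0, rng=None):
    """Return one point in the convex hull of the batch and its soft target.

    embeddings: list of equal-length float vectors (one per example).
    labels: list of integer class ids, aligned with embeddings.
    """
    rng = rng or random.Random()
    weights = dirichlet(len(embeddings), alpha, rng)
    dim = len(embeddings[0])
    # Convex combination of all embeddings, not just of one pair.
    mixed = [
        sum(w * e[k] for w, e in zip(weights, embeddings)) for k in range(dim)
    ]
    # Assumed target: the same weights accumulated per class.
    target = [0.0] * num_classes
    for w, y in zip(weights, labels):
        target[y] += w
    return mixed, target


batch = [[0.0, 1.0], [2.0, 3.0], [4.0, -1.0]]
mixed, target = mix_batch(batch, [0, 1, 1], num_classes=2, rng=random.Random(0))
```

Calling `mix_batch` repeatedly yields as many interpolated examples as desired, independent of the mini-batch size, whereas pairwise mixup only reaches points on the segments between two examples.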
Carvalho, Micael. "Deep representation spaces." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS292.
In recent years, Deep Learning techniques have swept the state of the art of many applications of Machine Learning, becoming the new standard approach for them. The architectures issued from these techniques have been used for transfer learning, which extended the power of deep models to tasks that did not have enough data to fully train them from scratch. This thesis' subject of study is the representation spaces created by deep architectures. First, we study properties inherent to them, with particular interest in dimensionality redundancy and precision of their features. Our findings reveal a strong degree of robustness, pointing the path to simple and powerful compression schemes. Then, we focus on refining these representations. We choose to adopt a cross-modal multi-task problem, and design a loss function capable of taking advantage of data coming from multiple modalities, while also taking into account different tasks associated with the same dataset. In order to correctly balance these losses, we also develop a new sampling scheme that only takes into account examples contributing to the learning phase, i.e. those having a positive loss. Finally, we test our approach on a large-scale dataset of cooking recipes and associated pictures. Our method achieves a 5-fold improvement over the state of the art, and we show that the multi-task aspect of our approach promotes a semantically meaningful organization of the representation space, allowing it to perform subtasks never seen during training, like ingredient exclusion and selection. The results we present in this thesis open many possibilities, including feature compression for remote applications, robust multi-modal and multi-task learning, and feature space refinement.
For the cooking application, in particular, many of our findings are directly applicable in a real-world context, especially for the detection of allergens, finding alternative recipes due to dietary restrictions, and menu planning.
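The idea of counting only examples with a positive loss can be sketched as below. This is an assumed, simplified reading of the abstract: the batch loss is averaged over the contributing entries instead of over the whole batch. The function name is hypothetical.

```python
def active_mean(losses):
    """Average only the entries that still contribute (loss > 0).

    Dividing by the number of active entries, rather than by the batch
    size, keeps the loss scale stable as more examples reach zero loss.
    """
    active = [v for v in losses if v > 0.0]
    return sum(active) / len(active) if active else 0.0


batch_losses = [0.0, 0.5, 0.0, 1.5]
plain = sum(batch_losses) / len(batch_losses)  # diluted by the zero entries
adaptive = active_mean(batch_losses)           # averaged over 2 active entries
```

With a plain mean, the contribution of the remaining hard examples shrinks as training progresses and most entries fall to zero; the active mean avoids that dilution.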
Nagorny, Pierre. "Contrôle automatique non-invasif de la qualité des produits : Application au procédé d'injection-moulage des thermoplastiques." Thesis, Chambéry, 2020. http://www.theses.fr/2020CHAMA008.
Inline quality control of the product is an important objective for industrial growth. Controlling a product's quality requires measurements of its quality characteristics. One hundred percent control is an important objective to overcome the limits of control by sampling, in the case of defects related to exceptional causes. However, industrial constraints have limited the deployment of measurement of product characteristics directly within production lines. Human visual control is limited by its duration, which is incompatible with the production cycle in high-speed production, by its cost and by its variability. Computer vision systems come at a cost that reserves them for production with high added value. In addition, the automatic control of the quality of the appearance of products remains an open research topic. Our work aims to meet these constraints, as part of the injection-molding process of thermoplastics. We propose a control system that is non-invasive for the production process. Parts are checked right out of the injection molding machine. We study the contribution of non-conventional imaging. Thermography of a hot molded part provides information on its geometry, which is complementary to conventional imaging. Polarimetry makes it possible to discriminate curvature defects of surfaces, which change the polarization angle of reflected light, and defects in the structure of the material, which diffuse light. Furthermore, product specifications are increasingly tight. Specifications include complex geometric features, as well as appearance features, which are difficult to formalize. To automate appearance control, it is necessary to model the notion of quality of a part. In order to exploit the measurements made on the hot parts, our approach uses statistical learning methods.
Thus, the human expert who knows the notion of quality of a part transmits this knowledge to the system by annotating a set of training data. Our control system then learns a metric of the quality of a part from raw sensor data. We favor a deep convolutional network approach (Deep Learning) in order to obtain the best performance in accurately discriminating compliant parts. The small number of annotated samples available in our industrial context has led us to use domain transfer learning methods. Finally, in order to meet all the constraints and validate our propositions, we carried out the vertical integration of a prototype device for measuring the parts and the software solution for processing by statistical learning. The device integrates thermal imaging, polarimetric imaging, lighting and the on-board processing system necessary for sending data to a remote analysis server. Two application cases make it possible to evaluate the performance and viability of the proposed solution.
Leclerc, Sarah Marie-Solveig. "Automatisation de la segmentation sémantique de structures cardiaques en imagerie ultrasonore par apprentissage supervisé." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI121.
The analysis of medical images plays a critical role in cardiology. Ultrasound imaging, as a real-time, low-cost and bedside-applicable modality, is nowadays the most commonly used imaging modality to monitor patient status and perform clinical cardiac diagnosis. However, the semantic segmentation (i.e. the accurate delineation and identification) of heart structures is a difficult task due to the low quality of ultrasound images, characterized in particular by the lack of clear boundaries. To compensate for missing information, the best performing methods before this thesis relied on the integration of prior information on cardiac shape or motion, which in turn reduced the adaptability of the corresponding methods. Furthermore, such approaches require manual identification of key points to be adapted to a given image, which makes the full process difficult to reproduce. In this thesis, we propose several original fully automatic algorithms for the semantic segmentation of echocardiographic images based on supervised learning approaches, where the resolution of the problem is automatically set up using data previously analyzed by trained cardiologists. From the design of a dedicated dataset and evaluation platform, we prove in this project the clinical applicability of fully automatic supervised learning methods, in particular deep learning methods, as well as the possibility of improving robustness by incorporating in the full process the prior automatic detection of regions of interest.
Schmitt, Thomas. "Appariements collaboratifs des offres et demandes d’emploi." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS210/document.
Our research focuses on the recommendation of new job offers that have just been posted and have no interaction history (cold start). To this end, we adapt well-known recommender systems from the field of e-commerce by exploiting the usage records of all job seekers on previous offers. One of the specificities of the work presented is to have considered real data, and to have tackled the challenges of heterogeneity and noise in textual documents. The presented contribution integrates the information from the collaborative data to learn a new representation of text documents, which is required to make the so-called cold-start recommendation of a new offer. The new representation essentially aims to build a good metric. The search space considered is that of neural networks. Neural networks are trained by defining two loss functions. The first seeks to preserve the local structure of collaborative information, drawing on non-linear dimension reduction approaches. The second is inspired by Siamese networks to reproduce the similarities from the collaborative matrix. The scaling up of the approach and its performance are based on the sampling of pairs of offers considered similar. The interest of the proposed approach is demonstrated empirically on real, proprietary data as well as on the CiteULike public benchmark. Finally, the interest of the approach is attested by our good ranking in the international RecSys 2017 challenge (15/100, with millions of users and millions of offers).
Doras, Guillaume. "Automatic cover detection using deep learning." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS299.
Covers are different interpretations of the same original musical work. They usually share a similar melodic line or harmonic structure, but typically differ greatly in one or several other dimensions, such as structure, tempo, key, instrumentation, genre, etc. Automatic cover detection (the task of finding and retrieving from an audio corpus all covers of one or several query tracks) has long been seen as a challenging theoretical problem. It has also become an acute practical problem with the ever-growing size of modern audio corpora. In this work, we propose to address the cover detection problem with a solution based on the metric learning paradigm. We show that this approach allows training of simple neural networks to extract out of a song an expressive and compact representation, its embedding, suitable for fast and effective retrieval in large audio corpora. We then propose a comparative study of different audio representations and show that systems combining melodic and harmonic features drastically outperform those relying on a single input representation. We illustrate how these features complement each other with both quantitative and qualitative analyses. We describe various fusion schemes and propose methods yielding state-of-the-art performance on publicly available large datasets. We then describe theoretically how the embedding space is structured during training, and introduce an adaptation of the standard triplet loss which improves the results further. Finally, we describe an operational implementation of the method, and demonstrate its efficiency both in terms of accuracy and scalability in a real industrial context.
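For reference, the standard triplet loss that the abstract mentions can be sketched as follows. This shows only the textbook form with Euclidean distance and an illustrative margin; the adaptation proposed in the thesis is not described in the abstract and is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet hinge loss on one (anchor, positive, negative) triplet.

    The loss is zero once the negative (e.g. an unrelated track) is farther
    from the anchor than the positive (e.g. a cover) by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Violating triplet: the positive is farther away than the negative.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0], margin=0.5)
# Satisfied triplet: the negative is already far enough.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0], margin=0.5)
```

Once embeddings are trained this way, retrieval reduces to a nearest-neighbour search over the same distance.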
Kaabi, Rabeb. "Apprentissage profond et traitement d'images pour la détection de fumée." Electronic Thesis or Diss., Toulon, 2020. http://www.theses.fr/2020TOUL0017.
This thesis deals with the problem of forest fire detection using image processing and machine learning tools. A forest fire is a fire that spreads over a wooded area. It can be of natural origin (due to lightning or a volcanic eruption) or human origin. Around the world, the impact of forest fires on many aspects of our daily lives and on the entire ecosystem is becoming more and more apparent. Many methods have been shown to be effective in detecting forest fires. The originality of the present work lies in the early detection of fires through the detection of forest smoke and the classification of smoky and non-smoky regions using deep learning and image processing tools. A set of pre-processing techniques helped us to build a large database, which then allowed us to test the robustness of the proposed deep belief network model and to evaluate its performance by calculating the following metrics: IoU, accuracy, recall and F1 score. Finally, the proposed algorithm is tested on several images in order to validate its efficiency. The results of our algorithm have been compared with those of state-of-the-art methods (Deep CNN, SVM, etc.) and are very good. The proposed methods gave an average classification accuracy of about 96.5% for the early detection of smoke.
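The four metrics named in the abstract have standard definitions for binary labels (smoke versus no smoke). The sketch below computes them from flat lists of 0/1 values; the function name and the input layout are illustrative assumptions, not the evaluation code of the thesis.

```python
def binary_metrics(pred, truth):
    """Compute IoU, accuracy, recall and F1 for binary labels (1 = smoke)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    total = tp + fp + fn + tn

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on degenerate inputs.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, total),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
scores = binary_metrics(pred, truth)
```

For a segmentation mask, the same function applies after flattening the mask to a list of pixels.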
Cuan, Bonan. "Deep similarity metric learning for multiple object tracking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI065.
Multiple object tracking, i.e. simultaneously tracking multiple objects in the scene, is an important but challenging visual task. Objects should be accurately detected and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the field of object detection, "tracking-by-detection" approaches are widely adopted in multiple object tracking research. Objects are detected in advance and tracking reduces to an association problem: linking detections of the same object through frames into trajectories. Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems where many objects of the same category exist, a fine-grained discriminant appearance model is paramount and indispensable. Therefore, we propose an appearance-based re-identification model using deep similarity metric learning to deal with multiple object tracking in mono-camera videos. Two main contributions are reported in this dissertation. First, a deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminant embedding space. Different metric learning configurations using various metrics, loss functions, deep network structures, etc., are investigated in order to determine the best re-identification model for tracking. In addition, with an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, which are comparable to state-of-the-art approaches using triplet losses. Our approach is easy and fast to train, and the learned embedding can be readily transferred to the domain of tracking tasks. Second, we integrate our proposed re-identification model into multiple object tracking as appearance guidance for detection association. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned embedding for re-identification.
Similarities among detected object instances are exploited for identity classification. The collaboration and interference between appearance and motion models are also investigated. An online appearance-motion model coupling is proposed to further improve the tracking performance. Experiments on the Multiple Object Tracking Challenge benchmark prove the effectiveness of our modifications, with state-of-the-art tracking accuracy.
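Appearance-based association in tracking-by-detection can be sketched as follows. The example assumes cosine similarity between embeddings and a greedy best-first matching with a similarity threshold; these are illustrative choices, and the abstract does not specify the association rule actually used. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks, detections, min_sim=0.5):
    """Greedily link detections to existing tracks by appearance.

    tracks: dict {track_id: embedding} for the objects tracked so far.
    detections: list of embeddings for the current frame.
    Returns {detection_index: track_id}; unmatched detections are omitted
    and could start new tracks.
    """
    pairs = [
        (cosine(emb, det), tid, j)
        for tid, emb in tracks.items()
        for j, det in enumerate(detections)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)  # most similar first
    used_tracks, matches = set(), {}
    for sim, tid, j in pairs:
        if sim < min_sim:
            break  # every remaining pair is even less similar
        if tid in used_tracks or j in matches:
            continue
        matches[j] = tid
        used_tracks.add(tid)
    return matches


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
links = associate(tracks, detections)
```

A complete tracker would combine this appearance score with a motion model, as the abstract describes; an optimal assignment solver could replace the greedy loop.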