Dissertations on the topic "Apprentissage automatique – Évaluation"
See the top 50 dissertations (master's and doctoral theses) on the research topic "Apprentissage automatique – Évaluation".
Bove, Clara. "Conception et évaluation d’interfaces utilisateur explicatives pour systèmes complexes en apprentissage automatique". Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS247.pdf.
This thesis focuses on human-centered eXplainable AI (XAI) and more specifically on the intelligibility of Machine Learning (ML) explanations for non-expert users. The technical context is as follows: on one side, either an opaque classifier or regressor provides a prediction, with an XAI post-hoc approach that generates pieces of information as explanations; on the other side, the user receives both the prediction and the explanations. Within this XAI technical context, several issues might lessen the quality of explanations. The ones we focus on are: the lack of contextual information in ML explanations, the unguided design of functionalities or of the user's exploration, as well as the confusion that can be caused by delivering too much information. To solve these issues, we develop an experimental procedure to design XAI functional interfaces and evaluate the intelligibility of ML explanations by non-expert users. Doing so, we investigate the XAI enhancements provided by two types of local explanation components: feature importance and counterfactual examples. Thus, we propose generic XAI principles for contextualizing and allowing exploration on feature importance, and for guiding users in their comparative analysis of counterfactual explanations with plural examples. We propose an implementation of such principles in two distinct explanation-based user interfaces, for an insurance scenario and a financial scenario respectively. Finally, we use the enhanced interfaces to conduct user studies in lab settings and measure two dimensions of intelligibility, namely objective understanding and subjective satisfaction. For local feature importance, we demonstrate that contextualization and exploration improve the intelligibility of such explanations. Similarly for counterfactual examples, we demonstrate that the plural condition improves intelligibility as well, and that comparative analysis appears to be a promising tool for users' satisfaction. At a fundamental level, we consider the issue of inconsistency within ML explanations from a theoretical point of view. In the explanation process considered for this thesis, the quality of an explanation relies both on the ability of the Machine Learning system to generate a coherent explanation and on the ability of the end user to make a correct interpretation of these explanations. Thus, there can be limitations: on one side, as reported in the literature, technical limitations of ML systems might produce potentially inconsistent explanations; on the other side, human inferences can be inaccurate, even if users are presented with consistent explanations. Investigating such inconsistencies, we propose an ontology to structure the most common ones from the literature. We advocate that such an ontology can be useful to understand current XAI limitations and to avoid explanation pitfalls.
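To make the two explanation components named in this abstract concrete, here is a minimal, hedged Python sketch of local feature importance and a naive counterfactual search on a toy linear classifier; the data, feature names and search procedure are illustrative assumptions, not the interfaces or algorithms developed in the thesis.

```python
# Minimal sketch (toy data, hypothetical feature names): local feature importance and a
# naive counterfactual search for a linear classifier treated as the model to explain.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # toy features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)      # synthetic label
clf = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0]
# Local feature importance for a linear model: per-feature contribution to the decision score.
contributions = clf.coef_[0] * x
print(dict(zip(["feat_a", "feat_b", "feat_c"], contributions.round(3))))

# Naive counterfactual: push the most influential feature until the predicted class flips.
target = 1 - clf.predict(x.reshape(1, -1))[0]
i = int(np.argmax(np.abs(clf.coef_[0])))
direction = np.sign(clf.coef_[0][i]) * (1 if target == 1 else -1)
cf = x.copy()
for _ in range(200):
    if clf.predict(cf.reshape(1, -1))[0] == target:
        break
    cf[i] += 0.1 * direction
print("factual prediction:", clf.predict(x.reshape(1, -1))[0],
      "| counterfactual prediction:", clf.predict(cf.reshape(1, -1))[0],
      "| changed feature index:", i)
```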
Pomorski, Denis. "Apprentissage automatique symbolique/numérique : construction et évaluation d'un ensemble de règles à partir des données". Lille 1, 1991. http://www.theses.fr/1991LIL10117.
Dang, Quang Vinh. "Évaluation de la confiance dans la collaboration à large échelle". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0002/document.
Large-scale collaborative systems, wherein a large number of users collaborate to perform a shared task, attract a lot of attention from both academia and industry. Trust is an important factor for the success of a large-scale collaboration. It is difficult for end-users to manually assess the trust level of each partner in this collaboration. We study the trust assessment problem and aim to design a computational trust model for collaborative systems. We focused on three research questions. 1. What is the effect of deploying a trust model and showing trust scores of partners to users? We designed and organized a user experiment based on the trust game, a well-known money-exchange lab-controlled protocol, in which we introduced user trust scores. Our comprehensive analysis of user behavior showed that: (i) showing the trust score to users encourages collaboration between them significantly, at a level similar to showing nicknames, and (ii) users follow the trust score in decision-making. The results suggest that a trust model can be deployed in collaborative systems to assist users. 2. How to calculate the trust score between users that have experienced a collaboration? We designed a trust model for the repeated trust game that computes user trust scores based on their past behavior. We validated our trust model against: (i) simulated data, (ii) human opinion, and (iii) real-world experimental data. We extended our trust model to Wikipedia based on user contributions to the quality of the edited Wikipedia articles. We proposed three machine learning approaches to assess the quality of Wikipedia articles: the first one based on random forests with manually designed features, the other two based on deep learning methods. 3. How to predict trust relations between users that did not interact in the past? Given a network in which the links represent the trust/distrust relations between users, we aim to predict future relations. We proposed an algorithm that takes into account the established time information of the links in the network to predict future user trust/distrust relationships. Our algorithm outperforms state-of-the-art approaches on real-world signed directed social network datasets.
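As a hedged illustration of the kind of computational trust score described above, the following Python sketch updates a score from past behaviour in a repeated trust game; the exponentially weighted update rule and the numbers are assumptions for illustration, not the model proposed in the thesis.

```python
# Minimal sketch: a trust score updated from observed behaviour in a repeated trust game.
# The update rule (an exponentially weighted fraction of the tripled amount returned) is
# an illustrative assumption.
def update_trust(trust, sent, returned, alpha=0.3):
    """Blend the previous trust score with the fraction of the tripled amount returned."""
    if sent == 0:
        return trust
    observed = min(returned / (3 * sent), 1.0)   # in the trust game, the sent amount is tripled
    return (1 - alpha) * trust + alpha * observed

trust = 0.5                                       # neutral prior
history = [(10, 15), (10, 18), (10, 2), (10, 0)]  # (amount sent, amount returned) per round
for sent, returned in history:
    trust = update_trust(trust, sent, returned)
    print(f"sent={sent} returned={returned} -> trust={trust:.2f}")
```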
Soumm, Michaël. "Refining machine learning evaluation : statistical insights into model performance and fairness". Electronic Thesis or Diss., université Paris-Saclay, 2024. https://theses.hal.science/tel-04951896.
This thesis addresses limitations in machine learning evaluation methodologies by introducing rigorous statistical approaches adapted from econometrics. Through applications in three distinct machine learning domains, we demonstrate how statistical tools can enhance model evaluation robustness, interpretability, and fairness. In class-incremental learning, we examine the importance of pretraining methods compared to the choice of the incremental algorithm and show that these methods are crucial in determining final performance; in face recognition systems, we quantify demographic biases and show that demographically balanced synthetic data can significantly reduce performance disparities across ethnic groups; in recommender systems, we develop novel information-theory-based measures to analyze performance variations across user profiles, revealing that deep learning methods do not consistently outperform traditional approaches and highlighting the importance of user behavior patterns. These findings demonstrate the value of statistical rigor in machine learning evaluation and provide practical guidelines for improving model assessment across diverse applications.
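In the spirit of the statistical rigor this abstract argues for, here is a hedged Python sketch of a paired bootstrap confidence interval for the accuracy difference between two models on a shared test set; the per-example correctness vectors are simulated stand-ins, and the procedure is illustrative rather than the econometric tools used in the thesis.

```python
# Minimal sketch: paired bootstrap confidence interval for the accuracy difference between
# two models evaluated on the same test set. Per-example correctness vectors are simulated.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
correct_a = rng.random(n) < 0.82      # stand-in: whether model A got each example right
correct_b = rng.random(n) < 0.80      # stand-in: whether model B got each example right

diffs = []
for _ in range(5000):
    idx = rng.integers(0, n, size=n)  # resample test examples with replacement
    diffs.append(correct_a[idx].mean() - correct_b[idx].mean())
low, high = np.percentile(diffs, [2.5, 97.5])
print(f"accuracy difference: {correct_a.mean() - correct_b.mean():.3f}, "
      f"95% bootstrap CI: [{low:.3f}, {high:.3f}]")
```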
Choquette, Philippe. "Nouveaux algorithmes d'apprentissage pour classificateurs de type SCM". Master's thesis, Québec : Université Laval, 2007. http://www.theses.ulaval.ca/2007/24840/24840.pdf.
Bawden, Rachel. "Going beyond the sentence : Contextual Machine Translation of Dialogue". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS524/document.
While huge progress has been made in machine translation (MT) in recent years, the majority of MT systems still rely on the assumption that sentences can be translated in isolation. The result is that these MT models only have access to context within the current sentence; context from other sentences in the same text and information relevant to the scenario in which they are produced remain out of reach. The aim of contextual MT is to overcome this limitation by providing ways of integrating extra-sentential context into the translation process. Context, concerning the other sentences in the text (linguistic context) and the scenario in which the text is produced (extra-linguistic context), is important for a variety of cases, such as discourse-level and other referential phenomena. Successfully taking context into account in translation is challenging. Evaluating such strategies on their capacity to exploit context is also a challenge, standard evaluation metrics being inadequate and even misleading when it comes to assessing such improvement in contextual MT. In this thesis, we propose a range of strategies to integrate both extra-linguistic and linguistic context into the translation process. We accompany our experiments with specifically designed evaluation methods, including new test sets and corpora. Our contextual strategies include pre-processing strategies designed to disambiguate the data on which MT models are trained, post-processing strategies to integrate context by post-editing MT outputs, and strategies in which context is exploited during translation proper. We cover a range of different context-dependent phenomena, including anaphoric pronoun translation, lexical disambiguation, lexical cohesion and adaptation to properties of the scenario such as speaker gender and age. Our experiments for both phrase-based statistical MT and neural MT are applied in particular to the translation of English to French and focus specifically on the translation of informal written dialogues.
Ghidalia, Sarah. "Etude sur les mesures d'évaluation de la cohérence entre connaissance et compréhension dans le domaine de l'intelligence artificielle". Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCK001.
This thesis investigates the concept of coherence within intelligent systems, aiming to assess how coherence can be understood and measured in artificial intelligence, with a particular focus on pre-existing knowledge embedded in these systems. This research is funded as part of the European H2020 RESPONSE project and is set in the context of smart cities, where assessing the consistency between AI predictions and real-world data is a fundamental prerequisite for policy initiatives. The main objective of this work is to meticulously examine consistency in the field of artificial intelligence and to conduct a thorough exploration of prior knowledge. To this end, we conduct a systematic literature review to map the current landscape, focusing on the convergence and interaction between machine learning and ontologies, and highlighting, in particular, the algorithmic techniques employed. In addition, our comparative analysis positions our research in the broader context of important work in the field. An in-depth study of different knowledge integration methods is undertaken to analyze how consistency can be assessed based on the learning techniques employed. The overall quality of artificial intelligence systems, with particular emphasis on consistency assessment, is also examined. The whole study is then applied to the coherence evaluation of models concerning the representation of physical laws in ontologies. We present two case studies, one on predicting the motion of a harmonic oscillator and the other on estimating the lifetime of materials, to highlight the importance of respecting physical constraints in consistency assessment. In addition, we propose a new method for formalizing knowledge within an ontology and evaluate its effectiveness. This research aims to provide new perspectives in the evaluation of machine learning algorithms by introducing a coherence evaluation method. This thesis aspires to make a substantial contribution to the field of artificial intelligence by highlighting the critical role of consistency in the development of reliable and relevant intelligent systems.
Douwes, Constance. "On the Environmental Impact of Deep Generative Models for Audio". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS074.
In this thesis, we investigate the environmental impact of deep learning models for audio generation and we aim to put computational cost at the core of the evaluation process. In particular, we focus on different types of deep learning models specialized in raw waveform audio synthesis. These models are now a key component of modern audio systems, and their use has increased significantly in recent years. Their flexibility and generalization capabilities make them powerful tools in many contexts, from text-to-speech synthesis to unconditional audio generation. However, these benefits come at the cost of expensive training sessions on large amounts of data, operated on energy-intensive dedicated hardware, which incurs large greenhouse gas emissions. The measures we use as a scientific community to evaluate our work are at the heart of this problem. Currently, deep learning researchers evaluate their works primarily based on improvements in accuracy, log-likelihood, reconstruction, or opinion scores, all of which overshadow the computational cost of generative models. Therefore, we propose a new methodology based on Pareto optimality to help the community better evaluate the significance of their work while bringing the energy footprint -- and ultimately carbon emissions -- to the same level of interest as sound quality. In the first part of this thesis, we present a comprehensive report on the use of various evaluation measures of deep generative models for audio synthesis tasks. Even though computational efficiency is increasingly discussed, quality measurements are the most commonly used metrics to evaluate deep generative models, while energy consumption is almost never mentioned. Therefore, we address this issue by estimating the carbon cost of training generative models and comparing it to other noteworthy carbon costs to demonstrate that it is far from insignificant. In the second part of this thesis, we propose a large-scale evaluation of pervasive neural vocoders, a class of generative models used for speech generation conditioned on mel-spectrograms. We introduce a multi-objective analysis based on the Pareto optimality of both quality, from human-based evaluation, and energy consumption. Within this framework, we show that lighter models can perform better than more costly ones. By proposing to rely on a novel definition of efficiency, we intend to provide practitioners with a decision basis for choosing the best model based on their requirements. In the last part of the thesis, we propose a method to reduce the inference costs of neural vocoders, based on quantized neural networks. We show a significant gain in memory size and give some hints for the future use of these models on embedded hardware. Overall, we provide keys to better understand the impact of deep generative models for audio synthesis as well as a new framework for developing models while accounting for their environmental impact. We hope that this work raises awareness of the need to investigate energy-efficient models simultaneously with high perceived quality.
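The Pareto-optimality idea at the core of this evaluation methodology can be illustrated with a short Python sketch that keeps only the non-dominated models in a quality-versus-energy trade-off; model names and numbers are invented for the example.

```python
# Minimal sketch: keep only the Pareto-optimal models in a quality (higher is better)
# versus energy (lower is better) trade-off. Model names and numbers are invented.
models = {"vocoder_a": (4.1, 120.0), "vocoder_b": (4.3, 300.0),
          "vocoder_c": (3.9, 80.0),  "vocoder_d": (4.0, 250.0)}  # (quality score, energy in kWh)

def pareto_front(entries):
    front = []
    for name, (q, e) in entries.items():
        dominated = any(q2 >= q and e2 <= e and (q2 > q or e2 < e)
                        for other, (q2, e2) in entries.items() if other != name)
        if not dominated:
            front.append(name)
    return front

print("Pareto-optimal models:", pareto_front(models))
# vocoder_d is dominated by vocoder_a (higher quality at lower energy); the others remain.
```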
Pavão, Adrien. "Methodology for Design and Analysis of Machine Learning Competitions". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG088.
We develop and study a systematic and unified methodology to organize and use scientific challenges in research, particularly in the domain of machine learning (data-driven artificial intelligence). As of today, challenges are becoming more and more popular as a pedagogic tool and as a means of pushing the state of the art by engaging scientists of all ages, within or outside academia. This can be thought of as a form of citizen science. There is the promise that this form of community involvement in science might contribute to reproducible research and democratize artificial intelligence. However, while the distinction between organizers and participants may mitigate certain biases, there exists a risk that biases in data selection, scoring metrics, and other experimental design elements could compromise the integrity of the outcomes and amplify the influence of randomness. In extreme cases, the results could range from being useless to detrimental for the scientific community and, ultimately, society at large. Our objective is to structure challenge organization within a rigorous framework and offer the community insightful guidelines. In conjunction with the tools of challenge organization that we are developing as part of the CodaLab project, we aim to provide a valuable contribution to the community. This thesis includes theoretical fundamental contributions drawing on experimental design, statistics and game theory, and practical empirical findings resulting from the analysis of data from previous challenges.
Dang, Quang Vinh. "Évaluation de la confiance dans la collaboration à large échelle". Electronic Thesis or Diss., Université de Lorraine, 2018. http://www.theses.fr/2018LORR0002.
Testo completoLarge-scale collaborative systems wherein a large number of users collaborate to perform a shared task attract a lot of attention from both academic and industry. Trust is an important factor for the success of a large-scale collaboration. It is difficult for end-users to manually assess the trust level of each partner in this collaboration. We study the trust assessment problem and aim to design a computational trust model for collaborative systems. We focused on three research questions. 1. What is the effect of deploying a trust model and showing trust scores of partners to users? We designed and organized a user-experiment based on trust game, a well-known money-exchange lab-control protocol, wherein we introduced user trust scores. Our comprehensive analysis on user behavior proved that: (i) showing trust score to users encourages collaboration between them significantly at a similar level with showing nick- name, and (ii) users follow the trust score in decision-making. The results suggest that a trust model can be deployed in collaborative systems to assist users. 2. How to calculate trust score between users that experienced a collaboration? We designed a trust model for repeated trust game that computes user trust scores based on their past behavior. We validated our trust model against: (i) simulated data, (ii) human opinion, and (iii) real-world experimental data. We extended our trust model to Wikipedia based on user contributions to the quality of the edited Wikipedia articles. We proposed three machine learning approaches to assess the quality of Wikipedia articles: the first one based on random forest with manually-designed features while the other two ones based on deep learning methods. 3. How to predict trust relation between users that did not interact in the past? Given a network in which the links represent the trust/distrust relations between users, we aim to predict future relations. We proposed an algorithm that takes into account the established time information of the links in the network to predict future user trust/distrust relationships. Our algorithm outperforms state-of-the-art approaches on real-world signed directed social network datasets
Sheeren, David. "Méthodologie d'évaluation de la cohérence inter-représentations pour l'intégration de bases de données spatiales : une approche combinant l'utilisation de métadonnées et l'apprentissage automatique". Paris 6, 2005. https://tel.archives-ouvertes.fr/tel-00085693.
Benamar, Alexandra. "Évaluation et adaptation de plongements lexicaux au domaine à travers l'exploitation de connaissances syntaxiques et sémantiques". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG035.
Word embeddings have established themselves as the most popular representation in NLP. To achieve good performance, they require training on large data sets mainly from the general domain and are frequently finetuned for specialty data. However, finetuning is a resource-intensive practice and its effectiveness is controversial. In this thesis, we evaluate the use of word embedding models on specialty corpora and show that proximity between the vocabularies of the training and application data plays a major role in the representation of out-of-vocabulary terms. We observe that this is mainly due to the initial tokenization of words and propose a measure to compute the impact of the tokenization of words on their representation. To solve this problem, we propose two methods for injecting linguistic knowledge into representations generated by Transformers: one at the data level and the other at the model level. Our research demonstrates that adding syntactic and semantic context can improve the application of self-supervised models to specialty domains, both for vocabulary representation and for NLP tasks. The proposed methods can be used for any language with linguistic information or external knowledge available. The code used for the experiments has been published to facilitate reproducibility, and measures have been taken to limit the environmental impact by reducing the number of experiments.
Nouradine, Haroun. "Évaluation des ressources en eau dans les aquifères de socle dans la région du Guéra (Tchad) : combinaison d'approches géologiques, hydrogéologiques, géophysiques, géochimiques et d'apprentissage automatique". Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS665.pdf.
The crystalline basement aquifers present a major challenge for today's hydrogeologists due to their heterogeneity and discontinuity. They are the main source of drinking water in several regions of the world, particularly in sub-Saharan Africa and Chad. However, the crystalline basement aquifers in Chad have been poorly studied, making it difficult to exploit them to meet the water needs of the population. Our study focuses on the Guéra region, located in the Lake Chad Basin, which is characterized by a crystalline basement composed of 90% granitoids and metamorphic rocks, and subjected to a Sahelian-Sudanian climate. This region was chosen for this study due to the availability of existing data. Despite efforts to improve access to the resource using hydrogeological and geophysical techniques based on 1D and 2D electrical methods combined with lineaments, the failure rate of water wells remains high. In order to better understand the functioning of the crystalline basement aquifers and improve access to drinking water in this region, we propose in this thesis a multidimensional approach, combining geology, hydrogeology, geophysics, geochemistry, and machine learning. The hydrogeophysical approach, based on the in-depth exploitation of numerous existing data (technical data from 798 wells, 700 EM34 profiles, and 592 electrical panels), has allowed us to identify the main formations on which the local hydrogeological conceptual model is based, as well as their range of electrical resistivity, and to determine the factors that control the productivity of the aquifers. The installation of a preliminary automated piezometric monitoring network since 2021 has addressed the dynamics of groundwater fluctuations. Geochemical and isotopic methods, applied to 211 samples, have allowed us to identify and understand the processes of groundwater mineralization, differentiate between different aquifer formations, validate the conceptual model, assess vulnerability, and understand recharge mechanisms and groundwater age. Finally, a machine learning method has been tested using the data produced in this thesis to evaluate the potential of this approach to identify productivity criteria and map on a large scale the areas where the potential for groundwater is favorable for well installation. Keywords: crystalline basement aquifer, geophysics, hydrogeology, conceptual model, geochemistry, machine learning, Guéra (Chad).
Benayache, Ahcène. "Construction d'une mémoire organisationnelle de formation et évaluation dans un contexte e-learning : Le projet MEMORAe". Compiègne, 2005. http://www.theses.fr/2005COMP1591.
Many documents and resources are now available in order to support e-learning. Some are internal and made by the various actors involved in e-learning. Others are available on the web: on-line courses, course supports, slides, bibliographies, frequently asked questions, lecture notes, etc. The increasing number of available resources is a real problem in content management systems. In this PhD, we consider a course as an organization in which different actors are involved. We propose to manage the information, documents and knowledge of this organization by means of a learning organizational memory based on ontologies. The work was carried out in the context of the MEMORAe project, focusing on two application scenarios: the contribution of knowledge engineering to the educational domain, and learning by exploration based on ontologies. Three aspects were essentially developed in this work: the contribution of an organizational memory in the e-learning context; the choices of (a) using ontologies to model metadata and (b) representing them with the Topic Maps formalism; and the design and implementation of E-MEMORAe, an assistance environment for e-learning, and the evaluation of this environment with students in the framework of the B31.1 applied mathematics course at the University of Picardy in France and the NF01 algorithms and programming course at the University of Technology of Compiègne.
Nikoulina, Vassilina. "Modèle de traduction statistique à fragments enrichi par la syntaxe". Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM008.
Traditional Statistical Machine Translation models are not aware of linguistic structure. Thus, target lexical choices and word order are controlled only by surface-based statistics learned from the training corpus. However, knowledge of linguistic structure can be beneficial since it provides generic information compensating for data sparsity. The purpose of our work is to study the impact of syntactic information while preserving the general framework of Phrase-Based SMT. First, we study the integration of syntactic information using a reranking approach. We define features measuring the similarity between the dependency structures of source and target sentences, as well as features of linguistic coherence of the target sentences. The importance of each feature is assessed by learning their weights through a Structured Perceptron Algorithm. The evaluation of several reranking models shows that these features often improve the quality of translations produced by the basic model, in terms of manual evaluations as opposed to automatic measures. Then, we propose different models in order to increase the quality and diversity of the search graph produced by the decoder, by filtering out uninteresting hypotheses based on the source syntactic structure. This is done either by learning limits on phrase reordering, or by decomposing the source sentence in order to simplify the translation process. The initial evaluations of these models look promising.
Nikoulina, Vassilina. "Modèle de traduction statistique à fragments enrichi par la syntaxe". Phd thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00996317.
Weill, Jean-Christophe. "Programmes d'échecs de championnat : architecture logicielle, synthèse de fonctions d'évaluation, parallélisme de recherche". Paris 8, 1995. http://www.theses.fr/1995PA080954.
Testo completoAouladhadj, Driss. "Méthodes de détection et de reconnaissance de modèles de drone par surveillance et analyse de l'activité radio fréquence". Electronic Thesis or Diss., Université Gustave Eiffel, 2023. http://www.theses.fr/2023UEFL2067.
The development of drones and their increasing affordability pose a threat, especially to critical sites and major public events. Sky surveillance is crucial to ensure that drones do not enter sensitive areas or target crowds, potentially carrying explosives. Traditional surveillance techniques, based on visual, thermal, or radar detection, have limitations in urban settings due to obstacles like buildings, the small size of drones, and weather variations. In this light, passive radiofrequency (RF) monitoring emerges as a promising solution. Most commercially available UAVs utilize RF communications with various standardized and proprietary protocols. This thesis delves into the design of methods to detect, identify, and locate drones by analyzing their RF communications. By examining specific signals emitted by different drones, this research develops techniques that merge signal processing with artificial intelligence to identify these protocols. Key challenges addressed in this work include interference from nearby devices and the physical behavior of signals, such as fading and multipath effects. The primary goal of this research is to devise an advanced jamming system to safeguard high-risk zones, such as airports or public gathering sites. This system will work in tandem with a jamming device to neutralize drones, especially in urban areas. To prevent interference with other communication devices or property damage, the jamming strategy is tailored based on the detected drone protocol, ensuring accurate and targeted intervention.
Martin, Louis. "Simplification automatique de phrases à l'aide de méthodes contrôlables et non supervisées". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS265.
In this thesis we study the task of automatic sentence simplification. We first study the different methods used to evaluate simplification models, highlight several shortcomings of current approaches, and propose new contributions. We then propose to train sentence simplification models that can be adapted to the target user, allowing for greater simplification flexibility. Finally, we extend the scope of sentence simplification to several languages, by proposing methods that do not require annotated training data but nevertheless achieve very strong performance.
Laugel, Thibault. "Interprétabilité locale post-hoc des modèles de classification "boites noires"". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS215.
This thesis focuses on the field of XAI (eXplainable AI), and more particularly on the local post-hoc interpretability paradigm, that is to say the generation of explanations for a single prediction of a trained classifier. In particular, we study a fully agnostic context, meaning that the explanation is generated without using any knowledge about the classifier (treated as a black box) nor the data used to train it. In this thesis, we identify several issues that can arise in this context and that may be harmful to interpretability. We propose to study each of these issues and propose novel criteria and approaches to detect and characterize them. The three issues we focus on are: the risk of generating explanations that are out of distribution; the risk of generating explanations that cannot be associated with any ground-truth instance; and the risk of generating explanations that are not local enough. These risks are studied through two specific categories of interpretability approaches: counterfactual explanations, and local surrogate models.
Richard, Michael. "Évaluation et validation de prévisions en loi". Thesis, Orléans, 2019. http://www.theses.fr/2019ORLE0501.
In this thesis, we study the evaluation and validation of predictive densities. In a first part, we are interested in the contribution of machine learning to quantile and density forecasting. We use several machine learning algorithms in a quantile forecasting framework with real data, in order to highlight how the efficiency of particular methods varies with the nature of the data. In a second part, we present some validation tests of predictive densities from the literature. As an illustration, we use two of the mentioned tests on real data concerning stock index log-returns. In the third part, we address the calibration constraint of probability forecasting. We propose a generic method for recalibration, which allows us to enforce this constraint and thus simplifies the choice between density forecasts. It remains to assess the impact on forecast quality, measured by the sharpness of predictive distributions or by specific scores. We show that the impact on the Continuous Ranked Probability Score (CRPS) is weak under some hypotheses and that it is positive under more restrictive ones. We apply our method to weather and electricity price ensemble forecasts. Keywords: density forecasting, quantile forecasting, machine learning, validity tests, calibration, bias correction, PIT series, pinball loss, CRPS.
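Two of the scores named in this abstract, the pinball (quantile) loss and the CRPS, can be written compactly; the following Python sketch gives illustrative implementations (using the empirical ensemble form of the CRPS), not the exact formulations used in the thesis.

```python
# Minimal sketch: average pinball loss of quantile forecasts and the empirical CRPS of an
# ensemble forecast for a single observation. Numbers are illustrative.
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball (quantile) loss of forecasts q_pred at quantile level tau."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def crps_ensemble(y, members):
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over the ensemble members."""
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - y))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

y_obs = np.array([10.0, 12.5, 9.0])
q90 = np.array([11.0, 13.0, 10.5])
print("pinball loss (tau=0.9):", round(pinball_loss(y_obs, q90, 0.9), 3))
print("CRPS (one observation, 5-member ensemble):",
      round(crps_ensemble(10.0, [8.0, 9.5, 10.0, 11.0, 12.0]), 3))
```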
Thomas, Julien. "Apprentissage supervisé de données déséquilibrées par forêt aléatoire". Thesis, Lyon 2, 2009. http://www.theses.fr/2009LYO22004/document.
The problem of imbalanced datasets in supervised learning has emerged relatively recently, since data mining has become a technology widely used in industry. Assisted medical diagnosis, the detection of fraud, of abnormal phenomena, or of specific elements in satellite imagery are examples of industrial applications based on supervised learning from imbalanced datasets. The goal of our work is to adapt the supervised learning process to this issue. We also try to answer the specific performance requirements often associated with imbalanced datasets, such as a high recall rate for the minority class. This need is reflected in our main application, the development of software to help radiologists in the detection of breast cancer. To this end, we propose new methods that amend three different stages of a learning process. First, at the sampling stage, we propose, in the case of bagging, to replace classic bootstrap sampling with guided sampling. Our techniques, FUNSS and LARSS, use neighbourhood properties for the selection of objects. Second, for the representation space, our contribution is a method of variable construction adapted to imbalanced datasets. This method, the FuFeFa algorithm, is based on the discovery of predictive association rules. Finally, at the stage of aggregating the base classifiers of a bagging ensemble, we propose to optimize the majority vote by using weightings. For this, we introduced a new quantitative measure of model assessment, PRAGMA, which allows user-specific needs about the recall and precision rates of each class to be taken into account.
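As a hedged sketch of the weighted-majority-vote idea mentioned above, the Python example below aggregates the base classifiers of a bagging ensemble with weights chosen by a naive grid search that favours minority-class recall; the dataset, grid and selection criterion are illustrative assumptions and not the PRAGMA-based optimisation of the thesis.

```python
# Minimal sketch: aggregate the base classifiers of a bagging ensemble with a weighted
# average of probabilities, choosing weights by a naive grid search that maximises recall
# on the minority class (in practice one would balance recall and precision on held-out data).
import numpy as np
from itertools import product
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(max_depth=4), n_estimators=5,
                        random_state=0).fit(X, y)

probas = np.stack([est.predict_proba(X)[:, 1] for est in bag.estimators_])
best = None
for w in product([0.5, 1.0, 2.0], repeat=len(bag.estimators_)):
    score = np.average(probas, axis=0, weights=w)
    rec = recall_score(y, (score >= 0.5).astype(int))   # recall on the minority class (label 1)
    if best is None or rec > best[0]:
        best = (rec, w)
print("best minority-class recall:", round(best[0], 3), "with vote weights", best[1])
```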
Asri, Layla El. "Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems". Electronic Thesis or Diss., Université de Lorraine, 2016. http://www.theses.fr/2016LORR0350.
This document proposes to learn the behaviour of the dialogue manager of a spoken dialogue system from a set of rated dialogues. This learning is performed through reinforcement learning. Our method does not require the definition of a representation of the state space nor a reward function. These two high-level parameters are learnt from the corpus of rated dialogues. It is shown that the spoken dialogue designer can optimise dialogue management by simply defining the dialogue logic and a criterion to maximise (e.g. user satisfaction). The methodology suggested in this thesis first considers the dialogue parameters that are necessary to compute a representation of the state space relevant for the criterion to be maximised. For instance, if the chosen criterion is user satisfaction then it is important to account for parameters such as dialogue duration and the average speech recognition confidence score. The state space is represented as a sparse distributed memory. The Genetic Sparse Distributed Memory for Reinforcement Learning (GSDMRL) accommodates many dialogue parameters and selects the parameters which are the most important for learning through genetic evolution. The resulting state space and the policy learnt on it are easily interpretable by the system designer. Secondly, the rated dialogues are used to learn a reward function which teaches the system to optimise the criterion. Two algorithms, reward shaping and distance minimisation, are proposed to learn the reward function. These two algorithms consider the criterion to be the return for the entire dialogue. These functions are discussed and compared on simulated dialogues, and it is shown that the resulting functions enable faster learning than using the criterion directly as the final reward. A spoken dialogue system for appointment scheduling was designed during this thesis, based on previous systems, and a corpus of rated dialogues with this system was collected. This corpus illustrates the scaling capability of the state space representation and is a good example of an industrial spoken dialogue system to which the methodology could be applied.
Asri, Layla El. "Learning the Parameters of Reinforcement Learning from Data for Adaptive Spoken Dialogue Systems". Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0350/document.
Testo completoThis document proposes to learn the behaviour of the dialogue manager of a spoken dialogue system from a set of rated dialogues. This learning is performed through reinforcement learning. Our method does not require the definition of a representation of the state space nor a reward function. These two high-level parameters are learnt from the corpus of rated dialogues. It is shown that the spoken dialogue designer can optimise dialogue management by simply defining the dialogue logic and a criterion to maximise (e.g user satisfaction). The methodology suggested in this thesis first considers the dialogue parameters that are necessary to compute a representation of the state space relevant for the criterion to be maximized. For instance, if the chosen criterion is user satisfaction then it is important to account for parameters such as dialogue duration and the average speech recognition confidence score. The state space is represented as a sparse distributed memory. The Genetic Sparse Distributed Memory for Reinforcement Learning (GSDMRL) accommodates many dialogue parameters and selects the parameters which are the most important for learning through genetic evolution. The resulting state space and the policy learnt on it are easily interpretable by the system designer. Secondly, the rated dialogues are used to learn a reward function which teaches the system to optimise the criterion. Two algorithms, reward shaping and distance minimisation are proposed to learn the reward function. These two algorithms consider the criterion to be the return for the entire dialogue. These functions are discussed and compared on simulated dialogues and it is shown that the resulting functions enable faster learning than using the criterion directly as the final reward. A spoken dialogue system for appointment scheduling was designed during this thesis, based on previous systems, and a corpus of rated dialogues with this system were collected. This corpus illustrates the scaling capability of the state space representation and is a good example of an industrial spoken dialogue system upon which the methodology could be applied
Caigny, Arno de. "Innovation in customer scoring for the financial services industry". Thesis, Lille, 2019. http://www.theses.fr/2019LIL1A011.
This dissertation improves customer scoring. Customer scoring is important for companies in their decision-making processes because it helps to solve key managerial issues such as deciding which customers to target in a marketing campaign or assessing which customers are likely to leave the company. The research in this dissertation makes several contributions in three areas of the customer scoring literature. First, new sources of data are used to score customers. Second, the methodology to go from data to decisions is improved. Third, customer life event prediction is proposed as a new application of customer scoring.
L'Hour, Jérémy. "Policy evaluation, high-dimension and machine learning". Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLG008.
This dissertation comprises three essays that apply machine learning and high-dimensional statistics to causal inference. The first essay proposes a parametric alternative to the synthetic control method (Abadie and Gardeazabal, 2003; Abadie et al., 2010) that relies on a Lasso-type first step. We show that the resulting estimator is doubly robust, asymptotically Gaussian and "immunized" against first-step selection mistakes. The second essay studies a penalized version of the synthetic control method, especially useful in the presence of micro-economic data. The penalization parameter trades off pairwise matching discrepancies with respect to the characteristics of each unit in the synthetic control against matching discrepancies with respect to the characteristics of the synthetic control unit as a whole. We study the properties of the resulting estimator, propose data-driven choices of the penalization parameter and discuss randomization-based inference procedures. The last essay applies the Generic Machine Learning framework (Chernozhukov et al., 2018) to study heterogeneity of the treatment effect in a randomized experiment designed to compare public and private provision of job counselling. From a methodological perspective, we discuss the extension of the Generic Machine Learning framework to experiments with imperfect compliance.
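To make the penalized synthetic control idea concrete, here is a hedged Python sketch of the weight problem: simplex-constrained weights minimise the aggregate matching discrepancy plus a lambda-weighted sum of unit-by-unit discrepancies; the data, the value of lambda and the solver are illustrative assumptions, not the estimator or inference procedures developed in the dissertation.

```python
# Minimal sketch: simplex-constrained synthetic control weights minimising the aggregate
# matching discrepancy plus a lambda-weighted sum of unit-by-unit discrepancies.
# Data, lambda and solver settings are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X_controls = rng.normal(size=(6, 4))   # characteristics of 6 control units
x_treated = rng.normal(size=4)         # characteristics of the treated unit
lam = 0.1                              # penalisation parameter (data-driven in practice)

def objective(w):
    aggregate = np.sum((x_treated - X_controls.T @ w) ** 2)               # synthetic vs treated unit
    pairwise = np.sum(w * np.sum((X_controls - x_treated) ** 2, axis=1))  # unit-by-unit terms
    return aggregate + lam * pairwise

constraints = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]          # weights sum to one
result = minimize(objective, x0=np.full(6, 1 / 6), bounds=[(0, 1)] * 6,
                  constraints=constraints, method="SLSQP")
print("synthetic control weights:", np.round(result.x, 3))
```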
Mpawenimana, Innocent. "Modélisation et conception d’objets connectés au service des maisons intelligentes : Évaluation et optimisation de leur autonomie et de leur QoS". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4107.
This PhD thesis is in the field of smart homes, and more specifically addresses the energy consumption optimization process for a home equipped with an ambient energy harvesting and storage system. The objective is to propose services to handle the household energy consumption and to promote self-consumption. To do so, relevant data must first be collected (current, active and reactive power consumption, temperature, and so on). In this PhD, data were first sensed using an intrusive load approach. Despite our efforts to build our own database, we decided to use an online available dataset for the rest of this study. Different supervised machine learning algorithms have been evaluated on this dataset to identify home appliances with accuracy. The results obtained showed that active and reactive power alone can be used for that purpose. To further optimize the accuracy, we proposed to use a moving average function to reduce the random variations in the observations. A non-intrusive load approach has finally been adopted to determine the global household active energy consumption instead. Using an existing online dataset, a machine learning algorithm based on Long Short-Term Memory (LSTM) has then been proposed to predict, over different time scales, the global household consumed energy. Long Short-Term Memory was also used to predict, for different weather profiles, the power that can be harvested from solar cells. These predictions of consumed and harvested energy are finally exploited by a Home Energy Management policy optimizing self-consumption. Simulation results show that the size of the solar cells as well as of the battery impacts the self-consumption rate and must therefore be chosen meticulously.
Nicol, Olivier. "Data-driven evaluation of contextual bandit algorithms and applications to dynamic recommendation". Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL10211/document.
The context of this thesis work is dynamic recommendation. Recommendation is the action, for an intelligent system, of supplying a user of an application with personalized content so as to enhance what is referred to as the "user experience", e.g. recommending a product on a merchant website or an article on a blog. Recommendation is considered dynamic when the content to recommend or the user tastes evolve rapidly, e.g. news recommendation. Many applications that are of interest to us generate a tremendous amount of data through the millions of online users they have. Nevertheless, using this data to evaluate a new recommendation technique or even to compare two dynamic recommendation algorithms is far from trivial. This is the problem we consider here. Some approaches have already been proposed. Nonetheless, they were not studied very thoroughly, either from a theoretical point of view (unquantified bias, loose convergence bounds...) or from an empirical one (experiments on private data only). In this work we start by filling many blanks within the theoretical analysis. Then we comment on the result of an experiment of unprecedented scale in this area: a public challenge we organized. This challenge, along with some complementary experiments, revealed an unexpected source of a huge bias: time acceleration. The rest of this work tackles this issue. We show that a bootstrap-based approach allows us to significantly reduce this bias and, more importantly, to control it.
Eickenberg, Michael. "Évaluation de modèles computationnels de la vision humaine en imagerie par résonance magnétique fonctionnelle". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112206/document.
Blood-oxygen-level dependent (BOLD) functional magnetic resonance imaging (fMRI) makes it possible to measure brain activity through blood flow to areas with metabolically active neurons. In this thesis we use these measurements to evaluate the capacity of biologically inspired models of vision coming from computer vision to represent image content in a similar way as the human brain. The main vision models used are convolutional networks. Deep neural networks have made unprecedented progress in many fields in recent years. Even strongholds of biological systems such as scene analysis and object detection have been addressed with enormous success. A body of prior work has been able to establish firm links between the first and last layers of deep convolutional nets and brain regions: the first layer and V1 essentially perform edge detection, and the last layer as well as inferotemporal cortex permit a linear read-out of object category. In this work we have generalized this correspondence to all intermediate layers of a convolutional net. We found that each layer of a convnet maps to a stage of processing along the ventral stream, following the hierarchy of biological processing: along the ventral stream we observe a stage-by-stage increase in complexity. Between edge detection and object detection, for the first time we are given a toolbox to study the intermediate processing steps. A preliminary result to this was obtained by studying the response of the visual areas to the presentation of visual textures and analysing it using convolutional scattering networks. The other global aspect of this thesis is "decoding" models: in the preceding part, we predicted brain activity from the stimulus presented (this is called "encoding"). Predicting a stimulus from brain activity is the inverse inference mechanism and can be used as an omnibus test for the presence of this information in the brain signal. Most often generalized linear models such as linear or logistic regression or SVMs are used for this task, giving access to a coefficient vector the same size as a brain sample, which can thus be visualized as a brain map. However, interpretation of these maps is difficult, because the underlying linear system is either ill-defined and ill-conditioned or inadequately regularized, resulting in non-informative maps. Supposing a sparse and spatially contiguous organization of coefficient maps, we build on the convex penalty consisting of the sum of the total variation (TV) seminorm and the L1 norm ("TV+L1") to develop a penalty grouping an activation term with a spatial derivative. This penalty sets most coefficients to zero but permits free smooth variations in active zones, as opposed to TV+L1 which creates flat active zones. This method improves the interpretability of brain maps obtained through cross-validation to determine the best hyperparameter. In the context of encoding and decoding models, we also work on improving data preprocessing in order to obtain the best performance. We study the impulse response of the BOLD signal: the hemodynamic response function. To generate activation maps, instead of using a classical linear model with a fixed canonical response function, we use a bilinear model with a spatially variable hemodynamic response (but fixed across events). We propose an efficient optimization algorithm and show a gain in predictive capacity for encoding and decoding models on different datasets.
Guettari, Nadjib. "Évaluation du contenu d'une image couleur par mesure basée pixel et classification par la théorie des fonctions de croyance". Thesis, Poitiers, 2017. http://www.theses.fr/2017POIT2275/document.
Nowadays it has become increasingly simple for anyone to take pictures with digital cameras, download these images to a computer and use different image processing software to apply modifications to them (compression, denoising, transmission, etc.). However, these treatments lead to degradations which affect the visual quality of the image. In addition, with the widespread use of the Internet and the growth of electronic mail, sophisticated image-editing software has been democratised, making it possible to falsify images for legitimate or malicious purposes, for confidential or secret communications. In this context, steganography is a method of choice for embedding and transmitting information. In this manuscript we address two issues: image quality assessment and the detection of modification or of the presence of hidden information in an image. The first objective is to develop a no-reference measure allowing the quality of an image to be automatically evaluated in correlation with human visual appreciation. We then propose a steganalysis scheme to detect, with the best possible reliability, the presence of information embedded in natural images. In this thesis, the challenge is to take into account the imperfection of the manipulated data coming from different sources of information with different degrees of precision. In this context, in order to take full advantage of all this information, we propose to use the theory of belief functions. This theory makes it possible to represent knowledge in a relatively natural way in the form of a belief structure. We proposed a no-reference image quality assessment measure, which is able to estimate the quality of degraded images with multiple types of distortion. This approach, called wms-EVreg2, is based on the fusion of different statistical features extracted from the image, depending on the reliability of each set of features estimated through the confusion matrix. From the various experiments, we found that wms-EVreg2 correlates well with subjective quality scores and provides competitive quality prediction performance compared to full-reference image quality measures. For the second problem addressed, we proposed a steganalysis scheme based on the theory of belief functions constructed on random subspaces of the features. The performance of the proposed method was evaluated on different steganography algorithms in the JPEG transform domain as well as in the spatial domain. These experimental tests have shown the performance of the proposed method in some application frameworks. However, many configurations remain undetectable.
Sun, Yan. "Simulation du cycle biogéochimique du phosphore dans le modèle de surface terrestre ORCHIDEE : évaluation par rapport à des données d'observation locales et mondiales". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASJ001.
Phosphorus (P) plays a critical role in controlling metabolic processes, soil organic matter dynamics, plant growth and ecosystem productivity, thereby affecting the greenhouse gas (GHG) balance of land ecosystems. A small number of land surface models have incorporated P cycles, but their predictions of GHG balances remain highly uncertain. The reasons are: (1) scarce benchmarking data for key P-related processes (e.g. continental to global scale gridded datasets), (2) the lack of a comprehensive global evaluation strategy tailored to P processes and their interlinkages with the carbon (C) and nitrogen (N) cycles, and (3) insufficient model calibration, limited by the high computational cost of simulating coupled CNP cycles which operate on timescales of minutes to millennia. Addressing those research gaps, I apply a combination of statistical methods (machine learning), LSMs and observational data across various scales. Firstly (Chapter 2), to address the lack of benchmarking data, I applied two machine-learning methods with the aim of producing spatially gridded maps of acid phosphatase (AP) activity at the continental scale by scaling up scattered site observations of potential AP activity. AP secreted by fungi, bacteria and plant roots plays an important role in the recycling of soil P by transforming unavailable organic P into assimilable phosphate. The back-propagation artificial network (BPN) method that was chosen explained 58% of AP variability and was able to identify the gradients in AP along three transects in Europe. Soil nutrients (total nitrogen, total P and labile organic P) and climatic controls (annual precipitation, mean annual temperature and temperature amplitude) were found to be the dominant factors influencing AP variations in space. Secondly (Chapter 3), I evaluated the performance of the global version of the land surface model ORCHIDEE-CNP (v1.2) using the data from Chapter 2 as well as additional data from remote sensing, ground-based measurement networks and ecological databases. Simulated components of the N and P cycles at different levels of aggregation (from local to global) are in good agreement with data-driven estimates. We identified model biases in the simulated large-scale patterns of leaf and soil stoichiometry and plant P use efficiency, which point towards an underestimation of P availability towards the poles. Based on our analysis, we propose ways to address the model biases by giving priority to better representing the processes of soil organic P mineralization and soil inorganic P transformation. Lastly (Chapter 4), I designed and tested a machine learning (ML)-based procedure for accelerating the equilibration of biogeochemical cycles to boundary conditions (spin-up), which is responsible for the low computational efficiency of current P-enabled LSMs. This ML-based acceleration approach (MLA) requires spinning up only a small subset of model pixels (14.1%), from which the equilibrium state of the remaining pixels is estimated by ML. MLA predicts the equilibrium state of soil, biomass and litter C, N and P at both PFT and global scales sufficiently well, as indicated by the minor error introduced in simulating the current land carbon balance.
The computational cost of MLA is about one order of magnitude lower than that of the currently used approach, which opens the opportunity for data assimilation using the ever-growing observational datasets. In the outlook, specific applications of the MLA approach and future research priorities are discussed to further improve the reliability and robustness of phosphorus-enabled land surface models.
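The MLA idea summarised above can be sketched in a few lines of Python: spin up only a small subset of grid cells, then train a regressor mapping cell predictors to the equilibrium pools and extrapolate to the remaining cells; the synthetic data and the random forest are stand-ins for the actual land surface model and predictors.

```python
# Minimal sketch: spin up only ~14% of grid cells, train a regressor mapping cell
# predictors to the spun-up equilibrium pools, and extrapolate to the remaining cells.
# Synthetic data and a random forest stand in for the land surface model and predictors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_cells, n_predictors = 5000, 6
predictors = rng.normal(size=(n_cells, n_predictors))   # e.g. climate, soil, vegetation shares
equilibrium = predictors @ rng.normal(size=n_predictors) + rng.normal(scale=0.1, size=n_cells)

subset = rng.choice(n_cells, size=int(0.14 * n_cells), replace=False)  # cells actually spun up
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(predictors[subset], equilibrium[subset])

rest = np.setdiff1d(np.arange(n_cells), subset)
rmse = np.sqrt(np.mean((model.predict(predictors[rest]) - equilibrium[rest]) ** 2))
print(f"RMSE of extrapolated equilibrium pools on the remaining cells: {rmse:.3f}")
```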
Alves, da Silva Guilherme. "Traitement hybride pour l'équité algorithmique". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0323.
Testo completoAlgorithmic decisions are currently being used on a daily basis. These decisions often rely on Machine Learning (ML) algorithms that may produce complex and opaque ML models. Recent studies raised unfairness concerns by revealing discriminating outcomes produced by ML models against minorities and unprivileged groups. As ML models are capable of amplifying discrimination against minorities due to unfair outcomes, it reveals the need for approaches that uncover and remove unintended biases. Assessing fairness and mitigating unfairness are the two main tasks that have motivated the growth of the research field called {algorithmic fairness}. Several notions used to assess fairness focus on the outcomes and link to sensitive features (e.g. gender and ethnicity) through statistical measures. Although these notions have distinct semantics, the use of these definitions of fairness is criticized for being a reductionist understanding of fairness whose aim is basically to implement accept/not-accept reports, ignoring other perspectives on inequality and on societal impact. Process fairness instead is a subjective fairness notion which is centered on the process that leads to outcomes. To mitigate or remove unfairness, approaches generally apply fairness interventions in specific steps. They usually change either (1) the data before training or (2) the optimization function or (3) the algorithms' outputs in order to enforce fairer outcomes. Recently, research on algorithmic fairness have been dedicated to explore combinations of different fairness interventions, which is referred to in this thesis as {fairness hybrid-processing}. Once we try to mitigate unfairness, a tension between fairness and performance arises that is known as the fairness-accuracy trade-off. This thesis focuses on the fairness-accuracy trade-off problem since we are interested in reducing unintended biases without compromising classification performance. We thus propose ensemble-based methods to find a good compromise between fairness and classification performance of ML models, in particular models for binary classification. In addition, these methods produce ensemble classifiers thanks to a combination of fairness interventions, which characterizes the fairness hybrid-processing approaches. We introduce FixOut ({F}a{I}rness through e{X}planations and feature drop{Out}), the human-centered, model-agnostic framework that improves process fairness without compromising classification performance. It receives a pre-trained classifier (original model), a dataset, a set of sensitive features, and an explanation method as input, and it outputs a new classifier that is less reliant on the sensitive features. To assess the reliance of a given pre-trained model on sensitive features, FixOut uses explanations to estimate the contribution of features to models' outcomes. If sensitive features are shown to contribute globally to models' outcomes, then the model is deemed unfair. In this case, it builds a pool of fairer classifiers that are then aggregated to obtain an ensemble classifier. We show the adaptability of FixOut on different combinations of explanation methods and sampling approaches. We also evaluate the effectiveness of FixOut w.r.t. to process fairness but also using well-known standard fairness notions available in the literature. Furthermore, we propose several improvements such as automating the choice of FixOut's parameters and extending FixOut to other data types
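The following short Python sketch illustrates the FixOut workflow described in the abstract, under loud assumptions: the data, sensitive-feature indices and choice of explainer (permutation importance here) are illustrative stand-ins, not the thesis's implementation, which supports several explanation methods and sampling strategies.

```python
# A minimal, self-contained sketch of the FixOut workflow described above:
# (1) use an explanation method to estimate how much a pre-trained model relies on
# sensitive features, and (2) build an ensemble of classifiers trained with sensitive
# features dropped. Permutation importance is used here as a stand-in explainer.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
sensitive = [0, 3]                      # indices of the (hypothetical) sensitive features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

original = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Step 1: global contribution of each feature to the model's outcomes.
imp = permutation_importance(original, X_te, y_te, n_repeats=10, random_state=0)
top_k = np.argsort(imp.importances_mean)[::-1][:3]
is_unfair = any(s in top_k for s in sensitive)
print(f"model deemed unfair (sensitive features among top contributors): {is_unfair}")

# Step 2: pool of fairer models, each trained without one sensitive feature (plus one
# without all of them), aggregated by averaging predicted probabilities. In FixOut
# this step is only triggered when the original model is deemed unfair.
drop_sets = [[s] for s in sensitive] + [sensitive]
pool = []
for drop in drop_sets:
    keep = [j for j in range(X.shape[1]) if j not in drop]
    clf = RandomForestClassifier(random_state=0).fit(X_tr[:, keep], y_tr)
    pool.append((keep, clf))

def ensemble_predict_proba(X_new):
    probs = [clf.predict_proba(X_new[:, keep]) for keep, clf in pool]
    return np.mean(probs, axis=0)

acc = (ensemble_predict_proba(X_te).argmax(axis=1) == y_te).mean()
print(f"accuracy of the ensemble that is less reliant on sensitive features: {acc:.3f}")
```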
Liu, Kaixuan. "Study on knowledge-based garment design and fit evaluation system". Thesis, Lille 1, 2017. http://www.theses.fr/2017LIL10020/document.
Testo completo: Fashion design and fit evaluation play a very important role in the clothing industry. Garment style and fit directly determine whether a customer buys the garment or not. In order to develop a well-fitting garment, designers and pattern makers have to adjust the style and pattern many times until their customers are satisfied. Currently, traditional fashion design and fit evaluation have three main shortcomings: 1) they are very time-consuming and inefficient, 2) they require experienced designers, and 3) they are not suitable for garment e-shopping. In this Ph.D. thesis, we propose three key technologies to improve the current design processes in the clothing industry. The first one is the Garment Flat and Pattern Associated Design Technology (GFPADT). The second one is the 3D Interactive Garment Pattern Making Technology (3DIGPMT). The last one is the Machine Learning-Based Garment Fit Evaluation Technology (MLBGFET). Finally, we provide a number of knowledge-based garment design and fit evaluation solutions (processes) by combining the three proposed key technologies to deal with the garment design and production issues of fashion companies
Al-Kharaz, Mohammed. "Analyse multivariée des alarmes de diagnostic en vue de la prédiction de la qualité des produits". Electronic Thesis or Diss., Aix-Marseille, 2021. http://theses.univ-amu.fr.lama.univ-amu.fr/211207_ALKHARAZ_559anw633vgnlp70s324svilo_TH.pdf.
Testo completo: This thesis addresses the prediction of product quality and the improvement of the performance of diagnostic alarms in a semiconductor facility. For this purpose, we exploit the alarm history collected during production. First, we propose an approach to model and estimate the degradation risk of the final product associated with each triggered alarm, according to its activation behavior across all products during production. Second, using the estimated risk values of the alarms, we propose an approach to predict the final quality of a product lot. This approach models the link between process alarm events and the final quality of the lot through machine learning techniques. We also propose a new approach based on the processing of alarm event text to predict the final product quality. This approach improves performance and exploits more of the information available in the alarm text. Finally, we propose a framework for analyzing alarm activations through performance evaluation tools and several interactive visualization techniques that are well suited to semiconductor manufacturing. These allow us to closely monitor alarms, evaluate performance, and improve the quality of the products and of the event data collected in the history. The effectiveness of each of the above approaches is demonstrated using a real data set obtained from a semiconductor manufacturing facility
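As an illustration of the text-based prediction idea mentioned in this abstract, here is a minimal Python sketch under stated assumptions: the alarm messages, field names and labels are invented toy data, and a TF-IDF plus logistic regression pipeline stands in for the thesis's actual text processing and model.

```python
# Illustrative sketch: represent each lot by the concatenated text of the alarm events
# raised during its production, and learn to predict the lot's final quality label.
# Alarm texts, labels and the model choice are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One document per lot: the sequence of alarm messages triggered while it was processed.
lot_alarm_texts = [
    "PRESSURE_HIGH chamberA recipe12 VALVE_TIMEOUT chamberA",
    "TEMP_DRIFT furnace3 recipe7",
    "VALVE_TIMEOUT chamberB PRESSURE_HIGH chamberB PUMP_WARNING chamberB",
    "TEMP_DRIFT furnace1 recipe7 FLOW_UNSTABLE gasline2",
]
lot_quality = [0, 1, 0, 1]   # 0 = degraded lot, 1 = good lot (toy labels)

model = make_pipeline(TfidfVectorizer(token_pattern=r"\S+"), LogisticRegression())
model.fit(lot_alarm_texts, lot_quality)

new_lot = ["PRESSURE_HIGH chamberA PUMP_WARNING chamberA"]
print(model.predict_proba(new_lot))   # probability of each final quality class
```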
Kang, Chen. "Image Aesthetic Quality Assessment Based on Deep Neural Networks". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG004.
Testo completo: With the development of capture devices and the Internet, people have access to an increasing number of images. Assessing visual aesthetics has important applications in several domains, from image retrieval and recommendation to enhancement. Image aesthetic quality assessment aims at determining how beautiful an image looks to human observers. Many problems in this field are not well studied, including the subjectivity of aesthetic quality assessment, the explanation of aesthetics and the collection of human-annotated data. Conventional image aesthetic quality prediction aims at predicting the average score or aesthetic class of a picture. However, aesthetic prediction is intrinsically subjective, and images with similar mean aesthetic scores/classes might display very different levels of consensus among human raters. Recent work has dealt with aesthetic subjectivity by predicting the distribution of human scores, but predicting the distribution is not directly interpretable in terms of subjectivity, and might be sub-optimal compared to directly estimating subjectivity descriptors computed from ground-truth scores. Furthermore, labels in existing datasets are often noisy or incomplete, or they do not allow more sophisticated tasks such as understanding why an image looks beautiful or not to a human observer. In this thesis, we first propose several measures of subjectivity, ranging from simple statistical measures, such as the standard deviation of the scores, to newly proposed descriptors inspired by information theory. We evaluate the prediction performance of these measures when they are computed from predicted score distributions and when they are directly learned from ground-truth data. We find that the latter strategy provides in general better results. We also use subjectivity to improve the prediction of aesthetic scores, showing that information-theory-inspired subjectivity measures perform better than statistical measures. Then, we propose the Explainable Visual Aesthetics (EVA) dataset, which contains 4070 images with at least 30 votes per image. EVA has been crowd-sourced using a more disciplined approach inspired by quality assessment best practices. It also offers additional features, such as the degree of difficulty in assessing the aesthetic score, ratings for 4 complementary aesthetic attributes, as well as the relative importance of each attribute in forming aesthetic opinions. The publicly available dataset is expected to contribute to future research on understanding and predicting visual aesthetics. Additionally, we studied the explainability of image aesthetic quality assessment. A statistical analysis on EVA demonstrates that the collected attributes and relative importance can be linearly combined to explain effectively the overall aesthetic mean opinion scores. We found that subjectivity has a limited correlation with the average personal difficulty in aesthetic assessment, and that the subject's region, photographic level and age significantly affect their aesthetic assessment
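To make the notion of subjectivity descriptors mentioned in this abstract concrete, here is a small Python sketch of two such measures computed from raw per-image human scores: the standard deviation and a normalized Shannon entropy of the score distribution. The 1-10 scale and the exact formulas are assumptions for illustration; the descriptors proposed in the thesis may differ.

```python
# Sketch of two subjectivity descriptors of the kind discussed above, computed from
# the raw human scores of one image. The 1-10 score scale is an assumption.
import numpy as np

def subjectivity_measures(scores, n_levels=10):
    scores = np.asarray(scores, dtype=float)
    std = scores.std(ddof=0)                              # simple statistical measure
    counts = np.bincount(scores.astype(int), minlength=n_levels + 1)[1:]
    p = counts / counts.sum()
    p_nonzero = p[p > 0]
    entropy = -(p_nonzero * np.log2(p_nonzero)).sum() / np.log2(n_levels)  # in [0, 1]
    return {"std": std, "normalized_entropy": entropy}

consensual = [5, 5, 6, 5, 5, 6, 5, 5]        # raters mostly agree
polarized = [1, 10, 2, 9, 1, 10, 2, 9]       # similar mean, strong disagreement
print(subjectivity_measures(consensual))
print(subjectivity_measures(polarized))
```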
Konishcheva, Kseniia. "Novel strategies for identifying and addressing mental health and learning disorders in school-age children". Electronic Thesis or Diss., Université Paris Cité, 2023. http://www.theses.fr/2023UNIP7083.
Testo completo: The prevalence of mental health and learning disorders in school-age children is a growing concern. Yet, a significant delay exists between the onset of symptoms and referral for intervention, contributing to long-term challenges for affected children. The current mental health system is fragmented: teachers possess valuable insights into their students' well-being but have limited knowledge of mental health, while clinicians often only encounter the more severe cases. The inconsistent implementation of existing screening programs in schools, mainly due to resource constraints, suggests the need for more effective solutions. This thesis presents two novel approaches for improving the mental health and learning outcomes of children and adolescents. The first approach uses data-driven methods, leveraging the Healthy Brain Network dataset, which contains item-level responses from over 50 assessments, consensus diagnoses, and cognitive task scores from thousands of children. Using machine learning techniques, item subsets were identified to predict common mental health and learning disability diagnoses. The approach demonstrated promising performance, offering potential utility for both mental health and learning disability detection. Furthermore, our approach provides an easy-to-use starting point for researchers to apply our method to new datasets. The second approach is a framework aimed at improving the mental health and learning outcomes of children by addressing the challenges faced by teachers in heterogeneous classrooms. This framework enables teachers to create tailored teaching strategies based on the identified needs of individual students and, when necessary, to suggest referral to clinical care. The first step of the framework is an instrument designed to assess each student's well-being and learning profile: FACETS, a 60-item scale built through partnerships with teachers and clinicians. Teacher acceptance and the psychometric properties of FACETS are investigated. A preliminary pilot study demonstrated overall acceptance of FACETS among teachers. In conclusion, this thesis presents a framework to bridge the gap in the detection and support of mental health and learning disorders in school-age children. Future studies will further validate and refine our tools, offering more timely and effective interventions to improve the well-being and learning outcomes of children in diverse educational settings
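The item-subset selection step described in this abstract can be sketched as follows in Python, under stated assumptions: the data are synthetic, and a cross-validated recursive feature elimination with a logistic model stands in for the thesis's actual machine-learning pipeline on the Healthy Brain Network data.

```python
# Hedged sketch of the item-selection idea: from item-level questionnaire responses,
# select a small subset of items that best predicts a diagnosis. The data are synthetic;
# the Healthy Brain Network items and diagnoses are not reproduced here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# 300 children x 120 questionnaire items, binary diagnosis label (toy data).
X, y = make_classification(n_samples=300, n_features=120, n_informative=15, random_state=0)

selector = RFECV(LogisticRegression(max_iter=1000), step=10, cv=5, scoring="roc_auc")
selector.fit(X, y)

selected_items = np.where(selector.support_)[0]
print(f"{selected_items.size} items retained out of {X.shape[1]}")
auc = cross_val_score(LogisticRegression(max_iter=1000), X[:, selected_items], y,
                      cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC with the selected item subset: {auc:.3f}")
```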
Reverdy, Clément. "Annotation et synthèse basée données des expressions faciales de la Langue des Signes Française". Thesis, Lorient, 2019. http://www.theses.fr/2019LORIS550.
Testo completo: French Sign Language (LSF) represents part of the identity and culture of the deaf community in France. One way to promote this language is to generate signed content through virtual characters called signing avatars. The system we propose is part of a more general project of gestural synthesis of LSF by concatenation, which allows new sentences to be generated from a corpus of annotated motion data captured via a marker-based motion capture device (MoCap) by editing existing data. In LSF, facial expressivity is particularly important since it conveys numerous types of information (e.g., affective, clausal or adjectival). This thesis aims to integrate the facial aspect of LSF into the concatenative synthesis system described above. Thus, a processing pipeline is proposed, from data capture via a MoCap device to the facial animation of the avatar from these data and to the automatic annotation of the corpus thus constituted. The first contribution of this thesis concerns the methodology employed and the representation by blendshapes, both for the synthesis of facial animations and for automatic annotation. It enables the analysis/synthesis scheme to be processed at an abstract level, with homogeneous and meaningful descriptors. The second contribution concerns the development of an automatic annotation method based on the recognition of expressive facial expressions using machine learning techniques. The last contribution lies in the synthesis method, which is expressed as a rather classic optimization problem but in which we have included
Seeliger, Barbara. "Évaluation de la perfusion viscérale et anastomotique par réalité augmentée basée sur la fluorescence". Thesis, Strasbourg, 2019. http://www.theses.fr/2019STRAJ048.
Testo completo: The fluorescence-based enhanced reality approach is used to quantify fluorescent signal dynamics and superimpose the perfusion cartography onto laparoscopic images in real time. A colonic ischemia model was chosen to differentiate between different types of ischemia and determine the extension of an ischemic zone in the different layers of the colonic wall. The evaluation of fluorescence dynamics associated with a machine learning approach made it possible to distinguish between arterial and venous ischemia with a good prediction rate. In the second study, quantitative perfusion assessment showed that the extent of ischemia was significantly larger on the mucosal side, and may be underestimated with an exclusive analysis of the serosal side. Two further studies have revealed that fluorescence imaging can guide the surgeon in real time during minimally invasive adrenal surgery, and that quantitative software fluorescence analysis facilitates the distinction between vascularized and ischemic segments
L'Hour, Jérémy. "Policy evaluation, high-dimension and machine learning". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLG008/document.
Testo completo: This dissertation is comprised of three essays that apply machine learning and high-dimensional statistics to causal inference. The first essay proposes a parametric alternative to the synthetic control method (Abadie and Gardeazabal, 2003; Abadie et al., 2010) that relies on a Lasso-type first step. We show that the resulting estimator is doubly robust, asymptotically Gaussian and "immunized" against first-step selection mistakes. The second essay studies a penalized version of the synthetic control method, especially useful in the presence of micro-economic data. The penalization parameter trades off pairwise matching discrepancies with respect to the characteristics of each unit in the synthetic control against matching discrepancies with respect to the characteristics of the synthetic control unit as a whole. We study the properties of the resulting estimator, propose data-driven choices of the penalization parameter and discuss randomization-based inference procedures. The last essay applies the Generic Machine Learning framework (Chernozhukov et al., 2018) to study the heterogeneity of treatment effects in a randomized experiment designed to compare public and private provision of job counselling. From a methodological perspective, we discuss the extension of the Generic Machine Learning framework to experiments with imperfect compliance
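The trade-off described for the second essay can be written as a small optimization problem; the Python sketch below illustrates it with synthetic covariates and an arbitrary penalization parameter. It is a simplified illustration of the penalized synthetic control objective as described in the abstract, not the estimator's full implementation or its data-driven tuning.

```python
# Sketch of a penalized synthetic control: the weights minimize the discrepancy with
# respect to the synthetic unit as a whole plus lambda times the weighted sum of pairwise
# discrepancies, under non-negativity and sum-to-one constraints. Covariates are synthetic.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n_controls, n_covariates = 20, 5
X0 = rng.normal(size=(n_controls, n_covariates))   # control units' characteristics
X1 = rng.normal(size=n_covariates)                 # treated unit's characteristics
lam = 0.5                                          # penalization parameter (data-driven in the thesis)

pairwise = np.sum((X0 - X1) ** 2, axis=1)          # ||X1 - X0_j||^2 for each control j

def objective(w):
    aggregate = np.sum((X1 - X0.T @ w) ** 2)       # discrepancy w.r.t. the synthetic unit as a whole
    return aggregate + lam * np.dot(w, pairwise)   # plus penalized pairwise discrepancies

w0 = np.full(n_controls, 1.0 / n_controls)
res = minimize(objective, w0, method="SLSQP",
               bounds=[(0.0, 1.0)] * n_controls,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
weights = res.x
print("largest synthetic control weights:", np.round(np.sort(weights)[::-1][:5], 3))
```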
Millan, Mégane. "L'apprentissage profond pour l'évaluation et le retour d'information lors de l'apprentissage de gestes". Thesis, Sorbonne université, 2020. http://www.theses.fr/2020SORUS057.
Testo completo: Learning a new sport or manual work is complex. Indeed, many gestures have to be assimilated in order to reach a good level of skill. However, learning these gestures cannot be done alone: the execution of the gesture must be seen by an expert eye in order to point out corrections for improvement. Experts, whether in sports or in manual work, are not always available to analyze and evaluate a novice's gesture. In order to help experts in this analysis task, it is possible to develop virtual coaches. Depending on the field, the virtual coach will have more or fewer skills, but an evaluation according to precise criteria is always mandatory. Providing feedback on mistakes is also essential for a novice's learning. In this thesis, different solutions for developing the most effective virtual coaches are proposed. First of all, and as mentioned above, it is necessary to evaluate the gestures. From this point of view, a first part consisted in understanding the stakes of automatic gesture analysis, in order to develop an automatic evaluation algorithm that is as efficient as possible. Subsequently, two algorithms for automatic quality evaluation are proposed. These two algorithms, based on deep learning, were then tested on two different gesture databases in order to evaluate their genericity. Once the evaluation has been carried out, it is necessary to provide relevant feedback to the learner on their errors. In order to maintain continuity in the work carried out, this feedback is also based on neural networks and deep learning. A method has been developed based on neural network explainability methods. It makes it possible to go back to the moments of the gesture when errors were made, according to the evaluation model. Finally, coupled with semantic segmentation, this method makes it possible to indicate to learners which part of the gesture was badly performed, and to provide them with statistics and a learning curve
Graff, Kevin. "Contribution à la cartographie multirisques de territoires côtiers : approche quantitative des conséquences potentielles et des concomitances hydrologiques (Normandie, France) Analysis and quantification of potential consequences in multirisk coastal context at different spatial scales (Normandy, France) Characterization of elements at risk in the multirisk coastal context and at different spatial scales: Multi-database integration (normandy, France)". Thesis, Normandie, 2020. http://www.theses.fr/2020NORMC001.
Testo completo: The coastal environment in Normandy is conducive to a convergence of multiple hazards (erosion, marine submersion, flooding by overflowing streams or by a rising water table, turbid flooding by runoff, coastal or continental slope movements). Because of their interface position, strong regressive dynamics develop between marine and continental processes. These interactions occur within the slopes and valleys where coastal populations and their activities have tended to densify since the 19th century. In this context, it is necessary to adopt a multi-hazard and multi-risk approach considering the spatial or temporal confluence of several hazards and their possible cascading effects, and to assess the multi-sector impacts generated by these hazards. As part of this thesis, and of the ANR RICOCHET program, three study sites were selected at the outlets of coastal rivers: from Auberville to Pennedepie, from Quiberville to Dieppe, and from Criel-sur-Mer to Ault, because of the significant stakes and the strong interactions between hydrological and gravitational phenomena. Two main objectives have been pursued: (1) a methodological development on the analysis of potential consequences, considering all the elements at risk within a study territory through a multi-scale approach; (2) an analysis of hydrological concomitances through both a statistical and a spatial approach
Guimbaud, Jean-Baptiste. "Enhancing Environmental Risk Scores with Informed Machine Learning and Explainable AI". Electronic Thesis or Diss., Lyon 1, 2024. http://www.theses.fr/2024LYO10188.
Testo completo: From conception onward, environmental factors such as air quality or dietary habits can significantly impact the risk of developing various chronic diseases. Within the epidemiological literature, indicators known as Environmental Risk Scores (ERSs) are used not only to identify individuals at risk but also to study the relationships between environmental factors and health. A limitation of most ERSs is that they are expressed as linear combinations of a limited number of factors. This doctoral thesis aims to develop ERS indicators able to investigate nonlinear relationships and interactions across a broad range of exposures while discovering actionable factors to guide preventive measures and interventions, both in adults and in children. To achieve this aim, we leverage the predictive abilities of non-parametric machine learning methods, combined with recent Explainable AI tools and existing domain knowledge. In the first part of this thesis, we compute machine-learning-based environmental risk scores for children's mental, cardiometabolic, and respiratory general health. On top of identifying nonlinear relationships and exposure-exposure interactions, we identified new predictors of disease in childhood. The scores could explain a significant proportion of variance and their performance was stable across different cohorts. In the second part, we propose SEANN, a new approach integrating expert knowledge in the form of Pooled Effect Sizes (PESs) into the training of deep neural networks for the computation of informed environmental risk scores. SEANN aims to compute more robust ERSs, generalizable to a broader population, and able to capture exposure relationships that are closer to the evidence known from the literature. We experimentally illustrate the approach's benefits using synthetic data, showing improved prediction generalizability in noisy contexts (i.e., observational settings) and improved reliability of interpretation using Explainable Artificial Intelligence (XAI) methods, compared to an agnostic neural network. In the last part of this thesis, we propose a concrete application of SEANN using data from a cohort of Spanish adults. Compared to an agnostic neural-network-based ERS, the score obtained with SEANN effectively captures relationships more in line with the literature-based associations, without deteriorating the predictive performance. Moreover, the associations estimated for exposures with poor literature coverage differ significantly from those obtained with the agnostic baseline method, with more plausible directions of association. In conclusion, our risk scores demonstrate substantial potential for the data-driven discovery of unknown nonlinear environmental health relationships by leveraging existing knowledge about well-known relationships. Beyond their utility in epidemiological research, our risk indicators are able to capture holistic, individual-level, non-hereditary risk associations that can inform practitioners about actionable factors in high-risk individuals. As personalized prevention in the post-genetic era will focus more and more on modifiable factors, we believe that such approaches will be instrumental in shaping future healthcare paradigms
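One generic way to inject pooled effect sizes into neural-network training is sketched below in PyTorch. It is a minimal illustration under explicit assumptions, and not the exact SEANN formulation: here the network's input gradients for exposures with known PESs are simply penalized when they deviate from those literature values, on synthetic data with invented effect sizes.

```python
# Minimal sketch (not the exact SEANN formulation) of knowledge-informed training:
# prediction loss plus a penalty that pushes the network's average input gradients
# for exposures with known pooled effect sizes (PESs) toward those literature values.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 6
X = torch.randn(n, d)
true_effects = torch.tensor([0.8, -0.5, 0.3, 0.0, 0.0, 0.0])
y = X @ true_effects + 0.1 * torch.randn(n)

# Literature knowledge: PESs assumed known for the first two exposures only (toy values).
pes_index = torch.tensor([0, 1])
pes_value = torch.tensor([0.8, -0.5])

model = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
alpha = 1.0   # weight of the knowledge-based penalty (hyperparameter)

for epoch in range(200):
    opt.zero_grad()
    X_req = X.clone().requires_grad_(True)
    pred = model(X_req).squeeze(-1)
    mse = nn.functional.mse_loss(pred, y)
    # Input gradients = local exposure effects learned by the network.
    grads = torch.autograd.grad(pred.sum(), X_req, create_graph=True)[0]
    penalty = ((grads[:, pes_index].mean(dim=0) - pes_value) ** 2).mean()
    loss = mse + alpha * penalty
    loss.backward()
    opt.step()

print(f"final mse={mse.item():.3f}, knowledge penalty={penalty.item():.4f}")
```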
Cappelaere, Charles-Henri. "Estimation du risque de mort subite par arrêt cardiaque a l'aide de méthodes d'apprentissage artificiel". Electronic Thesis or Diss., Paris 6, 2014. http://www.theses.fr/2014PA066014.
Testo completo: Implantable cardioverter defibrillators (ICDs) have been prescribed for prophylaxis since the early 2000s for patients at high risk of sudden cardiac death (SCD). Unfortunately, most implantations to date appear unnecessary. This result raises an important issue because of the perioperative and postoperative risks. Thus, it is important to improve the selection of candidates for ICD implantation in primary prevention. Risk stratification for SCD based on Holter recordings has been extensively performed in the past, without resulting in a significant improvement in the selection of candidates for ICD implantation. The present report describes a nonlinear multivariate analysis of Holter recording indices. We computed all the descriptors available in the Holter recordings present in our database. The latter consisted of labelled Holter recordings of patients equipped with an ICD in primary prevention; a fraction of these patients received at least one appropriate therapy from their ICD during a 6-month follow-up. Based on physiological knowledge of arrhythmogenesis, feature selection was performed, and an innovative procedure of classifier design and evaluation was proposed. The classifier is intended to discriminate patients who are really at risk of sudden death from patients for whom ICD implantation does not seem necessary. In addition, we designed an ad hoc classifier that capitalizes on prior knowledge of arrhythmogenesis. We conclude that improving the selection of candidates for prophylactic ICD implantation by automatic classification from Holter recording features may be possible. Nevertheless, that statement should be supported by the study of a more extensive and appropriate database
Aziz, Usama. "Détection des défauts des éoliennes basée sur la courbe de puissance : Comparaison critique des performances et proposition d'une approche multi-turbines". Thesis, Université Grenoble Alpes, 2020. https://tel.archives-ouvertes.fr/tel-03066125.
Testo completo: Since wind turbines are electricity generators, the electrical power produced by a machine is a relevant variable for monitoring and detecting possible faults. In the framework of this thesis, an in-depth literature review was first performed on fault detection methods for wind turbines that use the electrical power produced. It showed that, although many methods have been proposed in the literature, it is very difficult to compare their performance in an objective way due to the lack of reference data that would allow all these methods to be implemented and evaluated on the same data. To address this problem, as a first step, a new realistic simulation approach has been proposed in this thesis. It makes it possible to create simulated data streams, coupling power output, wind speed and temperature, in normal conditions and in fault situations, for an unlimited duration. The faults that can be simulated are those that impact the shape of the power curve. The simulated data are generated from real data recorded on several French wind farms located on different geographical sites. In a second step, a method for evaluating the performance of fault detection methods that use the power produced has been proposed. This new simulation method was applied to 4 different fault situations affecting the power curve, using data from 5 geographically distant wind farms. A total of 1875 years of 10-minute SCADA data was generated and used to compare the detection performance of 3 fault detection methods proposed in the literature. This allowed a rigorous comparison of their performance. In the second part of this research, the proposed simulation method was extended to a multi-turbine configuration. Indeed, several multi-turbine strategies have been published in the literature, with the objective of reducing the impact of environmental conditions on the performance of fault detection methods that use temperature as a variable. In order to evaluate the performance gain that a multi-turbine strategy could bring, a hybrid mono/multi-turbine implementation of fault detection methods based on the power curve was first proposed. Then, the simulation framework proposed to evaluate mono-turbine methods was extended to multi-turbine approaches, and a numerical experimental analysis of the performance of this hybrid mono/multi-turbine implementation was performed
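The simulation-and-detection setting summarized in this abstract can be illustrated with a short Python sketch; the power curve shape, fault magnitude and detector below are illustrative assumptions (the thesis generates data from real SCADA records and compares published detection methods).

```python
# Illustrative sketch: generate coupled wind speed / power data from a simplified power
# curve, inject a fault that changes the curve's shape, and detect it by monitoring the
# residuals between observed power and the power predicted by a healthy reference curve.
import numpy as np

rng = np.random.default_rng(3)

def power_curve(wind):          # simplified logistic-shaped power curve (kW), illustrative only
    return 2000.0 / (1.0 + np.exp(-0.9 * (wind - 8.0)))

n = 5000                                     # 10-minute records
wind = np.clip(rng.weibull(2.0, n) * 8.0, 0, 25)
power = power_curve(wind) + rng.normal(0, 40, n)

# Fault: from record 3000 on, the turbine is derated and loses 15% of its output.
fault_start = 3000
power[fault_start:] *= 0.85

# Detection: residuals against the healthy reference curve, monitored with a one-sided CUSUM.
residuals = power_curve(wind) - power        # positive residuals = power deficit
k, h = 20.0, 2000.0                          # allowance and alarm threshold (tuning parameters)
cusum, alarm_at = 0.0, None
for t, r in enumerate(residuals):
    cusum = max(0.0, cusum + r - k)
    if cusum > h:
        alarm_at = t
        break

print(f"fault injected at record {fault_start}, alarm raised at record {alarm_at}")
```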
Benguigui, Michaël. "Valorisation d’options américaines et Value At Risk de portefeuille sur cluster de GPUs/CPUs hétérogène". Thesis, Nice, 2015. http://www.theses.fr/2015NICE4053/document.
Testo completo: The research work described in this thesis aims at speeding up the pricing of complex financial instruments, such as an American option on a realistic-size basket of assets (e.g. 40), by leveraging the parallel processing power of Graphics Processing Units. To this aim, we start from a previous research work that distributed the pricing algorithm based on Monte Carlo simulation and machine learning proposed by J. Picazo. We propose an adaptation of this distributed algorithm to take advantage of a single GPU. This allows us to obtain, with one single GPU, performances comparable to those measured using a 64-core cluster for pricing a 40-asset basket American option. Still, on this realistic-size option, the pricing requires a handful of hours. We then extend this first contribution in order to tackle a cluster of heterogeneous devices, both GPUs and CPUs programmed in OpenCL, at once. Doing this, we are able to drastically accelerate the option pricing time, even if the various classification methods we experiment with (AdaBoost, SVM) constitute a performance bottleneck. So, we consider instead an alternate, distributable approach based upon Random Forests, which allows our approach to become more scalable. The last part reuses these two contributions to tackle the Value at Risk evaluation of a complete portfolio of financial instruments on a heterogeneous cluster of GPUs and CPUs
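To give a flavour of the classification-based Monte Carlo pricing idea mentioned in this abstract, here is a toy, single-asset, single-threaded Python sketch. It is only loosely inspired by Picazo's algorithm (which labels exercise decisions with nested simulations) and is far from the thesis's distributed GPU/OpenCL implementation; all parameters and the labelling shortcut used here are illustrative assumptions.

```python
# Toy Bermudan put priced with a classification-based exercise policy: at each date,
# a classifier learns whether to exercise, using pathwise future cash flows under the
# already-learned policy as labels. Simplified sketch, not the thesis implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

S0, K, r, sigma, T, n_dates = 100.0, 100.0, 0.03, 0.2, 1.0, 10
dt = T / n_dates
rng = np.random.default_rng(0)

def simulate_paths(n_paths):
    z = rng.standard_normal((n_paths, n_dates))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return S0 * np.exp(np.cumsum(log_inc, axis=1))     # shape (n_paths, n_dates)

payoff = lambda s: np.maximum(K - s, 0.0)

# 1) Learn an exercise classifier per date, backwards in time.
train = simulate_paths(20_000)
cashflow = payoff(train[:, -1])                 # exercise at maturity by default
time_to_cash = np.full(train.shape[0], n_dates - 1)
classifiers = {}
for t in range(n_dates - 2, -1, -1):
    immediate = payoff(train[:, t])
    itm = immediate > 0
    if itm.sum() < 2:
        continue
    continuation = cashflow * np.exp(-r * dt * (time_to_cash - t))
    labels = (immediate >= continuation).astype(int)    # 1 = exercise now
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train[itm, t].reshape(-1, 1), labels[itm])
    classifiers[t] = clf
    exercise_now = itm & (clf.predict(train[:, t].reshape(-1, 1)) == 1)
    cashflow[exercise_now] = immediate[exercise_now]
    time_to_cash[exercise_now] = t

# 2) Price on fresh paths by applying the learned exercise policy.
test = simulate_paths(50_000)
cash = payoff(test[:, -1])
ttc = np.full(test.shape[0], n_dates - 1)
exercised = np.zeros(test.shape[0], dtype=bool)
for t in range(n_dates - 1):
    if t not in classifiers:
        continue
    itm = (payoff(test[:, t]) > 0) & ~exercised
    if not itm.any():
        continue
    ex = classifiers[t].predict(test[itm, t].reshape(-1, 1)) == 1
    idx = np.where(itm)[0][ex]
    cash[idx] = payoff(test[idx, t])
    ttc[idx] = t
    exercised[idx] = True

price = np.mean(cash * np.exp(-r * dt * (ttc + 1)))
print(f"estimated Bermudan put price: {price:.3f}")
```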
Iriart, Alejandro. "Mesures d’insertion sociale destinées aux détenus québécois et récidive criminelle : une approche par l'apprentissage automatique". Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/66717.
Testo completoIn this master thesis, we tried to determine the real influence of social rehabilitation programs on the risk of recidivism. To do this, we used a machine learning algorithm to analyze a database provided by the Quebec Ministry of Public Security (MSP). In this database, we are able to follow the numerous incarcerations of 97,140 prisoners from 2006 to 2018. Our analysis focuses only on inmates who have served in the prison in Quebec City. The approach we used is named Generalized Random Forests (GRF) and was developed by Athey et al. (2019). Our main analysis focuses not only on the characteristics of the prisoners, but also on the results they obtained when they were subjected to the LS/CMI, an extensive questionnaire aimed at determining the criminogenic needs and the risk level of the inmates . We also determined which variables have the most influence on predicting the treatment effect by using a function of the same algorithm that calculates the relative importance of each of the variables to make a prediction. By comparing participants and non-participants, we were able to demonstrate that participating in a program reduces the risk of recidivism by approximately 6.9% for a two-year trial period. Participating in a program always reduces significantly recidivism no matter the definition of recidivism used. We also determined that in terms of personal characteristics, it is the age, the nature of the offence and the number of years of study that are the main predictors for the individual causal effects. As for the LS/CMI, only a few sections of the questionnaire have real predictive power while others, like the one about leisure, do not. In light of our results, we believe that a more efficient instrument capable of predicting recidivism can be created by focusing on the newly identified variables with the greatest predictive power. A better instrument will make it possible to provide better counselling to prisoners on the programs they should follow, and thus increase their chances of being fully rehabilitated.
Potet, Marion. "Vers l'intégration de post-éditions d'utilisateurs pour améliorer les systèmes de traduction automatiques probabilistes". Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00995104.
Testo completoEnríquez, Luis. "Personal data breaches : towards a deep integration between information security risks and GDPR compliance risks". Electronic Thesis or Diss., Université de Lille (2022-....), 2024. http://www.theses.fr/2024ULILD016.
Testo completo: Information security is deeply linked to data protection law, because an ineffective security implementation can lead to personal data breaches. The GDPR is based on a risk-based approach to the protection of the rights and freedoms of data subjects, meaning that risk management is the mechanism for protecting fundamental rights. However, the state of the art of information security risk management and of legal risk management is still immature. Unfortunately, the current state of the art does not assess the multi-dimensionality of data protection risks, and it has skipped the main purpose of a risk-based approach: measuring risk in order to take informed decisions. The legal world must understand that risk management does not work by default, and that it often requires applied scientific methods for assessing risks. This thesis proposes a change of mindset with the aim of fixing data protection risk management, through a holistic data protection approach that merges operational, financial, and legal risks. The concept of a Personal Data Value at Risk is introduced as the outcome of several quantitative strategies based on risk modeling, jurimetrics, and data protection analytics. The ideas presented here should also contribute to compliance with upcoming risk-based regulations that rely on data protection, such as those governing artificial intelligence. The risk transformation may appear difficult, but it is compulsory for the evolution of data protection
Telmoudi, Fedya. "Estimation and misspecification Risks in VaR estimation". Thesis, Lille 3, 2014. http://www.theses.fr/2014LIL30061/document.
Testo completo: In this thesis, we study the problem of conditional Value at Risk (VaR) estimation taking into account estimation risk and model risk. First, we consider a two-step method for VaR estimation. The first step estimates the volatility parameter using a generalized quasi-maximum likelihood estimator (gQMLE) based on an instrumental density h. The second step estimates a quantile of the innovations from the empirical quantile of the residuals obtained in the first step. We give conditions under which the two-step estimator of the VaR is consistent and asymptotically normal. We also compare the efficiencies of the estimators for various instrumental densities h. When the distribution of the innovations is not the density h, the first step usually gives a biased estimator of the volatility parameter and the second step gives a biased estimator of the quantile of the innovations. However, we show that both errors counterbalance each other to give a consistent estimate of the VaR. We then focus on VaR estimation within the framework of GARCH models using the gQMLE based on a class of instrumental densities called double generalized gamma, which contains the Gaussian distribution. Our goal is to compare the performance of the Gaussian QMLE against the gQMLE. The choice of the optimal estimator depends on the value of d that minimizes the asymptotic variance. We test whether this parameter is equal to 2. When the test is applied to real series of financial returns, the hypothesis stating the optimality of the Gaussian QMLE is generally rejected. Finally, we consider non-parametric machine learning models for VaR estimation. These methods are designed to eliminate model risk because they are not based on a specific form of the volatility. We use the support vector machine model for regression (SVR) based on the least squares loss function (LS). In order to improve the solution of the LS-SVR model, we use the weighted LS-SVR and the fixed-size LS-SVR models. Numerical illustrations highlight the contribution of the proposed models to VaR estimation taking into account specification and estimation risks
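The two-step scheme described in this abstract can be illustrated for the Gaussian special case of the gQMLE; the Python sketch below assumes the `arch` package and a toy return series, so it is only a simplified stand-in for the general instrumental-density estimator studied in the thesis.

```python
# Sketch of the two-step VaR estimation under a Gaussian QMLE (a special case of the
# gQMLE discussed above): step 1 fits a GARCH(1,1) volatility model; step 2 takes the
# empirical quantile of the standardized residuals and scales it by the volatility forecast.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = 100 * rng.standard_t(df=6, size=2000) * 0.01     # toy return series, in percent

# Step 1: volatility parameters via Gaussian quasi maximum likelihood.
res = arch_model(returns, vol="Garch", p=1, q=1, mean="Zero", dist="normal").fit(disp="off")
sigma_t = res.conditional_volatility
innovations = returns / sigma_t                             # standardized residuals

# Step 2: empirical quantile of the residuals, scaled by the one-step-ahead volatility forecast.
alpha = 0.01
q_alpha = np.quantile(innovations, alpha)
sigma_next = np.sqrt(res.forecast(horizon=1).variance.values[-1, 0])
VaR_next = -sigma_next * q_alpha                            # 1% one-step-ahead VaR (reported as a positive number)
print(f"estimated 1% VaR for the next period: {VaR_next:.3f}%")
```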
Cao, Qiushi. "Semantic technologies for the modeling of predictive maintenance for a SME network in the framework of industry 4.0 Smart condition monitoring for industry 4.0 manufacturing processes: an ontology-based approach Using rule quality measures for rule base refinement in knowledge-based predictive maintenance systems Combining chronicle mining and semantics for predictive maintenance in manufacturing processes". Thesis, Normandie, 2020. http://www.theses.fr/2020NORMIR04.
Testo completo: In the manufacturing domain, the detection of anomalies such as mechanical faults and failures enables the launching of predictive maintenance tasks, which aim to predict future faults, errors, and failures and also enable maintenance actions. With the trend of Industry 4.0, predictive maintenance tasks benefit from advanced technologies such as Cyber-Physical Systems (CPS), the Internet of Things (IoT), and Cloud Computing. These advanced technologies enable the collection and processing of sensor data that contain measurements of physical signals of machinery, such as temperature, voltage, and vibration. However, due to the heterogeneous nature of industrial data, the knowledge extracted from such data is sometimes presented in a complex structure. Therefore, formal knowledge representation methods are required to facilitate the understanding and exploitation of this knowledge. Furthermore, as CPSs are becoming more and more knowledge-intensive, a uniform knowledge representation of physical resources and reasoning capabilities for analytic tasks are needed to automate the decision-making processes in CPSs. These issues make it difficult for machine operators to perform appropriate maintenance actions. To address the aforementioned challenges, in this thesis we propose a novel semantic approach to facilitate predictive maintenance tasks in manufacturing processes. In particular, we propose four main contributions: i) a three-layered ontological framework that is the core component of a knowledge-based predictive maintenance system; ii) a novel hybrid semantic approach to automate machinery failure prediction tasks, based on the combined use of chronicles (a more descriptive type of sequential pattern) and semantic technologies; iii) a new approach that uses clustering methods with Semantic Web Rule Language (SWRL) rules to assess failures according to their criticality levels; iv) a novel rule base refinement approach that uses rule quality measures as references to refine a rule base within a knowledge-based predictive maintenance system. These approaches have been validated on both real-world and synthetic data sets