Dissertations / Theses on the topic 'Apprentissage automatique – Prévision – Utilisation'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Apprentissage automatique – Prévision – Utilisation.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Loisel, Julie. "Détection des ruptures de la chaîne du froid par une approche d'apprentissage automatique." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASB014.
The cold chain is essential to ensure food safety and avoid food waste. Wireless sensors are increasingly used to monitor the air temperature through the cold chain; however, the exploitation of these measurements is still limited. This thesis explores how machine learning can be used to predict the temperature of different food product types from the air temperature measured in a pallet and to detect cold chain breaks. We first introduced a definition of a cold chain break based on two main product categories: products that must be kept at a regulated temperature, such as meat and fish, and products for which a temperature is only recommended, such as fruits and vegetables. A cold chain break leads to food poisoning for the first product category and to organoleptic quality degradation for the second. For temperature-regulated products, it is crucial to predict the product temperature to ensure that it does not exceed the regulatory temperature. Although several studies demonstrated the effectiveness of neural networks for this prediction, none compared synthetic and experimental data for training them. In this thesis, we compared these two types of data in order to provide guidelines for the development of neural networks. In practice, products and packaging are diverse, and experiments for each application are impossible due to the complexity of implementation. By comparing synthetic and experimental data, we were able to determine best practices for developing neural networks that predict product temperature and help maintain the cold chain. Once a cold chain break is detected, temperature-regulated products are no longer consumable and must be discarded. For temperature-recommended products, we compared three approaches to detect cold chain breaks and implement corrective actions: (a) a method based on a temperature threshold, (b) a method based on a classifier that determines whether the products will be delivered with the expected qualities, and (c) a method also based on a classifier but which integrates the cost of the corrective measure into the decision-making process. The performances of the three methods are discussed and prospects for improvement are proposed.
De, Carvalho Gomes Fernando. "Utilisation d'algorithmes stochastiques en apprentissage." Montpellier 2, 1992. http://www.theses.fr/1992MON20254.
Toqué, Florian. "Prévision et visualisation de l'affluence dans les transports en commun à l'aide de méthodes d'apprentissage automatique." Thesis, Paris Est, 2019. http://www.theses.fr/2019PESC2029.
As part of the fight against global warming, several countries around the world, including Canada and several European countries such as France, have established measures to reduce greenhouse gas emissions. One of the major areas addressed by these states concerns the transport sector, and more particularly the development of public transport to reduce the use of private cars. To this end, the local authorities concerned aim to establish more accessible, clean and sustainable urban transport systems. In this context, this thesis, co-directed by the University of Paris-Est, the French Institute of Science and Technology for Transport, Development and Networks (IFSTTAR) and Polytechnique Montréal in Canada, focuses on the analysis of urban mobility through research conducted on the forecasting and visualization of public transport ridership using machine learning methods. The main motivations concern the improvement of transport services offered to passengers, such as better planning of transport supply and improvement of passenger information (e.g., proposed itinerary in the case of an event/incident, information about the crowd in the train at a chosen time, etc.). In order to improve transport operators' knowledge of user travel in urban areas, we take advantage of the development of data science (e.g., data collection, development of machine learning methods). This thesis thus focuses on three main parts: (i) long-term forecasting of passenger demand using event databases, (ii) short-term forecasting of passenger demand, and (iii) visualization of passenger demand on public transport. The research is mainly based on the use of ticketing data provided by transport operators and was carried out on three real case studies: the metro and bus network of the city of Rennes; the rail and tramway network of the "La Défense" business district in Paris, France; and the metro network of Montreal, Quebec, Canada.
Kashnikov, Yuriy. "Une approche holistique pour la prédiction des optimisations du compilateur par apprentissage automatique." Versailles-St Quentin en Yvelines, 2013. http://www.theses.fr/2013VERS0047.
Effective compiler optimizations can greatly improve application performance. These optimizations are numerous and can be applied in any order. Compilers select these optimizations using heuristic-driven solutions which may degrade program performance. Therefore, developers resort to a tedious manual search for the best optimizations. The combinatorial search space makes this effort intractable, and one can easily fall into a local minimum and miss the best combination. This thesis develops a holistic approach to improve application performance with compiler optimizations and machine learning. A combination of static loop analysis and statistical learning is used to analyze a large corpus of loops and reveal good potential for compiler optimizations. Milepost GCC, a machine-learning based compiler, is applied to optimize benchmarks and an industrial database application. It uses function-level static features and classification algorithms to predict a good sequence of optimizations. While Milepost GCC can mispredict the best optimizations, in general it obtains considerable speedups and outperforms state-of-the-art compiler heuristics. The culmination of this thesis is the ULM meta-optimization framework. ULM characterizes applications at different levels with static code features and hardware performance counters and finds the most important combination of program features. By selecting among three classification algorithms and tuning their parameters, ULM builds a sophisticated predictor that can outperform existing solutions. As a result, the ULM framework correctly predicted the best sequence of optimizations in 92% of cases.
Dupont, Pierre. "Utilisation et apprentissage de modèles de langage pour la reconnaissance de la parole continue /." Paris : École nationale supérieure des télécommunications, 1996. http://catalogue.bnf.fr/ark:/12148/cb35827695q.
Melzi, Fateh. "Fouille de données pour l'extraction de profils d'usage et la prévision dans le domaine de l'énergie." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1123/document.
Nowadays, countries are called upon to take measures aimed at a better rationalization of electricity resources with a view to sustainable development. Smart metering solutions have been implemented and now allow a fine-grained reading of consumption. The massive spatio-temporal data collected can thus help to better understand consumption behaviors, forecast them and manage them precisely. The aim is to ensure an "intelligent" use of resources to consume less and consume better, for example by reducing consumption peaks or by using renewable energy sources. The thesis work takes place in this context and aims to develop data mining tools in order to better understand electricity consumption behaviors and to predict solar energy production, thus enabling intelligent energy management. The first part of the thesis focuses on the classification of typical electricity consumption behaviors at the scale of a building and then of a territory. In the first case, typical daily power consumption profiles were identified based on the functional K-means algorithm and a Gaussian mixture model. On a territorial scale and in an unsupervised context, the aim is to identify typical electricity consumption profiles of residential users and to link these profiles to contextual variables and metadata collected on users. An extension of the classical Gaussian mixture model has been proposed. It allows exogenous variables such as the type of day (Saturday, Sunday, working day, ...) to be taken into account in the classification, thus leading to a parsimonious model. The proposed model was compared with classical models and applied to an Irish database including both electricity consumption data and user surveys. An analysis of the results over a monthly period made it possible to extract a reduced set of user groups that are homogeneous in terms of their electricity consumption behaviors. We have also endeavoured to quantify the regularity of users in terms of consumption as well as the temporal evolution of their consumption behaviors during the year. These two aspects are indeed necessary to evaluate the potential for changes in consumption behavior required by a demand response policy (a shift of the consumption peak, for example) set up by electricity suppliers. The second part of the thesis concerns the forecasting of solar irradiance over two time horizons: short and medium term. To do this, several approaches have been developed, including autoregressive statistical approaches for modelling time series and machine learning approaches based on neural networks, random forests and support vector machines. In order to take advantage of the different models, a hybrid model combining them was proposed. An exhaustive evaluation of the different approaches was conducted on a large database including four locations (Carpentras, Brasilia, Pamplona and Reunion Island), each characterized by a specific climate, as well as weather parameters, both measured and predicted using numerical weather prediction (NWP) models. The results obtained showed that the hybrid model improves photovoltaic production forecasts for all locations.
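As a concrete illustration of the clustering step described in this abstract, the sketch below groups daily load profiles with a plain Gaussian mixture. It does not reproduce the thesis's extended model with exogenous variables; the array shapes, number of clusters and random placeholder data are assumptions made only for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical input: one row per user-day, 48 half-hourly consumption readings
# (random placeholder standing in for real smart-meter data).
daily_profiles = np.random.rand(1000, 48)

# Plain Gaussian mixture; the thesis extends this with exogenous variables
# such as the type of day, which is not reproduced here.
gmm = GaussianMixture(n_components=6, covariance_type="diag", random_state=0)
labels = gmm.fit_predict(daily_profiles)

# The component means can be read as "typical consumption profiles".
typical_profiles = gmm.means_
print(typical_profiles.shape)  # (6, 48)
```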
Thorey, Jean. "Prévision d’ensemble par agrégation séquentielle appliquée à la prévision de production d’énergie photovoltaïque." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066526/document.
Our main objective is to improve the quality of photovoltaic power forecasts derived from weather forecasts. Such forecasts are imperfect due to meteorological uncertainties and statistical modeling inaccuracies in the conversion of weather forecasts to power forecasts. First we gather several weather forecasts, secondly we generate multiple photovoltaic power forecasts, and finally we build linear combinations of the power forecasts. The minimization of the Continuous Ranked Probability Score (CRPS) allows us to statistically calibrate the combination of these forecasts and provides probabilistic forecasts in the form of a weighted empirical distribution function. We investigate the CRPS bias in this context and several properties of scoring rules which can be seen as a sum of quantile-weighted losses or a sum of threshold-weighted losses. The minimization procedure is achieved with online learning techniques. Such techniques come with theoretical guarantees of robustness on the predictive power of the combination of the forecasts, and essentially no assumptions are needed for these guarantees to hold. The proposed methods are applied to the forecast of solar radiation using satellite data and to the forecast of photovoltaic power based on high-resolution weather forecasts and standard ensembles of forecasts.
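For readers unfamiliar with the score mentioned above, the CRPS of a weighted empirical (ensemble) forecast against a scalar observation admits a closed-form double sum. The function below is a generic textbook implementation of that formula, not the calibrated aggregation procedure developed in the thesis; the example values are arbitrary.

```python
import numpy as np

def crps_weighted_ensemble(members, weights, obs):
    """CRPS of a weighted empirical forecast distribution for one observation.

    members : ensemble forecast values
    weights : nonnegative weights summing to 1
    obs     : observed value
    """
    members = np.asarray(members, dtype=float)
    weights = np.asarray(weights, dtype=float)
    term1 = np.sum(weights * np.abs(members - obs))
    term2 = 0.5 * np.sum(weights[:, None] * weights[None, :]
                         * np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# Example with an equally weighted three-member ensemble.
print(crps_weighted_ensemble([2.0, 3.0, 5.0], [1/3, 1/3, 1/3], 4.0))
```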
Nachouki, Mirna. "L'acquisition de connaissances dans les systèmes dynamiques : production et utilisation dans le cadre de l'atelier de génie didacticiel intégré." Toulouse 3, 1995. http://www.theses.fr/1995TOU30001.
Baudin, Paul. "Prévision séquentielle par agrégation d'ensemble : application à des prévisions météorologiques assorties d'incertitudes." Thesis, Université Paris-Saclay (ComUE), 2015. http://www.theses.fr/2015SACLS117/document.
In this thesis, we study sequential prediction problems. The goal is to devise and apply automatic strategies, learning from the past, with potential help from basis predictors. We want these strategies to have strong mathematical guarantees and to be valid in the most general cases. This enables us to apply the algorithms derived from the strategies to meteorological data prediction. Finally, we are interested in theoretical and practical versions of this sequential prediction framework applied to cumulative distribution function prediction. Firstly, we study online prediction of bounded stationary ergodic processes. To do so, we consider the setting of prediction of individual sequences and propose a deterministic regression tree that performs asymptotically as well as the best L-Lipschitz predictor. Then, we show why the obtained regret bound entails asymptotic optimality with respect to the class of bounded stationary ergodic processes. Secondly, we propose a specific sequential aggregation method for meteorological simulations of mean sea level pressure. The aim is to obtain, with a ridge regression algorithm, better prediction performance than a reference prediction belonging to the class of constant linear combinations of the basis predictors. We begin by recalling the mathematical framework and basic notions of environmental science. Then, the datasets used and the practical performance of the strategies are studied, as well as the sensitivity of the algorithm to parameter tuning. We then transpose the former method to another meteorological variable: the wind speed 10 meters above ground. This study shows that the wind speed exhibits different behaviors at a macro level. In the last chapter, we present the tools used in a probabilistic prediction framework and underline their merits. First, we explain the relevance of probabilistic prediction and expose this domain's state of the art. We carry on with a historical approach to popular probabilistic scores. The algorithms used are then thoroughly described before the description of their empirical results on mean sea level pressure and wind speed.
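The ridge-regression aggregation rule mentioned above can be sketched generically: at each round, the basis forecasts are combined with weights fitted by regularized least squares on all past rounds. This is a bare-bones online version under assumed array shapes and an arbitrary regularization parameter, not the exact algorithm or tuning studied in the thesis.

```python
import numpy as np

def ridge_aggregation(expert_forecasts, observations, lam=1.0):
    """Sequentially combine basis forecasts with ridge-regression weights.

    expert_forecasts : (T, K) array, forecast of each of K experts at each round
    observations     : (T,) array of observed values
    Returns the (T,) array of aggregated forecasts.
    """
    T, K = expert_forecasts.shape
    A = lam * np.eye(K)   # regularized running Gram matrix
    b = np.zeros(K)       # running cross term
    predictions = np.zeros(T)
    for t in range(T):
        x = expert_forecasts[t]
        w = np.linalg.solve(A, b)   # ridge weights fitted on rounds before t
        predictions[t] = w @ x
        A += np.outer(x, x)         # update with the new round
        b += observations[t] * x
    return predictions
```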
Desrousseaux, Christophe. "Utilisation d'un critère entropique dans les systèmes de détection." Lille 1, 1998. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/1998/50376-1998-229.pdf.
Monsifrot, Antoine. "Utilisation du raisonnement à partir de cas et de l'apprentissage pour l'optimisation de code." Rennes 1, 2002. http://www.theses.fr/2002REN10107.
Kritter, Thibaut. "Utilisation de données cliniques pour la construction de modèles en oncologie." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0166/document.
This thesis deals with the use of clinical data in the construction of models applied to oncology. Existing models which take into account many biological mechanisms of tumor growth have too many parameters and cannot be calibrated on clinical cases. Conversely, models that are too simple are not able to precisely predict tumor evolution for each patient. The diversity of data acquired by clinicians is a source of information that can make model estimations more precise. Through two different projects, we integrated data into the modeling process in order to extract more information from it. In the first part, clinical imaging and biopsy data are combined with machine learning methods. Our aim is to distinguish fast-recurring patients from slow ones. Results show that the obtained stratification is more efficient than the stratification used by clinicians. It could help physicians adapt treatment in a patient-specific way. In the second part, data is used to correct a simple tumor growth model. Even though this model is efficient at predicting the volume of a tumor, its simplicity prevents it from accounting for shape evolution. Yet an estimation of the tumor shape enables clinicians to better plan surgery. Data assimilation methods aim at adapting the model and rebuilding the tumor environment which is responsible for these shape changes. The prediction of the growth of brain metastases is then more accurate.
Dione, Mamadou. "Prévision court terme de la production éolienne par Machine learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAG004.
The energy transition law passed by the French government has specific implications for renewable energies, in particular for their remuneration mechanism. Until 2015, a purchase obligation contract made it possible to sell electricity from wind power at a fixed rate. From 2015 onwards, some wind farms began to be exempted from the purchase obligation, and wind energy started to be sold directly on the market by the producers as purchase obligation contracts came to an end. Distribution system operators and transmission system operators require, or even oblige, producers to provide at least a day-ahead production forecast in order to rebalance the market. Over- or underestimation may be subject to penalties. There is, therefore, a strong need for accurate forecasts. It is in this context that this thesis was launched, with the aim of proposing a model for predicting wind farm production by machine learning. We have production data and real wind measurements as well as data from meteorological models. We first compared the performance of the GFS and ECMWF models and studied the relationships between these two models through canonical correlation analysis. We then applied machine learning models to validate a first random forest prediction model. We then modeled the spatio-temporal wind dynamics and integrated it into the prediction model, which reduced the prediction error by 3%. We also studied the selection of grid points using a variable-group importance measure based on random forests. Random forest prediction intervals associated with point forecasts of wind farm production are also studied. The forecasting model resulting from this work was developed to enable the ENGIE Group to have its own daily forecasts for all its wind farms.
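As a minimal sketch of the random forest baseline described above, the snippet below fits a forest on a hypothetical feature matrix (NWP variables at several grid points plus lagged production) and reads the per-feature importances. The data, hyperparameters and feature layout are assumptions, and the grouped importance measure used in the thesis for grid-point selection is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training set: one row per hour, columns standing in for NWP
# wind speed/direction at several grid points and lagged production values.
rng = np.random.default_rng(0)
X = rng.random((5000, 20))   # placeholder features
y = rng.random(5000)         # placeholder wind farm production

model = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                              n_jobs=-1, random_state=0)
model.fit(X, y)

# Per-feature importances; the thesis works with a grouped importance
# (all variables of one grid point together), not shown here.
print(model.feature_importances_)
```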
Abdellaoui, Redhouane. "Utilisation de données du Web communautaire à des fins de surveillance de l’usage de médicaments." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS548.
Pharmacovigilance suffers from chronic underreporting of drugs' adverse effects by health professionals. The FDA (US Food and Drug Administration), the EMA (European Medicines Agency) and other health agencies suggest that social media could constitute an additional data source for the detection of weak pharmacovigilance signals. The WHO (World Health Organization) published a report in 2003 outlining the problem of non-compliance with treatment over the long term and its detrimental impact on health systems worldwide. The data necessary for the development of an information extraction system from patients' forums were made available by the company Kappa Santé. The first proposed approach fits into a context of pharmacovigilance case detection from patients' online discussions on health forums. We propose a filter based on the number of words separating the name of the mentioned drug in the message from the term considered as a potential adverse effect. We propose a second approach based on topic models to target groups of messages addressing topics dealing with non-compliance. In terms of pharmacovigilance, the proposed Gaussian filter identifies 50.03% of false positives with a precision of 95.8% and a recall of 50%. The non-compliance case detection approach identifies messages describing this kind of behavior with a precision of 32.6% and a recall of 98.5%.
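The word-distance filter described above can be illustrated with a toy scoring function: the further apart the drug name and the candidate adverse-event term appear in a message, the lower the score under a Gaussian decay. This is only an illustration of the idea; the function name, the width parameter and the example message are invented here and do not reproduce the thesis's actual filter or thresholds.

```python
import math

def drug_event_score(tokens, drug_idx, event_idx, sigma=5.0):
    """Score a (drug, candidate adverse event) pair in a forum message by the
    number of words separating them, with a Gaussian decay (toy illustration)."""
    distance = abs(drug_idx - event_idx)
    return math.exp(-distance ** 2 / (2 * sigma ** 2))

msg = "after two weeks on druga i started having severe headaches every day".split()
print(drug_event_score(msg, msg.index("druga"), msg.index("headaches")))
```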
Grenet, Ingrid. "De l’utilisation des données publiques pour la prédiction de la toxicité des produits chimiques." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4050.
Currently, chemical safety assessment mostly relies on results obtained from in vivo studies performed in laboratory animals. However, these studies are costly in terms of time, money and animals used, and are therefore not adapted to the evaluation of thousands of compounds. In order to rapidly screen compounds for their potential toxicity and prioritize them for further testing, alternative solutions are envisioned, such as in vitro assays and computational predictive models. The objective of this thesis is to evaluate how the public data from ToxCast and ToxRefDB can allow the construction of this type of model in order to predict in vivo effects induced by compounds, based only on their chemical structure. To do so, after data pre-processing, we first focus on the prediction of in vitro bioactivity from chemical structure and then on the prediction of in vivo effects from in vitro bioactivity data. For in vitro bioactivity prediction, we build and test various models based on compounds' chemical structure descriptors. Since the learning data are highly imbalanced in favor of non-toxic compounds, we test a data augmentation technique and show that it improves model performance. We also perform a large-scale study to predict hundreds of in vitro assays from ToxCast and show that the stacked generalization ensemble method leads to reliable models when used within their applicability domain. For the prediction of in vivo effects, we evaluate the link between results from in vitro assays targeting pathways known to induce endocrine effects and in vivo effects observed in endocrine organs during long-term studies. We highlight that, unexpectedly, these assays are not predictive of the in vivo effects, which raises the crucial question of the relevance of in vitro assays. We thus hypothesize that the selection of assays able to predict in vivo effects should be based on complementary information such as, in particular, mechanistic data.
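For illustration, a stacked generalization ensemble of the kind mentioned above can be assembled in a few lines with scikit-learn. The base learners, meta-learner and cross-validation setting below are arbitrary placeholders rather than the models actually benchmarked in the thesis, and the commented fit line refers to hypothetical descriptor and bioactivity arrays.

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Base learners whose out-of-fold predictions feed a meta-learner
# (stacked generalization); the actual learners used in the thesis may differ.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
# stack.fit(X_descriptors, y_bioactivity)  # hypothetical arrays: chemical
#                                          # descriptors -> in vitro outcome
```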
Larlus, Diane. "Création et utilisation de vocabulaires visuels pour la catégorisation d'images et la segmentation de classes d'objets." Phd thesis, Grenoble INPG, 2008. http://tel.archives-ouvertes.fr/tel-00343665.
We first focus on the study of different methods for creating the visual vocabulary and on the evaluation of these vocabularies in the context of image categorization.
Secondly, we study object class segmentation and, in particular, show how to combine the very local regularization properties provided by a Markov random field with an appearance model based on regions, each of which represents an object and is considered as a collection of visual words.
Hamadi, Abdelkader. "Utilisation du contexte pour l'indexation sémantique des images et vidéos." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM047/document.
The automated indexing of images and videos is a difficult problem because of the "distance" between the arrays of numbers encoding these documents and the concepts (e.g. people, places, events or objects) with which we wish to annotate them. Methods exist for this but their results are far from satisfactory in terms of generality and accuracy. Existing methods typically use a single set of training examples and consider it as uniform. This is not optimal because the same concept may appear in various contexts and its appearance may be very different depending upon these contexts. In this thesis, we considered the use of context for indexing multimedia documents. Context has been widely used in the state of the art to treat various problems. In our work, we use relationships between concepts as a source of semantic context. For the case of videos, we exploit the temporal context that models relationships between the shots of the same video. We propose several approaches using both types of context and their combination, at different levels of an indexing system. We also address the problem of multiple concept detection, which we assume to be related to the problem of context use. We consider that simultaneously detecting a set of concepts is equivalent to detecting one or more concepts forming the group in a context where the others are present. To do that, we studied and compared two types of approaches. All our proposals are generic and can be applied to any system for the detection of any concept. We evaluated our contributions on the TRECVID and VOC collections, which are international standards recognized by the community. We achieved good results, comparable to those of the best indexing systems evaluated in recent years in the aforementioned evaluation campaigns.
Prudhomme, Elie. "Représentation et fouille de données volumineuses." Thesis, Lyon 2, 2009. http://www.theses.fr/2009LYO20048/document.
Allain, Guillaume. "Prévision et analyse du trafic routier par des méthodes statistiques." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/351/.
The industrial partner of this work is Mediamobile/V-trafic, a company which processes and broadcasts live road-traffic information. The goal of our work is to enhance traffic information with forecasting and spatial extension. Our approach is sometimes inspired by physical modelling of traffic dynamics, but it mainly uses statistical methods in order to propose self-organising and modular models suitable for industrial constraints. In the first part of this work, we describe a method to forecast traffic speed within a time frame of a few minutes up to several hours. Our method is based on the assumption that traffic on a road network can be summarized by a few typical profiles. Those profiles are linked to the users' periodic behaviors. We therefore assume that the observed speed curves at each point of the network stem from a probabilistic mixture model. The following parts of our work present how this general method can be refined. Medium-term forecasting uses variables built from the calendar, and the mixture model still stands; additionally, we use a functional regression model to forecast speed curves. We then introduce a local regression model in order to simulate short-term traffic dynamics. The kernel function is built from real speed observations and we integrate some knowledge about traffic dynamics. The last part of our work focuses on the analysis of speed data from vehicles in traffic. These observations are gathered sporadically in time and along road segments. The resulting data is completed and smoothed by local polynomial regression.
Koehl, Ludovic. "Conception et réalisation d'un estimateur de dimension fractale par utilisation de techniques floues." Lille 1, 1998. https://pepite-depot.univ-lille.fr/LIBRE/Th_Num/1998/50376-1998-1.pdf.
Frigui, Nejm Eddine. "Maintenance automatique du réseau programmable d'accès optique de très haut débit." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2019. http://www.theses.fr/2019IMTA0127/document.
Passive Optical Networks (PONs), representing one of the most attractive FTTH access network solutions, have been widely deployed for several years thanks to their ability to offer high-speed services. However, due to the dynamicity of users' traffic patterns, PONs need to rely on an efficient upstream bandwidth allocation mechanism. This mechanism is currently limited by the static nature of Service Level Agreement (SLA) parameters, which can lead to unoptimized bandwidth allocation in the network. The objective of this thesis is to propose a new management architecture for optimizing upstream bandwidth allocation in PONs while acting only on manageable parameters, so as to allow the involvement of self-decision elements in the network. To achieve this, classification techniques based on machine learning approaches are used to analyze the behavior of PON users and to characterize their upstream data transmission tendency. A dynamic adjustment of some SLA parameters is then performed to maximize overall customer satisfaction with the network.
Sqali, Houssaini Mamoun. "Utilisation du formalisme DEVS pour la validation de comportements des systèmes à partir des scénarios UML." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4318.
The development of a system begins, in the requirements analysis phase, with the constitution of a specification in which a set of scenarios describing the behavior of the system is defined, together with the constraints it must obey; each scenario is a partial representation of the system's behavior. However, this specification is not directly implementable, because it is difficult, especially for complex systems, to observe the global behavior of a system directly from scenarios. That is why scenarios are often integrated with other models used in the detailed design, called "behavioral models", in particular State Machines [Harel 87], which allow moving from a partial to a global view of the system in order to address different problems such as behavior validation or the detection of system inconsistencies. Our thesis aims, firstly, to study different scenario languages, especially UML sequence diagrams and MSCs (Message Sequence Charts), and secondly to propose an automatic synthesis method that generates executable discrete-event DEVS models [Zeigler 76] from scenarios describing the desired behavior of a system. The resulting models are executable and deterministic, with a formal semantics that ensures a unique interpretation of each model element. The use of the final models' simulation traces, taking into account the simulation coverage in terms of the number of states and transitions visited, allows the behavior to be validated.
Kosowska-Stamirowska, Zuzanna. "Évolution et robustesse du réseau maritime mondial : une approche par les systèmes complexes." Thesis, Paris 1, 2020. http://www.theses.fr/2020PA01H022.
Over 70% of the total value of international trade is carried by sea, accounting for 80% of all cargo in terms of volume. In 2016, the UN Secretary General drew attention to the role of maritime transport, describing it as "the backbone of global trade and of the global economy". Maritime trade flows impact not only the economic development of the concerned regions, but also their ecosystems. Moving ships are an important vector of spread for bioinvasions. Shipping routes are constantly evolving and likely to be affected by the consequences of Climate Change, while at the same time ships are a considerable source of air pollution, with CO2 emissions at a level comparable to Germany, and NOx and SOx emissions comparable to the United States. With the development of Arctic shipping becoming a reality, the need to understand the behavior of this system and to forecast future maritime trade flows reasserts itself. Despite their scope and crucial importance, studies of maritime trade flows on a global scale, based on data and formal methods, are scarce, and even fewer studies address the question of their evolution. In this thesis we use a unique database on daily movements of the world fleet between 1977 and 2008, provided by the maritime insurer Lloyd's, in order to build a complex network of maritime trade flows where ports stand for nodes and links are created by ship voyages. We perform a data-driven analysis of the maritime trade network, using tools from Complexity Science and Machine Learning applied to network data to study the network's properties and develop models for predicting the opening of new shipping lines and for forecasting future trade volume on links. Applying Machine Learning to analyze networked trade flows appears to be a new approach with respect to the state of the art, and required careful selection and customization of existing Machine Learning tools to make them fit networked data on physical flows. The results of the thesis suggest a hypothesis of trade following a random walk on the underlying network structure. [...] Thanks to a natural experiment, involving traffic redirection from the port of Kobe after the 1995 earthquake, we find that the traffic was redirected preferentially to ports which had the highest number of Common Neighbors with Kobe before the cataclysm. Then, by simulating targeted attacks on the maritime trade network, we analyze the best criteria which may serve to maximize the harm done to the network and the overall robustness of the network to different types of attacks. All these results hint that maritime trade flows follow a form of random walk on the network of sea connections, which provides evidence for a novel view of the nature of trade flows.
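The Common Neighbors criterion cited above for the Kobe natural experiment is a standard link-prediction feature; the toy network below illustrates how it is computed with networkx. The ports and edges are invented for the example and do not come from the Lloyd's data.

```python
import networkx as nx

# Toy port network; edges stand for shipping connections between ports.
G = nx.Graph()
G.add_edges_from([("Kobe", "Busan"), ("Kobe", "Shanghai"),
                  ("Osaka", "Busan"), ("Osaka", "Shanghai"),
                  ("Yokohama", "Busan")])

# Number of common neighbours between Kobe and every other port, the criterion
# found to drive the post-earthquake traffic redirection.
scores = {p: len(list(nx.common_neighbors(G, "Kobe", p)))
          for p in G.nodes if p != "Kobe"}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```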
Matteo, Lionel. "De l’image optique "multi-stéréo" à la topographie très haute résolution et la cartographie automatique des failles par apprentissage profond." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4099.
Seismogenic faults are the source of earthquakes. The study of their properties thus provides information on some of the properties of the large earthquakes they might produce. Faults are 3D features, forming complex networks generally including one master fault and myriads of secondary faults and fractures that intensely dissect the rocks embedding the master fault. In my thesis, I aim to develop approaches to help study this intense secondary faulting/fracturing. To identify, map and measure the faults and fractures within dense fault networks, I have handled two challenges: 1) Faults generally form steep topographic escarpments at the ground surface that enclose narrow, deep corridors or canyons, where topography, and hence fault traces, are difficult to measure using the available standard methods (such as stereo and tri-stereo of optical satellite images). To address this challenge, I have used multi-stereo acquisitions with different configurations, such as different roll and pitch angles, different dates of acquisition and different modes of acquisition (mono and tri-stereo). Our dataset, amounting to 37 Pléiades images over three different tectonic sites within the western USA (Valley of Fire, Nevada; Granite Dells, Arizona; Bishop Tuff, California), allows us to test different acquisition configurations to calculate the topography with three different approaches. Using the free open-source software MicMac (IGN; Rupnik et al., 2017), I have calculated the topography in the form of Digital Surface Models (DSMs): (i) with the combination of 2 to 17 Pléiades images, (ii) by stacking and merging DSMs built from individual stereo or tri-stereo acquisitions, avoiding the use of multi-date combinations, (iii) by stacking and merging point clouds built from tri-stereo acquisitions following the multiview pipeline developed by Rupnik et al., 2018. As a last approach (iv), we used the recent multiview stereo pipeline CARS (CNES/CMLA) developed by Michel et al., 2020, combining tri-stereo acquisitions. From the four different approaches, I have calculated more than 200 DSMs, and my results suggest that combining two tri-stereo acquisitions, or one stereo and one tri-stereo acquisition, with opposite roll angles leads to the most accurate DSM (with the most complete and precise topographic surface). 2) Commonly, faults are mapped manually in the field or from optical images and topographic data through the recognition of the specific curvilinear traces they form at the ground surface. However, manual mapping is time-consuming, which limits our capacity to produce complete representations and measurements of fault networks. To overcome this problem, we have adopted a machine learning approach, namely a U-Net Convolutional Neural Network, to automate the identification and mapping of fractures and faults in optical images and topographic data. Intentionally, we trained the CNN with a moderate amount of manually created fracture and fault maps of low resolution and basic quality, extracted from one type of optical images (standard camera photographs of the ground surface). Based on the results of a number of performance tests, we select the best performing model, MRef, and demonstrate its capacity to predict fractures and faults accurately in image data of various types and resolutions (ground photographs, drone and satellite images and topographic data). The MRef predictions thus enable the statistical analysis of fault networks. MRef exhibits good generalization capacities, making it a viable tool for fast and accurate extraction of fracture and fault networks from image and topographic data.
Cimmino, Francesco Maria. "Essais sur la création d'une centrale électrique virtuelle pour les petites et moyennes entreprises." Thesis, Aix-Marseille, 2021. http://www.theses.fr/2021AIXM0564.
A PhD thesis "on the job" is a bridge between the academic and the economic world. In this thesis, these two worlds came together to provide solutions to Swiss companies that want to create virtual power plants (VPPs). In order to tackle the subject, I started the thesis by analysing the legislative aspects that have allowed the development of this technology, where the most important element is the "Winter package" of the European Commission, which defines common rules to open the energy market. I then focused on the technical developments behind VPPs, which are linked to the development of the "Smart Grid" concept. The introduction of my thesis ends with a short overview of economic theories, which allows the reader to understand the structure of the financial markets where VPP valuation is possible. After this introduction, which enables readers to become familiar with the subject, there are three scientific articles in which I analysed problems that companies are facing in this sector: the forecasting of energy demand, of production, and of secondary market prices. The articles have helped to address these issues by providing forecasting methodologies that are efficient in comparison with the literature; in addition, companies use some of these models. The methodologies used to answer these issues come from the worlds of finance (ARMA, SETAR, VAR) and machine learning (LSTM, GRU), but also from contributions from other disciplines such as marketing (MCA) and geostatistics (IWD).
Sheeren, David. "Méthodologie d' évaluation de la cohérence inter-représentations pour l'intégration de bases de données spatiales : une approche combinant l' utilisation de métadonnées et l' apprentissage automatique." Paris 6, 2005. https://tel.archives-ouvertes.fr/tel-00085693.
Brédy, Jhemson. "Prévision de la profondeur de la nappe phréatique d'un champ de canneberges à l'aide de deux approches de modélisation des arbres de décision." Master's thesis, Université Laval, 2019. http://hdl.handle.net/20.500.11794/37875.
Integrated groundwater management is a major challenge for industrial, agricultural and domestic activities. In some agricultural production systems, optimized water table management represents a significant factor in improving crop yields and water use. Therefore, predicting water table depth (WTD) becomes an important means to enable real-time planning and management of groundwater resources. This study proposes a decision-tree-based modelling approach for WTD forecasting as a function of precipitation, previous WTD values and evapotranspiration, with applications in groundwater resources management for cranberry farming. Firstly, two decision-tree-based models, namely Random Forest (RF) and Extreme Gradient Boosting (XGB), were parameterized and compared to predict the WTD up to 48 hours ahead for a cranberry farm located in Québec, Canada. Secondly, the importance of the predictor variables was analyzed to determine their influence on the WTD simulation results. WTD measurements at three observation wells within a cranberry field, for the growing period from July 8, 2017 to August 30, 2017, were used for training and testing the models. Statistical parameters such as the mean squared error, the coefficient of determination and the Nash-Sutcliffe efficiency coefficient were used to measure model performance. The results show that the XGB algorithm outperformed the RF model for predictions of WTD and was selected as the optimal model. Among the predictor variables, the antecedent WTD was the most important for water table depth simulation, followed by precipitation. Based on the most important variables and the optimal model, the prediction error over the entire WTD range was within ± 5 cm for the 1-, 12-, 24-, 36- and 48-hour predictions. The XGB model can provide useful information on the WTD dynamics and a rigorous simulation for irrigation planning and management in cranberry fields.
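A minimal sketch of the lag-based XGBoost setup described above is given below; the series lengths, lag choices and hyperparameters are placeholders chosen for illustration and do not correspond to the calibration reported in the study.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Hypothetical hourly series: water table depth (wtd), precipitation (precip)
# and evapotranspiration (et); lagged values serve as predictors.
df = pd.DataFrame({"wtd": np.random.rand(2000),
                   "precip": np.random.rand(2000),
                   "et": np.random.rand(2000)})
horizon = 24  # predict WTD 24 hours ahead
for lag in (1, 2, 3):
    df[f"wtd_lag{lag}"] = df["wtd"].shift(lag)
df["target"] = df["wtd"].shift(-horizon)
df = df.dropna()

X = df[["precip", "et", "wtd_lag1", "wtd_lag2", "wtd_lag3"]]
y = df["target"]
model = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X, y)
print(dict(zip(X.columns, model.feature_importances_)))
```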
Yang, Gen. "Modèles prudents en apprentissage statistique supervisé." Thesis, Compiègne, 2016. http://www.theses.fr/2016COMP2263/document.
In some areas of supervised machine learning (e.g. medical diagnosis, computer vision), predictive models are evaluated not only on their accuracy but also on their ability to obtain a more reliable representation of the data and of the induced knowledge, in order to allow for cautious decision making. This is the problem we studied in this thesis. Specifically, we examined two existing approaches from the literature for making models and predictions more cautious and more reliable: the framework of imprecise probabilities and that of cost-sensitive learning. These two frameworks are both used to make models and inferences more reliable and cautious, yet few existing studies have attempted to bridge them, due to both theoretical and practical problems. Our contributions are to clarify and to resolve these problems. Theoretically, few existing studies have addressed how to quantify the different classification errors when set-valued predictions are produced and when the costs of mistakes are not equal (in terms of consequences). Our first contribution was to establish general properties and guidelines for quantifying the misclassification costs of set-valued predictions. These properties led us to derive a general formula, which we call the generalized discounted cost (GDC), allowing the comparison of classifiers whatever the form of their predictions (singleton or set-valued) in the light of a risk-aversion parameter. Practically, most classifiers based on imprecise probabilities fail to integrate generic misclassification costs efficiently because the computational complexity increases by an order of magnitude (or more) when non-unitary costs are used. This problem led to our second contribution: the implementation of a classifier that can manage the probability intervals produced by imprecise probabilities and generic error costs with the same order of complexity as in the case where standard probabilities and unitary costs are used. This relies on a binary decomposition technique, nested dichotomies. The properties and prerequisites of this technique have been studied in detail. In particular, we saw that nested dichotomies are applicable to all imprecise probabilistic models and that they reduce the imprecision level of imprecise models without loss of predictive power. Various experiments were conducted throughout the thesis to illustrate and support our contributions. We characterized the behavior of the GDC using ordinal data sets. These experiments highlighted the differences between a model based on the standard probability framework producing indeterminate predictions and a model based on imprecise probabilities. The latter is generally more competent because it distinguishes two sources of uncertainty (ambiguity and lack of information), even if the combined use of these two types of models is also of particular interest, as it can assist the decision-maker in improving the data quality or the classifiers. In addition, experiments conducted on a wide variety of data sets showed that the use of nested dichotomies significantly improves the predictive power of an indeterminate model with generic costs.
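For readers unfamiliar with scoring set-valued predictions, the classical discounted accuracy below gives the flavor of what the generalized discounted cost extends: a correct singleton earns full credit, a correct but larger set earns diluted credit. The thesis's GDC additionally handles generic misclassification costs and a risk-aversion parameter, which this toy function deliberately omits; the labels in the example are invented.

```python
def discounted_accuracy(predicted_set, true_label):
    """Classical discounted accuracy for a set-valued prediction: credit 1/|Y|
    when the true label lies in the predicted set Y, zero otherwise."""
    if true_label in predicted_set:
        return 1.0 / len(predicted_set)
    return 0.0

print(discounted_accuracy({"benign"}, "benign"))               # 1.0
print(discounted_accuracy({"benign", "malignant"}, "benign"))  # 0.5
```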
Brochero, Darwin. "Hydroinformatics and diversity in hydrological ensemble prediction systems." Thesis, Université Laval, 2013. http://www.theses.ulaval.ca/2013/29908/29908.pdf.
In this thesis, we tackle the problem of probabilistic streamflow forecasting from two different perspectives, both based on the collaboration of multiple hydrological models (diversity). The first one favours a hybrid approach for the evaluation of multiple global hydrological models and machine learning tools for predictor selection, while the second one constructs Artificial Neural Network (ANN) ensembles, forcing diversity within them. This thesis is based on the concept of diversity for developing different methodologies around two complementary problems. The first one focuses on simplifying, via member selection, a complex Hydrological Ensemble Prediction System (HEPS) that has 800 daily forecast scenarios originating from the combination of 50 meteorological precipitation members and 16 global hydrological models. We explore in depth four techniques: Linear Correlation Elimination, Mutual Information, Backward Greedy Selection, and the Nondominated Sorting Genetic Algorithm II (NSGA-II). We propose the optimal hydrological model participation concept, which identifies the number of representative meteorological members to propagate into each hydrological model in the simplified HEPS scheme. The second problem consists in the stratified selection of data patterns that are used for training an ANN ensemble or stack. For instance, using data from the second and third MOdel Parameter Estimation eXperiment (MOPEX) workshops, we promoted an ANN prediction stack in which each predictor is trained on input spaces defined by applying Input Variable Selection to different stratified sub-samples. In summary, we demonstrated that implicit diversity in the configuration of a HEPS is efficient in the search for a high-performance HEPS.
Cablé, Baptiste. "Vers la gestion de scénarios pour la reconnaissance et l'anticipation de situations dynamiques." Troyes, 2011. http://www.theses.fr/2011TROY0007.
Our study deals with the problem of recognition and anticipation of dynamic situations for user assistance. Existing tools like Hidden Markov Models or Petri Nets are already used in this context. However, learning this kind of model is complicated and slow; thus, the designer has to specify every situation model so that the program can work in real time. Our solution is a generic algorithm which builds by itself the representation of the dynamic system. It adapts to the user and the situation in order to make predictions. Dynamic situations are modeled by scenarios. A scenario corresponds to a period during which every event has an influence on the others. It is made of an ordered series of states and actions in the form of symbols. The algorithm is a kind of Case-Based Reasoning method, but some modifications are made: representations and computations are oriented towards simplicity and speed, and the algorithm is suitable for problems which evolve in time. The approach is applied to two distinct fields. The first application consists in assisting the user of a powered wheelchair. Without initially knowing the environment, the algorithm memorizes the usual paths of the user. This knowledge is used to drive the wheelchair automatically along usual paths. The second application is dedicated to the assistance of novice players in a multi-player online game. Experience of dynamic situations is learned from all the players and is used to predict the consequences of every battle.
Bako, Maria. "Utilisation de l'ordinateur pour le développemnt de la vision spatiale." Toulouse 3, 2006. http://www.theses.fr/2006TOU30041.
The aim of this thesis is to determine whether computer programs can help improve spatial intelligence. First, we examined whether computer programs could replace physical models in education. The aim of the first experiment was to compare the results of tests based on programs and on models about plane sections. The results indicate that it is not enough to rattle off the solutions: students need to work through the computer-generated answers for them to sink in. To improve the students' spatial abilities, we prepared several programs that generate different kinds of spatial problems and correct the students' answers. The programs generating the tests were written in JavaScript and embedded in the source of the HTML pages, as were the checking routines. Our experiments show that by using these programs the students' results steadily improve, so we can develop their spatial intelligence; moreover, the students like using computer programs to study spatial geometry.
Cherif, Aymen. "Réseaux de neurones, SVM et approches locales pour la prévision de séries temporelles." Thesis, Tours, 2013. http://www.theses.fr/2013TOUR4003/document.
Time series forecasting has been a widely discussed issue for many years. Researchers from various disciplines have addressed it in several application areas: finance, medicine, transportation, etc. In this thesis, we focused on machine learning methods: neural networks and SVMs. We have also been interested in meta-methods to push up predictor performance, and more specifically in local models. In a divide-and-conquer strategy, local models perform a clustering of the data set before a different predictor is assigned to each obtained subset. We present in this thesis a new algorithm that allows recurrent neural networks to be used as local predictors. We also propose two novel clustering techniques suitable for local models. The first is based on Kohonen maps, and the second is based on binary trees.
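A generic divide-and-conquer "local models" pipeline of the kind summarized above can be sketched as follows: cluster the inputs, then train one predictor per cluster and route new points to the predictor of their cluster. The clustering algorithm and the SVR predictor below are placeholders (the thesis uses Kohonen maps, binary trees and recurrent networks), and the inputs are assumed to be NumPy arrays.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

def fit_local_models(X, y, n_clusters=3):
    """Cluster the inputs, then train one predictor per cluster."""
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    models = {}
    for c in range(n_clusters):
        mask = clusterer.labels_ == c
        models[c] = SVR().fit(X[mask], y[mask])
    return clusterer, models

def predict_local(clusterer, models, X_new):
    """Route each new point to the predictor of its cluster."""
    clusters = clusterer.predict(X_new)
    return np.array([models[c].predict(x.reshape(1, -1))[0]
                     for c, x in zip(clusters, X_new)])
```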
Ben, Hassine Nesrine. "Machine Learning for Network Resource Management." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV061.
An intelligent exploitation of the data carried on telecom networks could lead to a very significant improvement in the quality of experience (QoE) for users. Machine learning techniques offer multiple possibilities that can help optimize the utilization of network resources. In this thesis, two contexts of application of learning techniques are studied: Wireless Sensor Networks (WSNs) and Content Delivery Networks (CDNs). In WSNs, the question is how to predict the quality of the wireless links in order to improve the quality of the routes and thus increase the packet delivery rate, which enhances the quality of service offered to the user. In CDNs, it is a matter of predicting the popularity of videos in order to cache the most popular ones as close as possible to the users who request them, thereby reducing the latency to fulfill user requests. In this work, we have drawn upon learning techniques from two different domains, namely statistics and machine learning. Each learning technique is represented by an expert whose parameters are tuned after an offline analysis. Each expert is responsible for predicting the next metric value (i.e. popularity for videos in CDNs, quality of the wireless link for WSNs). The accuracy of the prediction is evaluated by a loss function, which must be minimized. Given the variety of experts selected, and since none of them always takes precedence over all the others, a second level of expertise is needed to provide the best prediction (the one that is closest to the real value and thus minimizes the loss function). This second level is represented by a special expert, called a forecaster. The forecaster provides predictions based on the values predicted by a subset of the best experts. Several methods are studied to identify this subset of best experts. They are based on the loss functions used to evaluate the experts' predictions and on the value k, representing the k best experts. The learning and prediction tasks are performed online on real data sets: from a real WSN deployed at Stanford, and from YouTube for the CDN. The methodology adopted in this thesis is applied to predicting the next value in a series of values. More precisely, we show how the quality of the links can be evaluated by the Link Quality Indicator (LQI) in the WSN context and how the Single Exponential Smoothing (SES) and Average Moving Window (AMW) experts can predict the next LQI value. These experts react quickly to changes in LQI values, whether it be a sudden drop in the quality of the link or a sharp increase in quality. We propose two forecasters, Exponential Weighted Average (EWA) and Best Expert (BE), as well as the expert-forecaster combination, to provide better predictions. In the context of CDNs, we evaluate the popularity of each video by the number of requests for this video per day. We use both statistical experts (ARMA) and experts from the machine learning domain (e.g. DES, polynomial regression). These experts are evaluated according to different loss functions. We also introduce forecasters that differ in terms of the observation horizon used for prediction, the loss function and the number of experts selected for predictions. These predictions help decide which videos should be placed in the caches close to the users. The efficiency of the caching technique based on popularity prediction is evaluated in terms of hit rate and update rate.
We highlight the contributions of this caching technique compared to a classical caching algorithm, Least Frequently Used (LFU). This thesis ends with recommendations for the use of online and offline learning techniques for networks (WSN, CDN). As perspectives, we propose different applications where the use of these techniques would improve the quality of experience for mobile users (cellular networks) or users of IoT (Internet of Things) networks, based, for instance, on Time Slotted Channel Hopping (TSCH)
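As an illustration of the expert/forecaster scheme described in this abstract, the sketch below combines a Single Exponential Smoothing expert and a moving-average-window expert through an exponentially weighted average forecaster. The smoothing factor, window length, learning rate and square loss are illustrative assumptions, not the tuned values used in the thesis.

```python
import numpy as np

def ses_expert(history, alpha=0.3):
    """Single Exponential Smoothing expert: prediction of the next value."""
    pred = history[0]
    for x in history[1:]:
        pred = alpha * x + (1 - alpha) * pred
    return pred

def amw_expert(history, window=3):
    """Moving-window expert: mean of the last `window` observations."""
    return float(np.mean(history[-window:]))

def ewa_forecast(expert_preds, cum_losses, eta=0.01):
    """Exponentially weighted average of the experts, weighted by their past losses."""
    weights = np.exp(-eta * np.asarray(cum_losses))
    weights /= weights.sum()
    return float(np.dot(weights, expert_preds))

lqi = [90, 92, 88, 40, 45, 85, 87, 89]         # toy Link Quality Indicator series
cum_losses = np.zeros(2)
for t in range(3, len(lqi)):
    preds = np.array([ses_expert(lqi[:t]), amw_expert(lqi[:t])])
    forecast = ewa_forecast(preds, cum_losses)
    cum_losses += (preds - lqi[t]) ** 2        # square loss accumulated per expert
    print(f"t={t}  forecast={forecast:.1f}  actual={lqi[t]}")
```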
Neumann, Andreas. "Introduction d'outils de l'intelligence artificielle dans la prévision de pluie par radar." Phd thesis, Ecole Nationale des Ponts et Chaussées, 1991. http://tel.archives-ouvertes.fr/tel-00520834.
Full textCaigny, Arno de. "Innovation in customer scoring for the financial services industry." Thesis, Lille, 2019. http://www.theses.fr/2019LIL1A011.
Full textThis dissertation improves customer scoring. Customer scoring is important for companies in their decision-making processes because it helps to solve key managerial issues, such as deciding which customers to target for a marketing campaign or assessing which customers are likely to leave the company. The research in this dissertation makes several contributions in three areas of the customer scoring literature. First, new sources of data are used to score customers. Second, the methodology to go from data to decisions is improved. Third, customer life event prediction is proposed as a new application of customer scoring
Gerchinovitz, Sébastien. "Prédiction de suites individuelles et cadre statistique classique : étude de quelques liens autour de la régression parcimonieuse et des techniques d'agrégation." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00653550.
Full textLam, Chi-Nguyen. "Méthodes de Machine Learning pour le suivi de l'occupation du sol des deltas du Viêt-Nam." Thesis, Brest, 2021. http://www.theses.fr/2021BRES0074.
Full textSocio-economic development in Vietnam is closely associated with the existence of its large fluvial deltas. Furthermore, environmental factors such as drought and flooding play an important role in land use/land cover change within these deltas, and these changes have an impact on the natural and economic balance of the country. In this perspective, the objective of the present thesis is to propose satellite data processing methods for the efficient mapping and monitoring of land use in the two main deltas of Vietnam, the Red River Delta and the Mekong Delta. Experimental work was carried out to verify and evaluate the contribution of multi-sensor image processing through various image segmentation approaches and machine/deep learning algorithms. In particular, a Convolutional Neural Network (CNN) model adapted to the context of the study demonstrated its robustness for the detection and mapping of land use, in order to characterise the flood hazard and analyse the elements at risk
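To make the deep learning component concrete, here is a minimal sketch of a patch-based CNN classifier for land-cover mapping, written in PyTorch. The number of spectral bands, patch size, class count and layer sizes are illustrative assumptions and do not reproduce the architecture developed in the thesis.

```python
import torch
import torch.nn as nn

class LandCoverCNN(nn.Module):
    """Small CNN that classifies multispectral image patches into land-cover classes."""
    def __init__(self, n_bands=4, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bands, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):                      # x: (batch, n_bands, 32, 32)
        return self.classifier(self.features(x))

model = LandCoverCNN()
dummy_patch = torch.randn(1, 4, 32, 32)        # one hypothetical 32x32 multispectral patch
class_logits = model(dummy_patch)              # one score per land-cover class
```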
Thomas, Julien. "Apprentissage supervisé de données déséquilibrées par forêt aléatoire." Thesis, Lyon 2, 2009. http://www.theses.fr/2009LYO22004/document.
Full textThe problem of imbalanced datasets in supervised learning has emerged relatively recently, as data mining has become a technology widely used in industry. Assisted medical diagnosis, the detection of fraud, of abnormal phenomena, or of specific elements in satellite imagery are examples of industrial applications based on supervised learning from imbalanced datasets. The goal of our work is to adapt the supervised learning process to this issue. We also try to address the specific performance requirements often associated with imbalanced datasets, such as a high recall rate for the minority class. This need is reflected in our main application, the development of software to help radiologists in the detection of breast cancer. For this, we propose new methods that amend three different stages of the learning process. First, at the sampling stage, we propose, in the case of bagging, to replace classic bootstrap sampling with guided sampling. Our techniques, FUNSS and LARSS, use neighbourhood properties for the selection of objects. Secondly, for the representation space, our contribution is a method of variable construction adapted to imbalanced datasets. This method, the FuFeFa algorithm, is based on the discovery of predictive association rules. Finally, at the stage of aggregating the base classifiers of a bagging, we propose to optimize the majority vote by using weightings. For this, we have introduced a new quantitative measure of model assessment, PRAGMA, which takes into account user-specific needs regarding the recall and precision rates of each class
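As a sketch of the last contribution, the snippet below aggregates the base classifiers of a bagging ensemble with a weighted majority vote. The weights shown are placeholders standing in for weights that would be optimized against a criterion such as PRAGMA, which is not reproduced here.

```python
import numpy as np

def weighted_majority_vote(predictions, weights):
    """predictions: (n_classifiers, n_samples) array of 0/1 labels.
    weights: one non-negative weight per base classifier."""
    scores = np.average(predictions, axis=0, weights=weights)
    return (scores >= 0.5).astype(int)

# hypothetical bagging of three base classifiers voting on four samples
preds = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 1, 0]])
weights = np.array([0.5, 0.3, 0.2])   # placeholder weights, e.g. tuned to favour minority-class recall
print(weighted_majority_vote(preds, weights))   # -> [1 1 1 0] with these weights
```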
Fernandez, Tamayo Borja. "L'importance des données textuelles dans le Capital Privé. Prévision des rendements des fonds, grâce à l'intelligence artificielle, à partir des documents envoyés par les gestionnaires de fonds pre et post investissement." Thesis, Université Côte d'Azur, 2022. http://theses.univ-cotedazur.fr/2022COAZ0033.
Full textPrivate equity AUM rose from less than 1 trillion in 2004 to over 10 trillion in 2021. This large market is dominated by institutional investors who spend considerable resources on investment selection and monitoring. Investors receive a Private Placement Memorandum (PPM), which defines the fund offering to investors. The previous literature is limited to quantitative information available in the PPM, such as the track record and the manager's experience. After investing in a fund, Limited Partners (LPs) receive regular updates from the General Partners (GPs) who invest on their behalf. These reports include quantitative information and a letter describing the fund's investments, value creation, and exits. While the quantitative information in these reports and its association with future fund returns has been explored thoroughly, the qualitative content of the letter has not. This study examines the importance of the PPM text detailing investment approaches (Chapters 1 and 2) and of the investor letter (Chapter 3) in explaining fund performance and fundraising success. Chapter 1 examines the relationship between investment approach readability and fund returns using 373 PPMs. We use several readability measures suggested by the accounting and finance literature to evaluate the readability of the investment approach descriptions. In line with the management obfuscation hypothesis, we establish a negative link between the readability of the investment approach description and fund returns for fund managers with bad performance at the time of a new fund's fundraising. This effect is robust to multiple measures of track record quality. We also examine the association between the readability of the investment approach description and the number of days needed to reach the final fund closing (fundraising speed). Our data imply that the readability of the investment approach is not linked with fundraising speed, in line with the intuition that investors do not use the textual information in PPMs to select funds. Our findings imply that investors base investment decisions on quantitative information, mainly the GP's track record. Chapter 2 analyzes the potential of combining Natural Language Processing (NLP) and machine learning approaches to select and deselect funds based on the investment approach description. First, we use NLP to convert the investment approach description into numerical vectors used as forecasting regressors. Then, we train machine learning models on funds raised before 2012. Finally, we test the algorithms' ability to predict the performance of funds raised in 2012-2014 (i.e., not used to train the algorithms). Our machine learning models are 60% accurate, meaning that the algorithms correctly classify 60% of the held-out funds as outperformers or underperformers. These accuracy rates are robust when backtesting the models with funds raised before 2008 and after 2011. After controlling for other fund performance factors, we find a positive relationship between the algorithm-predicted probability of success and fund returns. Finally, we show that using machine learning algorithms to select fund managers generates higher returns. Chapter 3 examines the link between managerial tone in investor letters and fund returns. We measure the GP's sentiment with FinBERT, a neural network-based system trained to assess the sentiment of a sentence. We then explore whether managerial tone predicts future fund returns. Our data reveal that managerial tone is associated with the returns two years after a letter is issued.
Finally, because managers need a new fund to continue investing and earning fees, we examine the GP's tone when raising a new fund. We find that managers with bad performance and little reputation at risk (i.e., young managers) employ an excessively optimistic tone while raising a new fund, suggesting that they inflate their tone to secure a successful fundraising. This finding provides evidence of agency costs between fund managers and investors due to information asymmetries
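The abstract does not specify which text representation and classifier are used in Chapter 2, so the sketch below is only a generic stand-in: it vectorizes hypothetical investment-approach descriptions with TF-IDF and fits a logistic regression that outputs a probability of outperformance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# hypothetical investment-approach descriptions with outperformer (1) / underperformer (0) labels
texts = ["We target control buyouts of mid-market industrial companies in Europe.",
         "Our strategy focuses on minority stakes in early-stage software ventures."]
labels = [1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# predicted probability of success for a new (hypothetical) PPM description
prob_success = model.predict_proba(["We pursue buy-and-build platforms in healthcare services."])[0, 1]
```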
Çinar, Yagmur Gizem. "Prédiction de séquences basée sur des réseaux de neurones récurrents dans le contexte des séries temporelles et des sessions de recherche d'information." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM079.
Full textThis thesis investigates challenges of sequence prediction in different scenarios, such as sequence prediction using recurrent neural networks (RNNs) in the context of time series and of information retrieval (IR) search sessions. Predicting the unknown values that follow some previously observed values is called sequence prediction. It is widely applicable to many domains where a sequential behavior is observed in the data. In this study, we focus on two different types of sequence prediction tasks: time series forecasting and next query prediction in an information retrieval search session. Time series often display pseudo-periods, i.e. time intervals with strong correlation between values of the time series. Seasonal changes in weather time series or electricity usage at day and night time are examples of pseudo-periods. In a forecasting scenario, pseudo-periods correspond to the difference between the positions of the output being predicted and of specific inputs. In order to capture periods in RNNs, one needs a memory of the input sequence. Sequence-to-sequence RNNs with an attention mechanism reuse specific (representations of) input values to predict output values, and therefore seem adequate for capturing periods. We thus first explore the capability of an attention mechanism in that context. However, according to our initial analysis, a standard attention mechanism does not perform well in capturing the periods. Therefore, we propose a period-aware content-based attention RNN model. This model is an extension of state-of-the-art sequence-to-sequence RNNs with an attention mechanism and is aimed at capturing the periods in time series with or without missing values. Our experimental results with period-aware content-based attention RNNs show a significant improvement of univariate and multivariate time series forecasting performance on several publicly available data sets. Another challenge in sequence prediction is next query prediction. Next query prediction helps users to disambiguate their search query, to explore different aspects of the information they need, or to form a precise and succinct query that leads to higher retrieval performance. A search session is dynamic, and the information need of a user might change over a search session as a result of the search interactions. Furthermore, the interactions of a user with a search engine influence the user's query reformulations. Considering this influence on query formulations, we first analyze where the next query words come from. Using this analysis of the sources of query words, we propose two next query prediction approaches: a set view and a sequence view. The set view adapts a bag-of-words approach using a novel feature set defined based on the analysis of the sources of next query words; here, the next query is predicted using learning to rank. The sequence view extends a hierarchical RNN model by considering the sources of next query words in the prediction. The sources of next query words are incorporated by using an attention mechanism on the interaction words. We have observed that the sequence approach, a natural formulation of the problem, together with the exploitation of all sources of evidence, leads to better next query prediction
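For readers unfamiliar with content-based attention, here is a minimal numpy sketch of the standard mechanism the thesis builds on: a decoder query is compared with every encoder state, the similarities are turned into a softmax distribution, and a context vector is formed. The period-aware extension, which additionally biases these weights towards pseudo-periodic lags, is not reproduced here.

```python
import numpy as np

def content_attention(query, encoder_states):
    """Standard dot-product content-based attention.
    query: (d,) current decoder state; encoder_states: (T, d) past representations."""
    scores = encoder_states @ query              # similarity with each past time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over the T time steps
    context = weights @ encoder_states           # weighted sum of past representations
    return context, weights

T, d = 24, 8                                     # e.g. 24 hourly encoder steps of dimension 8
encoder_states = np.random.randn(T, d)
query = np.random.randn(d)
context, attention_weights = content_attention(query, encoder_states)
```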
Faouzi, Johann. "Machine learning to predict impulse control disorders in Parkinson's disease." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS048.
Full textImpulse control disorders are a class of psychiatric disorders characterized by impulsivity. These disorders are common during the course of Parkinson's disease, decrease the quality of life of subjects, and increase caregiver burden. Being able to predict which individuals are at higher risk of developing these disorders, and when, is of high importance. The objective of this thesis is to study impulse control disorders in Parkinson's disease from the statistical and machine learning points of view; it can be divided into two parts. The first part consists in investigating the predictive performance of all the factors associated with these disorders in the literature, taken together. The second part consists in studying the association and the usefulness of other factors, in particular genetic data, to improve the predictive performance
Alaoui, Ismaili Oumaima. "Clustering prédictif Décrire et prédire simultanément." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLA010.
Full textPredictive clustering is a new supervised learning framework derived from traditional clustering. This new framework allows one to describe and to predict simultaneously. Compared to classical supervised learning, predictive clustering algorithms seek to discover the internal structure of the target class in order to use it for predicting the class of new instances. The purpose of this thesis is to look for an interpretable model of predictive clustering. To achieve this objective, we chose to modify the traditional K-means algorithm. This new modified version is called predictive K-means. It contains 7 different steps, each of which can be supervised separately from the others. In this thesis, we only deal with four steps: 1) data preprocessing, 2) initialization of the centers, 3) selection of the best partition, and 4) importance of features. Our experimental results show that the use of just two supervised steps (data preprocessing and initialization of the centers) allows the K-means algorithm to achieve competitive performance compared with some other predictive clustering algorithms. These results also show that our preprocessing methods can help the predictive K-means algorithm to provide results easily comprehensible by users. We also show in this thesis that the use of our new measure for evaluating predictive clustering quality helps our predictive K-means algorithm to find the optimal partition, the one that establishes the best trade-off between description and prediction. It thus allows users to find the different reasons behind the same prediction: two different instances could have the same predicted label for different reasons
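A minimal sketch of the general idea, assuming scikit-learn: centers are initialized from class-wise sub-centroids (one possible way of supervising the initialization step), K-means is then run from there, and each cluster predicts the majority class of its training members. The number of sub-centroids per class is an illustrative choice, and the other supervised steps of the thesis (preprocessing, partition selection, feature importance) are not covered.

```python
import numpy as np
from sklearn.cluster import KMeans

def supervised_init(X, y, k_per_class=2):
    """Initialize centers from sub-centroids computed separately within each class (y: integer labels)."""
    centers = []
    for c in np.unique(y):
        km = KMeans(n_clusters=k_per_class, n_init=10).fit(X[y == c])
        centers.append(km.cluster_centers_)
    return np.vstack(centers)

def fit_predictive_kmeans(X, y, k_per_class=2):
    centers = supervised_init(X, y, k_per_class)
    km = KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(X)
    # each cluster predicts the majority class of the training instances it contains
    cluster_to_class = {c: np.bincount(y[km.labels_ == c]).argmax()
                        for c in range(len(centers))}
    return km, cluster_to_class

def predict(km, cluster_to_class, X_new):
    return np.array([cluster_to_class[c] for c in km.predict(X_new)])
```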
Brégère, Margaux. "Stochastic bandit algorithms for demand side management Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders Online Hierarchical Forecasting for Power Consumption Data Target Tracking for Contextual Bandits : Application to Demand Side Management." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM022.
Full textAs electricity is hard to store, the balance between production and consumption must be strictly maintained. With the integration of intermittent renewable energies into the production mix, the management of this balance becomes complex. At the same time, the deployment of smart meters opens the way to demand response. More precisely, sending signals - such as changes in the price of electricity - would encourage users to modulate their consumption according to the production of electricity. The algorithms used to choose these signals have to learn consumer reactions and, at the same time, optimize them (exploration-exploitation trade-off). Our approach is based on bandit theory and formalizes this sequential learning problem. We propose a first algorithm to control the electrical demand of a homogeneous population of consumers and provide a T⅔ upper bound on its regret. Experiments on a real data set in which price incentives were offered illustrate these theoretical results. As a "full information" dataset is required to test bandit algorithms, a consumption data generator based on variational autoencoders is built. In order to drop the assumption of population homogeneity, we propose an approach to cluster households according to their consumption profile. These different works are finally combined to propose and test a bandit algorithm for personalized demand side management
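The abstract does not detail the algorithm itself, so the sketch below only illustrates the generic exploration-exploitation mechanics with a UCB1-style bandit choosing among a few tariff signals; the reward definition (negative deviation from a target consumption) and all constants are illustrative assumptions.

```python
import numpy as np

class UCBTariffBandit:
    """UCB1-style stochastic bandit over K candidate price signals (tariffs)."""
    def __init__(self, n_tariffs):
        self.counts = np.zeros(n_tariffs)
        self.means = np.zeros(n_tariffs)
        self.t = 0

    def select(self):
        self.t += 1
        if np.any(self.counts == 0):                 # play every tariff once first
            return int(np.argmin(self.counts))
        bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)
        return int(np.argmax(self.means + bonus))    # optimism in the face of uncertainty

    def update(self, tariff, reward):
        self.counts[tariff] += 1
        self.means[tariff] += (reward - self.means[tariff]) / self.counts[tariff]

bandit = UCBTariffBandit(n_tariffs=3)
rng = np.random.default_rng(0)
for _ in range(200):
    a = bandit.select()
    consumption = 10.0 + a + rng.normal()            # toy consumer response to tariff a
    reward = -abs(consumption - 12.0)                # closer to a 12 kWh target is better
    bandit.update(a, reward)
```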
El, Garrab Hamza. "Amélioration de la chaine logistique de pièces de rechange en boucle fermée : application des modèles d’apprentissage." Thesis, Angers, 2020. http://www.theses.fr/2020ANGE0019.
Full textIn the field of after-sales service, and particularly in maintenance, quick intervention and repair of the customer's property is a key element of customer satisfaction and of the creation of the brand image in the market. The work presented in this thesis proposes a Big Data and Machine Learning approach for the improvement of the information flow in the spare parts supply chain. Our contribution focuses on load forecasting in spare parts repair centers, which are the main suppliers of the parts used to repair customers' systems. The size of the supply chain and its complexity, the large number of part numbers, as well as the multitude of special cases (countries with specific laws, special parts...) mean that classical approaches do not offer reliable forecasts for repair services. In this project, we propose learning algorithms allowing the construction of knowledge from large volumes of data, instead of manual implementation. We review the models in the literature, present our methodology, and then implement the models and evaluate their performance in comparison with existing algorithms
Bahri, Emna. "Amélioration des procédures adaptatives pour l'apprentissage supervisé des données réelles." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20089/document.
Full textMachine learning often overlooks various difficulties when confronted with real data. Indeed, such data are generally complex, voluminous, and heterogeneous, due to the variety of sources. Among these problems, the best-known concern the sensitivity of the algorithms to noise and to unbalanced data. Overcoming these problems is a real challenge for improving the effectiveness of the learning process on real data. In this thesis, we have chosen to improve adaptive procedures (boosting), which are less effective in the presence of noise or with unbalanced data. First, we are interested in making boosting robust against noise. Most boosting procedures have contributed greatly to improving the predictive power of classifiers in data mining, but they are sensitive to noisy data. In this case, two problems arise: (1) over-fitting due to the noisy examples, and (2) a decrease in the convergence rate of boosting. Against these two problems, we propose AdaBoost-Hybrid, an adaptation of the AdaBoost algorithm that takes into account the mistakes made in all the previous iterations. Experimental results are very promising. Then, we are interested in another difficult problem, prediction when the classes are unbalanced. We propose an adaptive method based on boosted associative classification. The interest of using association rules is that they allow focusing on small groups of cases, which is well suited to unbalanced data. This method relies on 3 contributions: (1) FCP-Growth-P, a supervised algorithm for extracting class frequent itemsets, derived from FP-Growth by introducing a pruning condition based on counter-examples to specify the rules; (2) W-CARP, an associative classification method which aims to give results at least equivalent to those of existing approaches, but faster; (3) CARBoost, an adaptive classification method that uses associative W-CARP as a weak classifier. Finally, in a chapter devoted to the specific application of intrusion detection, we compare the results of AdaBoost-Hybrid and CARBoost to those of reference methods (KDD Cup 99 data)
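Since the abstract does not give the exact AdaBoost-Hybrid update rule, the sketch below shows the standard AdaBoost re-weighting loop it adapts (labels in {-1, +1}, decision stumps as weak learners); the comment marks the step that the hybrid variant would modify so as to account for the errors of all previous iterations.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=10):
    """Standard AdaBoost with decision stumps; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        # AdaBoost-Hybrid would alter this update so that it also reflects the mistakes
        # accumulated over all previous iterations, not just the current one
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def predict(learners, alphas, X):
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(scores)
```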
Daouayry, Nassia. "Détection d’évènements anormaux dans les gros volumes de données d’utilisation issues des hélicoptères." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI084.
Full textThis thesis addresses the normal functioning of helicopter component systems through the exploitation of the usage data coming from the HUMS (Health and Usage Monitoring System), for maintenance purposes. Helicopters are complex systems and are subject to strict regulatory requirements imposed by the authorities in charge of flight safety. The analysis of monitoring data is therefore a preferred means of improving helicopter maintenance. In addition, the data produced by the HUMS system are an indispensable resource for assessing the health of the systems after each flight. The data collected are numerous and the complexity of the different systems makes it difficult to analyze them on a case-by-case basis. The work of this thesis deals mainly with the issues related to the use of multivariate time series for visualization and for the implementation of anomaly detection tools within Airbus Helicopters. We have developed different approaches to capture, in the flight data, a relative normality for a given system. Work on the visualization of time series was carried out to identify the patterns representing the normal operation of a system. Based on this approach, we have developed a "virtual sensor" that estimates the values of a real sensor from a set of flight parameters, in order to detect abnormal events when the values of these two sensors tend to diverge
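A minimal sketch of the virtual-sensor idea, under the assumption that a standard regressor (here a random forest) stands in for the model actually used: the real sensor is predicted from other flight parameters, and the instants where measured and predicted values diverge beyond a threshold are flagged. The z-score threshold is an illustrative choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_virtual_sensor(flight_params, sensor_values):
    """Learn to reproduce a real sensor from the other flight parameters."""
    return RandomForestRegressor(n_estimators=200).fit(flight_params, sensor_values)

def detect_divergence(model, flight_params, sensor_values, z_threshold=3.0):
    """Flag the instants where the real sensor drifts away from its virtual counterpart."""
    residuals = sensor_values - model.predict(flight_params)
    z = (residuals - residuals.mean()) / residuals.std()
    return np.abs(z) > z_threshold        # boolean mask of potentially abnormal instants
```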
Tabarly, Guilhem. "The Financial Cycle and the Business Cycle : it Takes Two to Tango." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLED007.
Full textThe interplay between financial factors and the real economy is now a focal point of macroeconomic research. The introductory chapter seeks to provide a conceptual framework for the study of macro-financial linkages. The rest of the thesis falls within the research programs brought to the fore by the recent crisis. The second chapter claims that the Financial Cycle is made up of two different components, the Credit Cycle and the Financial Condition Cycle. The two cycles are identified in light of their impact on economic activity, and their relevance is assessed on the grounds of their contribution to the real-time estimation of the output gap. The third chapter uses a data-driven technique to unravel the contemporaneous causal ordering between economic variables and financial variables, and investigates the impact of structural financial shocks on economic activity. The final chapter explores, via a battery of econometric and Machine Learning models, whether the inherently unstable nature of financial variables' predictive power for output is related to the modelling framework or to the variables themselves
Dohmatob, Elvis. "Amélioration de connectivité fonctionnelle par utilisation de modèles déformables dans l'estimation de décompositions spatiales des images de cerveau." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS297/document.
Full textMapping the functions of the human brain using fMRI data has become a very active field of research. However, the available theoretical and practical tools are limited and many important tasks, like the empirical definition of functional brain networks, are difficult to implement due to the lack of a framework for statistical modelling of such networks. We propose to develop, at the population level, models that jointly perform estimation of functional connectivity and alignment of the brain data across the different individuals/subjects in the population. Building upon such a contribution, we will develop new methods for statistical inference to help compare functional connectivity across different individuals in the presence of noise (scanner noise, physiological noise, etc.)
Richard, Michael. "Évaluation et validation de prévisions en loi." Thesis, Orléans, 2019. http://www.theses.fr/2019ORLE0501.
Full textIn this thesis, we study the evaluation and validation of predictive densities. In the first part, we are interested in the contribution of machine learning to the field of quantile and density forecasting. We use several machine learning algorithms in a quantile forecasting framework with real data, in order to highlight how the efficiency of a particular method varies with the nature of the data. In the second part, we review some validation tests for predictive densities present in the literature. As an illustration, we use two of the mentioned tests on real data concerning stock index log-returns. In the third part, we address the calibration constraint of probability forecasting. We propose a generic recalibration method which allows us to enforce this constraint and thus simplifies the choice between density forecasts. The impact on forecast quality, measured by the sharpness of the predictive distributions or by specific scores, remains to be assessed. We show that the impact on the Continuous Ranked Probability Score (CRPS) is weak under some hypotheses and that it is positive under more restrictive ones. We apply our method to weather and electricity price ensemble forecasts. Keywords: density forecasting, quantile forecasting, machine learning, validity tests, calibration, bias correction, PIT series, Pinball-Loss, CRPS
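As a reference point for the score discussed above, here is a small numpy sketch computing the empirical CRPS of an ensemble forecast against a single observation, using the identity CRPS = E|X - y| - 0.5 E|X - X'|; the ensemble values are made up for illustration.

```python
import numpy as np

def crps_ensemble(ensemble, obs):
    """Empirical Continuous Ranked Probability Score of an ensemble vs. one observation."""
    ensemble = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ensemble - obs))                               # E|X - y|
    term2 = 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :]))  # 0.5 * E|X - X'|
    return term1 - term2

# e.g. a five-member temperature ensemble against the realised value 19.4
print(crps_ensemble([18.2, 19.0, 17.5, 18.8, 20.1], 19.4))
```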
Wohlfarth, Till. "Machine-learning pour la prédiction des prix dans le secteur du tourisme en ligne." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0090/document.
Full textThe goal of this paper is to consider the design of decision-making tools in the context of varying travel prices, from the customer's perspective. Based on vast streams of heterogeneous historical data collected through the internet, we describe two approaches to forecasting travel price changes at a given horizon, taking as input variables a list of descriptive characteristics of the flight, together with possible features of the past evolution of the related price series. Though heterogeneous in many respects (e.g. sampling, scale), the collection of historical price series is represented here in a unified manner, by marked point processes (MPP). State-of-the-art supervised learning algorithms, possibly combined with a preliminary clustering stage grouping flights whose related price series exhibit similar behavior, can then be used to help the customer decide when to purchase his or her ticket
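The sketch below illustrates, under simplifying assumptions, how such a purchase-timing decision could be framed as supervised learning: hand-crafted summary features of the flight and of its past price series feed a classifier that predicts whether the price will drop before departure (in which case waiting is advised). The features, labels and classifier are illustrative and do not reproduce the marked-point-process representation used in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_features(route_id, days_to_departure, price_history):
    """Summary features of one flight at decision time (illustrative choices)."""
    prices = np.asarray(price_history, dtype=float)
    return [route_id, days_to_departure,
            prices[-1],                      # current price
            prices[-1] - prices.mean(),      # deviation from the past mean
            prices[-1] - prices.min(),       # distance to the historical minimum
            np.diff(prices).mean()]          # recent trend

# label = 1 if the price later dropped before departure (waiting would have paid off)
X = np.array([make_features(3, 21, [310, 305, 299, 320]),
              make_features(7, 10, [150, 160, 158, 155])])
y = np.array([1, 0])

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
wait_probability = clf.predict_proba(X[:1])[0, 1]   # advise "wait" if this is high enough
```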