Selection of scientific literature on the topic "Forêt d'arbres décisionnels"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Contents
Consult the lists of current articles, books, theses, reports, and other scholarly sources on the topic "Forêt d'arbres décisionnels".
Next to every entry in the bibliography, the "Add to bibliography" option is available. Use it, and your reference to the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read an online annotation of the work, if the relevant parameters are available in its metadata.
Dissertations on the topic "Forêt d'arbres décisionnels"
Chuchuk, Olga. „Optimisation de l'accès aux données au CERN et dans la Grille de calcul mondiale pour le LHC (WLCG)“. Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4005.
The Worldwide LHC Computing Grid (WLCG) offers an extensive distributed computing infrastructure dedicated to the scientific community involved with CERN's Large Hadron Collider (LHC). With storage totalling roughly an exabyte, the WLCG addresses the data processing and storage requirements of thousands of international scientists. As the High-Luminosity LHC phase approaches, the volume of data to be analysed will increase steeply, outpacing the expected gains from advances in storage technology. New approaches to effective data access and management, such as caches, therefore become essential. This thesis undertakes a comprehensive exploration of storage access within the WLCG, aiming to enhance aggregate science throughput while limiting cost. Central to this research is the analysis of real file access logs sourced from the WLCG monitoring system, highlighting genuine usage patterns. In a scientific setting, caching has profound implications. Unlike more commercial applications such as video streaming, scientific data caches deal with file sizes ranging from a few bytes to multiple terabytes, and the inherent logical associations between files considerably influence user access patterns. Traditional caching research has predominantly assumed uniform file sizes and independent reference models; scientific workloads violate both assumptions. My investigations show how the LHC's hierarchical data organization, particularly its compartmentalization into datasets, shapes request patterns. Recognizing this opportunity, I introduce innovative caching policies that exploit dataset-specific knowledge, and compare their effectiveness with traditional file-centric strategies.
Furthermore, my findings underscore the "delayed hits" phenomenon triggered by limited connectivity between computing and storage sites, shedding light on its repercussions for caching efficiency. Acknowledging the long-standing challenge of predicting data popularity in the High Energy Physics (HEP) community, especially given the storage conundrums of the upcoming HL-LHC era, my research integrates Machine Learning (ML) tools. Specifically, I employ the Random Forest algorithm, known for its suitability to Big Data. By harnessing ML to predict future file reuse patterns, I present a two-stage method to inform cache eviction policies. This strategy combines predictive analytics with established cache eviction algorithms, thereby devising a more resilient caching system for the WLCG. In conclusion, this research underscores the significance of robust storage services, suggesting a direction towards stateless caches for smaller sites to alleviate complex storage management requirements and open the path to an additional level in the storage hierarchy. Through this thesis, I aim to navigate the challenges and complexities of data storage and retrieval, crafting more efficient methods that resonate with the evolving needs of the WLCG and its global community.
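The thesis' two-stage eviction method is not reproduced here, but its core idea (a random forest predicting whether a cached file will be requested again, with files predicted unlikely to be reused evicted first) can be sketched as follows. The features and data below are invented for illustration, and scikit-learn is assumed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical per-file features: size (GB), hours since last access,
# accesses so far, and fraction of the parent dataset already read.
n = 2000
X = np.column_stack([
    rng.lognormal(0, 2, n),   # file size
    rng.exponential(24, n),   # recency
    rng.poisson(3, n),        # frequency
    rng.uniform(0, 1, n),     # dataset coverage
])
# Toy label: recently and frequently accessed files tend to be reused.
y = ((X[:, 1] < 24) & (X[:, 2] > 2)).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def eviction_order(cache_features):
    """Return cache indices sorted from most to least evictable:
    lowest predicted reuse probability first."""
    p_reuse = clf.predict_proba(cache_features)[:, 1]
    return np.argsort(p_reuse)

order = eviction_order(X[:10])
```

In the thesis the prediction is combined with a classical eviction algorithm; here the forest's reuse probability alone determines the eviction order.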
Caron, Maxime. „Données confidentielles : génération de jeux de données synthétisés par forêts aléatoires pour des variables catégoriques“. Master's thesis, Université Laval, 2015. http://hdl.handle.net/20.500.11794/25935.
Confidential data are very common in statistics nowadays. One way to treat them is to create partially synthetic datasets for data sharing. We present an algorithm based on random forests to generate such datasets for categorical variables. We are interested in the formula used to make inferences from multiple synthetic datasets. We show that the order of the synthesis has an impact on the estimation of the variance with this formula. We propose a variant of the algorithm inspired by differential privacy, and show that we are then unable to estimate a regression coefficient or its variance. We show the impact of synthetic datasets on structural equation modeling. One conclusion is that the synthetic datasets have little effect on the coefficients between latent variables and measured variables.
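The core synthesis step can be sketched as follows: fit a random forest of one categorical variable on the others, then draw each replacement value from the forest's predicted class distribution rather than taking the most likely class. The data are invented and scikit-learn is assumed; the thesis' full sequential algorithm and inference formulas are not reproduced:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Toy confidential data: categorical variables encoded as integers.
n = 1000
X_other = rng.integers(0, 3, size=(n, 2))    # two predictor variables
y_sensitive = (X_other.sum(axis=1) + rng.integers(0, 2, n)) % 3

# Fit a random forest of the sensitive variable on the others...
rf = RandomForestClassifier(n_estimators=200, random_state=1)
rf.fit(X_other, y_sensitive)

# ...then synthesize: sample each replacement value from the forest's
# predicted class probabilities, preserving conditional distributions.
proba = rf.predict_proba(X_other)
synthetic = np.array([rng.choice(rf.classes_, p=p) for p in proba])
```

Sampling (rather than predicting the argmax) is what keeps the synthetic variable's conditional distribution close to the original's.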
Rancourt, Marie-Pierre. „Programmes d'aide à l'emploi et solidarité sociale : analyse causale des effets de la participation par l'approche des forêts aléatoires“. Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67007.
In this thesis, we assess the effect of employment assistance programs on the number of exits from social assistance and the cumulative duration spent outside of it among beneficiaries living with severe constraints. Not all beneficiaries derive the same benefit from participating in a program, so it is useful to assess treatment effects conditional on the characteristics of each individual. Answering the research question requires a flexible method for estimating treatment effects differentiated by individual characteristics. To do this, we use a machine learning technique called generalized random forests (grf), which evaluates heterogeneous treatment effects by conditioning on the characteristics of individuals. We used a database provided by the Ministère du Travail, de l'Emploi et de la Solidarité sociale (MTESS) containing monthly observations of all recipients of social assistance in Quebec between 1999 and 2018. Using the grf method and the MTESS database, we found that beneficiaries with the longest cumulative durations on social assistance had smaller treatment effects than those with shorter durations. We also observed that younger and more educated beneficiaries benefited more from program participation than the others. The same holds for individuals with an auditory diagnosis and those without an organic diagnosis.
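The grf estimator itself is an R package; as a simplified Python stand-in, a "T-learner" with two random forests conveys the idea of covariate-conditional treatment effects: fit separate outcome models for participants and non-participants, and take their difference as the conditional effect. All data below are simulated (the effect is built to be larger for younger, more educated individuals, echoing the finding above), and scikit-learn is assumed:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

# Toy data: outcome = months outside social assistance; the simulated
# program effect is larger for younger, more educated individuals.
n = 4000
age = rng.uniform(18, 64, n)
educ = rng.integers(0, 4, n)        # education level
treated = rng.integers(0, 2, n)     # program participation
true_effect = 6 - 0.08 * age + 1.5 * educ
y = 2 + 0.1 * age + treated * true_effect + rng.normal(0, 1, n)

X = np.column_stack([age, educ])

# T-learner: one outcome model per treatment arm; the estimated
# conditional effect is the difference of their predictions.
m1 = RandomForestRegressor(n_estimators=200, random_state=2)
m1.fit(X[treated == 1], y[treated == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=2)
m0.fit(X[treated == 0], y[treated == 0])
cate = m1.predict(X) - m0.predict(X)
```

Unlike grf's honest causal forests, this sketch does not correct for confounding or provide valid confidence intervals; it only illustrates effect heterogeneity.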
Djiemon, Deuga Anicet. „Les forêts d'arbres décisionnels et la régression linéaire pour étudier les effets du sous-solage et des drains agricoles sur la hauteur des plants de maïs et les nappes d'eau dans un sol à perméabilité réduite“. Master's thesis, Université Laval, 2019. http://hdl.handle.net/20.500.11794/34905.
Subsoiling operations that improve internal drainage and loosen horizons rendered nearly impermeable by deep compaction should benefit soils of low permeability. Deep subsoiling carried out perpendicular to the drains with a bulldozer-mounted subsoiler could be more effective at temporarily improving the drainage of these soils than a conventional subsoiler hitched to a tractor and operated parallel to the drains. However, the works previously carried out to improve the surface and internal drainage of these soils complicate the evaluation of such practices in an experimental design. The main objective of this project was to compare decision-tree forests (FAD, for forêts d'arbres décisionnels) with multiple linear regression (MLR) for detecting the effects of subsoiling and of the subsurface and surface drainage systems on plant height and mean water table depth during the growing season. A subsoiling trial was established in the fall of 2014 in a naturally poorly drained Kamouraska silty clay, shaped into rounded beds and suffering from severe compaction. The trial compared a control without subsoiling to four subsoiling treatments: a bulldozer-mounted or tractor-mounted subsoiler, operated parallel or perpendicular to the drains. Each treatment was replicated three times and randomized in as many blocks. In spring 2016, 198 wells were dug to a depth of 60 cm to record the water table depth under each treatment between June and July 2016. Photogrammetry was used to estimate the height of the maize plants. Both FAD and MLR detected the main factors affecting maize plant height and mean water table depth, namely the earlier works to improve the internal and surface drainage of the soils.
The coefficients of determination obtained with the FAD (R² ≥ 0.94) were, however, higher than those obtained with MLR (R² ≥ 0.28). No subsoiling treatment significantly improved internal drainage or maize plant height relative to the control without subsoiling. The FAD also made it easier to visualize the nonlinear relationships between the predicted variables and the other variables, notably position on the bed and distance to the subsurface drains, and ultimately to determine the optimal (< 2 m) and critical (> 4 m) distances to the subsurface drains, the optimal distance to the dead furrow (> 8 m), and the critical mean water table depth (< 0.25 m). The FAD thus predicted maize plant height and mean water table depth with greater accuracy than MLR.
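The R² gap reported above (forests ≥ 0.94 vs. linear regression ≥ 0.28) is typical when the response has threshold effects. A minimal sketch of such a comparison, on simulated field data with invented thresholds and scikit-learn assumed:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Simulated field data: plant height responds nonlinearly to drain
# distance and water table depth (threshold effects, as in the study).
n = 1500
drain_dist = rng.uniform(0, 8, n)        # m to subsurface drain
wt_depth = rng.uniform(0.05, 0.9, n)     # m, mean water table depth
height = (2.0 - 0.3 * (drain_dist > 4) - 0.6 * (wt_depth < 0.25)
          + rng.normal(0, 0.05, n))

X = np.column_stack([drain_dist, wt_depth])

# The forest captures the step changes; the linear model cannot.
r2_rf = RandomForestRegressor(n_estimators=100, random_state=3).fit(X, height).score(X, height)
r2_lm = LinearRegression().fit(X, height).score(X, height)
```

The forest's splits align naturally with the thresholds (here at 4 m and 0.25 m), which is also why tree ensembles make the critical distances easy to read off.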
Jabiri, Fouad. „Applications de méthodes de classification non supervisées à la détection d'anomalies“. Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67914.
In this thesis, we first present the binary tree partitioning algorithm and isolation forests. Binary trees are very popular classifiers in supervised machine learning. The isolation forest belongs to the family of unsupervised methods: it is an ensemble of binary trees used jointly to isolate outlying instances. We then present the approach that we have named "exponential smoothing" (or "pooling"). This technique encodes sequences of variables of different lengths into a single vector of fixed size. Indeed, the objective of this thesis is to apply the isolation forest algorithm to identify anomalies in the insurance claim forms available in the database of a large Canadian insurance company, in order to detect cases of fraud. A form, however, is a sequence of claims, each characterized by a set of variables, so the isolation forest algorithm cannot be applied directly to this kind of data. It is for this reason that we apply exponential smoothing. Our application effectively isolates abnormal claims and forms, and we find that the latter tend to be audited by the company more often than regular forms.
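The pipeline described above can be sketched in two steps: pool each variable-length claim sequence into one fixed-size vector by exponential smoothing, then score the pooled vectors with an isolation forest. The data and pooling details below are invented for illustration, and scikit-learn is assumed:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(4)

# Toy "forms": each form is a variable-length sequence of claims,
# each claim described by 3 variables.
forms = [rng.normal(0, 1, size=(rng.integers(1, 10), 3)) for _ in range(300)]
forms.append(np.full((5, 3), 8.0))      # one clearly abnormal form

def pool(form, alpha=0.5):
    """Encode a variable-length claim sequence as one fixed-size vector
    by exponentially smoothing the claims (latest weighted highest)."""
    v = form[0]
    for claim in form[1:]:
        v = alpha * claim + (1 - alpha) * v
    return v

X = np.vstack([pool(f) for f in forms])
iso = IsolationForest(n_estimators=200, random_state=4).fit(X)
scores = iso.score_samples(X)           # lower = more anomalous
```

Forms whose pooled vectors receive the lowest scores would be flagged for audit.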
Samarakoon, Prasad. „Random Regression Forests for Fully Automatic Multi-Organ Localization in CT Images“. Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM039/document.
Locating an organ in a medical image by bounding that particular organ with an entity such as a bounding box or sphere is termed organ localization. Multi-organ localization takes place when multiple organs are localized simultaneously. Organ localization is one of the most crucial steps involved in all phases of patient treatment, from the initial diagnosis to the final follow-up. The supervised machine learning technique called random forests has shown very encouraging results in many sub-disciplines of medical image analysis. Similarly, Random Regression Forests (RRF), a specialization of random forests for regression, have produced state-of-the-art results for fully automatic multi-organ localization. Although RRF have produced state-of-the-art results in this field, the relative novelty of the method still raises numerous questions about how to optimize its parameters for consistent and efficient usage. The first objective of this thesis is therefore to acquire a thorough knowledge of the inner workings of RRF. Building on this, we propose a consistent and automatic parametrization of RRF. We then empirically verify the spatial independence hypothesis used by RRF. Finally, we propose a novel RRF specialization called Light Random Regression Forests for multi-organ localization.
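The voting mechanism behind regression-forest localization can be sketched with a multi-output random forest: each voxel regresses its offsets to the bounding-box faces, and averaging the votes localizes the box. Everything below (features, box, data) is simulated for illustration, with scikit-learn standing in for the thesis' own RRF implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)

# Simulated training voxels: position plus a crude intensity feature.
n = 3000
pos = rng.uniform(0, 100, size=(n, 3))        # voxel coordinates (mm)
intensity = rng.normal(0, 1, n)
X = np.column_stack([pos, intensity])

# Ground-truth axis-aligned organ box: [x0, y0, z0, x1, y1, z1].
box = np.array([30.0, 40.0, 20.0, 55.0, 70.0, 45.0])

# Each voxel's regression target: its 6 offsets to the box faces.
targets = np.hstack([box[:3] - pos, box[3:] - pos])
rrf = RandomForestRegressor(n_estimators=100, random_state=5).fit(X, targets)

# At test time every voxel votes; averaging the votes locates the box.
test_pos = rng.uniform(0, 100, size=(500, 3))
test_X = np.column_stack([test_pos, rng.normal(0, 1, 500)])
offsets = rrf.predict(test_X)
votes = np.hstack([test_pos + offsets[:, :3], test_pos + offsets[:, 3:]])
box_hat = votes.mean(axis=0)
```

Real RRF use rich appearance context per voxel and per-leaf vote distributions; this sketch keeps only the offset-voting idea.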
Brédy, Jhemson. „Prévision de la profondeur de la nappe phréatique d'un champ de canneberges à l'aide de deux approches de modélisation des arbres de décision“. Master's thesis, Université Laval, 2019. http://hdl.handle.net/20.500.11794/37875.
Integrated groundwater management is a major challenge for industrial, agricultural, and domestic activities. In some agricultural production systems, optimized water table management is a significant factor in improving crop yields and water use. Predicting water table depth (WTD) therefore becomes an important means of enabling real-time planning and management of groundwater resources. This study proposes a decision-tree-based modelling approach for WTD forecasting as a function of precipitation, previous WTD values, and evapotranspiration, with applications in groundwater management for cranberry farming. Firstly, two decision-tree-based models, namely Random Forest (RF) and Extreme Gradient Boosting (XGB), were parameterized and compared to predict the WTD up to 48 hours ahead for a cranberry farm located in Québec, Canada. Secondly, the importance of the predictor variables was analyzed to determine their influence on WTD simulation results. WTD measurements at three observation wells within a cranberry field, for the growing period from July 8, 2017 to August 30, 2017, were used for training and testing the models. Statistical criteria such as the mean squared error, the coefficient of determination, and the Nash-Sutcliffe efficiency coefficient were used to measure model performance. The results show that the XGB algorithm outperformed the RF model for WTD predictions and was selected as the optimal model. Among the predictor variables, the antecedent WTD was the most important for water table depth simulation, followed by precipitation. Based on the most important variables and the optimal model, the prediction error over the entire WTD range was within ± 5 cm for the 1-, 12-, 24-, 36- and 48-hour predictions. The XGB model can provide useful information on WTD dynamics and a rigorous simulation for irrigation planning and management in cranberry fields.
Laqrichi, Safae. „Approche pour la construction de modèles d'estimation réaliste de l'effort/coût de projet dans un environnement incertain : application au domaine du développement logiciel“. Thesis, Ecole nationale des Mines d'Albi-Carmaux, 2015. http://www.theses.fr/2015EMAC0013/document.
Software effort estimation is one of the most important tasks in the management of software projects. It is the basis for planning, control, and decision making. Achieving reliable estimates in a project's upstream phases is a complex and difficult activity because of, among other things, the lack of information about the project and its future, the rapid changes in the methods and technologies of the software field, and the lack of experience with similar projects. Many estimation models exist, but it is difficult to identify one model that succeeds for all types of projects and is applicable to all companies (with their different levels of experience, mastered technologies, and project management practices). Overall, all of these models make the strong assumptions that (1) the collected data are complete and sufficient, (2) the laws linking the parameters characterizing the projects are fully identifiable, and (3) the information on the new project is certain and deterministic. In practice, however, this is difficult to ensure. Two problems then emerge from these observations: how to select an estimation model for a specific company, and how to produce an estimate for a new project that presents uncertainties? This thesis addresses these questions by proposing a general estimation framework covering two phases: the construction of the estimation system and its use for estimating new projects. The framework consists of three processes: (1) reliable evaluation and comparison of the candidate estimation models, and selection of the most suitable one, (2) construction of a realistic estimation system from the selected model, and (3) use of the estimation system to estimate the effort of new projects characterized by uncertainties.
This approach serves as a decision-making aid for project managers, supporting realistic estimates of the effort, cost, and time of their software projects. The implementation of all the processes and practices developed in this work has given rise to an open-source software prototype. The results of this thesis fall within the scope of the ProjEstimate FUI13 project.
Mercadier, Mathieu. „Banking risk indicators, machine learning and one-sided concentration inequalities“. Thesis, Limoges, 2020. http://aurore.unilim.fr/theses/nxfile/default/a5bdd121-a1a2-434e-b7f9-598508c52104/blobholder:0/2020LIMO0001.pdf.
This doctoral thesis is a collection of three essays aiming to implement, and where necessary improve, financial risk measures and to assess banking risks using machine learning methods. The first chapter offers an elementary formula inspired by CreditGrades, called E2C, for estimating CDS spreads, whose accuracy is improved by a random forest algorithm. Our results emphasize the E2C's key role and the additional contribution of a company's debt rating and size. The second chapter infers a one-sided version of the inequality bounding the probability of a unimodal random variable. Our results show that the unimodality assumption for stock returns is generally accepted, allowing us to refine the bounds of individual risk measures, to discuss implications for tail risk multipliers, and to infer simple versions of bounds on systemic measures. The third chapter provides a decision support tool that clusters listed banks by riskiness using an adjusted version of the k-means algorithm. This entirely automatic process is based on a very large set of stand-alone and systemic risk indicators reduced to representative factors. The results are aggregated per country and region, offering the opportunity to study zones of fragility, and underline the importance of paying particular attention to the ambiguous impact of banks' size on systemic measures.
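A minimal sketch of the third chapter's idea: standardize a set of risk indicators, cluster the banks with k-means, and rank the clusters by riskiness. The indicators, groups, and "adjustment" are invented here (plain k-means is used), and scikit-learn is assumed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)

# Toy risk factors for 120 banks drawn from three riskiness regimes.
centers = np.array([[0.0, 0.0, 0.0], [2.0, 1.5, 1.0], [4.0, 3.5, 3.0]])
labels_true = rng.integers(0, 3, 120)
X = centers[labels_true] + rng.normal(0, 0.3, size=(120, 3))

# Standardize the indicators, then cluster into risk groups.
Z = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=3, n_init=10, random_state=7).fit(Z)

# Rank clusters by the mean of their members' first risk factor,
# from least to most risky.
risk_rank = np.argsort([X[km.labels_ == k][:, 0].mean() for k in range(3)])
```

In the thesis the input factors come from a dimensionality reduction of a much larger indicator set; here three synthetic factors stand in.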
Truong, Arthur. „Analyse du contenu expressif des gestes corporels“. Thesis, Evry, Institut national des télécommunications, 2016. http://www.theses.fr/2016TELE0015/document.
Nowadays, research on gesture analysis suffers from a lack of unified mathematical models. On the one hand, the formalizations of gesture proposed by the human sciences remain purely theoretical and do not lend themselves to quantification. On the other hand, the commonly used motion descriptors are generally purely intuitive and limited to the visual aspects of the gesture. In the present work, we adopt Laban Movement Analysis (LMA, originally designed for the study of dance movements) as a framework for building our own expressivity-based gesture descriptors. Two datasets are introduced: the first, called ORCHESTRE-3D, is composed of pre-segmented orchestra conductors' gestures annotated with the help of a lexicon of musical emotions. The second, HTI 2014-2015, comprises sequences of multiple daily actions. In a first experiment, we define a global feature vector based on the expressive indices of our model and dedicated to characterizing the whole gesture. This descriptor is used for action recognition and to discriminate the different emotions in our orchestra conductors' dataset. In a second approach, the different elements of our expressive model are used as a frame descriptor (i.e., describing the gesture at a given time). The feature space provided by such local characteristics is used to extract key poses of the motion. With the help of such poses, we obtain a per-frame sub-representation of body motions suitable for real-time action recognition.
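Key-pose extraction from per-frame descriptors, as in the second approach above, is commonly done by clustering the frame descriptors and keeping the real frame closest to each cluster centre. A minimal sketch with invented frame descriptors and scikit-learn (the thesis' actual expressive features are not reproduced):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)

# Toy per-frame expressive descriptors for one gesture sequence
# (e.g. a handful of Laban-inspired indices per frame).
n_frames, n_feat = 400, 6
frames = np.cumsum(rng.normal(0, 0.1, size=(n_frames, n_feat)), axis=0)

# Cluster the frame descriptors, then keep, per cluster, the index of
# the real frame closest to the centroid: these are the key poses.
k = 5
km = KMeans(n_clusters=k, n_init=10, random_state=8).fit(frames)
key_poses = np.array([
    np.argmin(np.linalg.norm(frames - c, axis=1))
    for c in km.cluster_centers_
])
```

The selected frame indices give a compact per-sequence summary that a real-time recognizer can match new frames against.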