Theses on the topic "Random Forest predictive model"

To see other types of publications on this topic, follow the link: Random Forest predictive model.



Consult the top 50 theses for your research on the topic "Random Forest predictive model".


You can also download the full text of the scholarly publication in PDF format and read its abstract online when this information is included in the metadata.

Browse theses on a variety of disciplines and organize your bibliography correctly.

1

Palczewska, Anna Maria. « Interpretation, Identification and Reuse of Models. Theory and algorithms with applications in predictive toxicology ». Thesis, University of Bradford, 2014. http://hdl.handle.net/10454/7349.

Full text
Abstract:
This thesis is concerned with developing methodologies that enable existing models to be effectively reused. Results of this thesis are presented in the framework of Quantitative Structure-Activity Relationship (QSAR) models, but their application is much more general. QSAR models relate chemical structures with their biological, chemical or environmental activity. There are many applications that offer an environment to build and store predictive models. Unfortunately, they do not provide advanced functionalities that allow for efficient model selection and for interpretation of model predictions for new data. This thesis aims to address these issues and proposes methodologies for dealing with three research problems: model governance (management), model identification (selection), and interpretation of model predictions. The combination of these methodologies can be employed to build more efficient systems for model reuse in QSAR modelling and other areas. The first part of this study investigates toxicity data and model formats and reviews some of the existing toxicity systems in the context of model development and reuse. Based on the findings of this review and the principles of data governance, a novel concept of model governance is defined. Model governance comprises model representation and model governance processes. These processes are designed and presented in the context of model management. As an application, minimum information requirements and an XML representation for QSAR models are proposed. Once a collection of validated, accepted and well annotated models is available within a model governance framework, they can be applied to new data. It may happen that there is more than one model available for the same endpoint: which one to choose? The second part of this thesis proposes a theoretical framework and algorithms that enable automated identification of the most reliable model for new data from the collection of existing models. The main idea is based on partitioning the search space into groups and assigning a single model to each group. The construction of this partitioning is difficult because it is a bi-criteria problem. The main contribution in this part is the application of Pareto points for the search space partition. The proposed methodology is applied to three endpoints in chemoinformatics and predictive toxicology. After having identified a model for the new data, we would like to know how the model obtained its prediction and how trustworthy it is. An interpretation of model predictions is straightforward for linear models thanks to the availability of model parameters and their statistical significance. For nonlinear models this information can be hidden inside the model structure. This thesis proposes an approach for interpretation of a random forest classification model. This approach allows for the determination of the influence (called feature contribution) of each variable on the model prediction for an individual data point. In this part, three methods are proposed that allow analysis of feature contributions. Such analysis might lead to the discovery of new patterns that represent a standard behaviour of the model and allow additional assessment of the model reliability for new data. The application of these methods to two standard benchmark datasets from the UCI machine learning repository shows the great potential of this methodology.
The algorithm for calculating feature contributions has been implemented and is available as an R package called rfFC.
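As a rough illustration of the feature-contribution idea summarised above, the following Python sketch (scikit-learn and a public benchmark dataset are assumed; this is not the rfFC package itself) credits each split's change in class probability along a tree's decision path to the splitting feature and averages over the forest:

# Hypothetical sketch of per-instance feature contributions for a scikit-learn
# random forest: walking each tree from root to leaf, the change in the node's
# class distribution at every split is attributed to the feature used there.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def feature_contributions(forest, x):
    """Return (bias, contributions) for one sample x over the class probabilities."""
    n_classes = forest.n_classes_
    bias = np.zeros(n_classes)
    contrib = np.zeros((x.shape[0], n_classes))
    for est in forest.estimators_:
        tree = est.tree_
        probs = lambda n: tree.value[n, 0] / tree.value[n, 0].sum()  # node class distribution
        bias += probs(0)
        node = 0
        while tree.children_left[node] != -1:                  # walk until a leaf
            feat = tree.feature[node]
            child = (tree.children_left[node] if x[feat] <= tree.threshold[node]
                     else tree.children_right[node])
            contrib[feat] += probs(child) - probs(node)         # credit the splitting feature
            node = child
    return bias / len(forest.estimators_), contrib / len(forest.estimators_)

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
bias, contrib = feature_contributions(rf, X[0])
# the prediction decomposes exactly as bias + sum of feature contributions
print(np.allclose(bias + contrib.sum(axis=0), rf.predict_proba(X[:1])[0]))

The final check reflects the decomposition property that makes such contributions usable for interpreting individual predictions.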
2

Stum, Alexander Knell. « Random Forests Applied as a Soil Spatial Predictive Model in Arid Utah ». DigitalCommons@USU, 2010. https://digitalcommons.usu.edu/etd/736.

Full text
Abstract:
Initial soil surveys are incomplete for large tracts of public land in the western USA. Digital soil mapping offers a quantitative approach as an alternative to traditional soil mapping. I sought to predict soil classes across an arid to semiarid watershed of western Utah by applying random forests (RF) and using environmental covariates derived from Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and digital elevation models (DEM). Random forests are similar to classification and regression trees (CART); however, RF is doubly random. Many (e.g., 500) weak trees are grown (trained) independently because each tree is trained with a new randomly selected bootstrap sample, and a random subset of variables is used to split each node. To train and validate the RF trees, 561 soil descriptions were made in the field. An additional 111 points were added by case-based reasoning using aerial photo interpretation. As RF makes classification decisions from the mode of many independently grown trees, model uncertainty can be derived. The overall out-of-bag (OOB) error was lower without weighting of classes; weighting increased the overall OOB error and the resulting output did not reflect soil-landscape relationships observed in the field. The final RF model had an OOB error of 55.2% and predicted soils on landforms consistent with soil-landscape relationships. The OOB error for individual classes typically decreased with increasing class size. In addition to the final classification, I determined the second and third most likely classifications, model confidence, and the hypothetical extent of individual classes. Pixels that had a high possibility of belonging to multiple soil classes were aggregated using a minimum confidence value based on limiting soil features, which is an effective and objective method of determining membership in soil map unit associations and complexes mapped at the 1:24,000 scale. Variables derived from both DEM and Landsat 7 ETM+ sources were important for predicting soil classes, based on Gini and standard measures of variable importance and on OOB errors from groves grown with exclusively DEM- or Landsat-derived data. Random forests proved to be a powerful predictor of soil classes and produced outputs that facilitated further understanding of soil-landscape relationships.
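The doubly random construction and the out-of-bag (OOB) error described above can be sketched in a few lines of Python; the synthetic class frequencies and the scikit-learn usage below are assumptions for illustration, not the Utah soil data:

# Minimal sketch: a random forest with bootstrapped trees and a random subset
# of variables per split, reporting OOB error with and without class weighting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=672, n_features=12, n_informative=8,
                           n_classes=5, n_clusters_per_class=1,
                           weights=[0.4, 0.3, 0.15, 0.1, 0.05], random_state=0)
for weighting in (None, "balanced"):
    rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                class_weight=weighting, oob_score=True,
                                random_state=0).fit(X, y)
    print(weighting, "overall OOB error = %.3f" % (1 - rf.oob_score_))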
3

Kalmár, Marcus, et Joel Nilsson. « The art of forecasting – an analysis of predictive precision of machine learning models ». Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280675.

Full text
Abstract:
Forecasting is used for decision making and unreliable predictions can instill a false sense of confidence. Traditional time series modelling is a statistical art form rather than a science and errors can occur due to limitations of human judgment. In minimizing the risk of falsely specifying a process the practitioner can make use of machine learning models. In an effort to find out if there is a benefit in using models that require less human judgment, the machine learning models Random Forest and Neural Network have been used to model a VAR(1) time series. In addition, the classical time series models AR(1), AR(2), VAR(1) and VAR(2) have been used as a comparative foundation. The Random Forest and Neural Network are trained and ultimately the models are used to make predictions evaluated by RMSE. All models yield scattered forecast results except for the Random Forest, which steadily yields comparatively precise predictions. The study shows that there is a definitive benefit in using Random Forests to eliminate the risk of falsely specifying a process, and they do in fact provide better results than a correctly specified model.
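A compact sketch of the experiment described above might look as follows; the VAR(1) coefficients, sample sizes and use of scikit-learn are assumptions, and only one-lag predictors are shown:

# Illustrative sketch: simulate a bivariate VAR(1) process and compare the
# one-step-ahead RMSE of a random forest against an OLS-style VAR(1) fit,
# both using the previous observation as the only predictor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
A = np.array([[0.6, 0.2], [-0.1, 0.5]])         # assumed VAR(1) coefficient matrix
y = np.zeros((600, 2))
for t in range(1, 600):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

X_lag, Y = y[:-1], y[1:]                         # predictors = y_{t-1}, targets = y_t
X_tr, X_te, Y_tr, Y_te = X_lag[:400], X_lag[400:], Y[:400], Y[400:]

for name, model in [("VAR(1) by OLS", LinearRegression()),
                    ("Random Forest", RandomForestRegressor(n_estimators=300, random_state=0))]:
    pred = model.fit(X_tr, Y_tr).predict(X_te)
    rmse = np.sqrt(np.mean((pred - Y_te) ** 2))
    print(name, "RMSE = %.3f" % rmse)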
4

Wagner, Christopher. « Regression Model to Project and Mitigate Vehicular Emissions in Cochabamba, Bolivia ». University of Dayton / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1501719312999566.

Full text
5

Zhang, Yi. « Strategies for Combining Tree-Based Ensemble Models ». NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1021.

Full text
Abstract:
Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal is to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best performing tree-based ensemble methods – random forest, extremely randomized tree, and eXtreme gradient boosting model – were used to generate a set of base models. Outputs from classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public domain data sets which have been extensively used for benchmarking classification models. The research established that applying random forest as the final ensemble method to integrate selected base models and factor scores of multiple correspondence analysis turned out to be the best ensemble approach.
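A sketch of this combination strategy, stacking tree-based ensembles under a random forest combiner, is shown below; the benchmark dataset is a stand-in, and scikit-learn's GradientBoostingClassifier is used in place of XGBoost:

# Rough sketch: three tree-based ensembles as base models, combined by a
# stacking classifier whose final estimator is itself a random forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
base = [("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xt", ExtraTreesClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0))]   # stand-in for XGBoost
stack = StackingClassifier(estimators=base,
                           final_estimator=RandomForestClassifier(n_estimators=200, random_state=0),
                           cv=5)
print("stacked accuracy: %.3f" % cross_val_score(stack, X, y, cv=5).mean())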
6

Jonsson, Estrid, et Sara Fredrikson. « An Investigation of How Well Random Forest Regression Can Predict Demand : Is Random Forest Regression better at predicting the sell-through of close to date products at different discount levels than a basic linear model ? » Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302025.

Full text
Abstract:
As the climate crisis continues to evolve, many companies focus their development on becoming more sustainable. With greenhouse gases highlighted as the main problem, food waste has received a great deal of attention after being named the third largest contributor to global emissions. One way retailers have attempted to improve is by offering close-to-date produce at a discount, hence decreasing the amount of food being thrown away. To minimize waste the level of discount must be optimized, and as the products can be seen as flawed, the known price-to-demand relation of the products may be insufficient. The optimization process historically involves generalized linear regression models; however, demand is a complex concept influenced by many factors. This report investigates whether a machine learning model, Random Forest regression, is better at estimating the demand of close-to-date products at different discount levels than a basic linear regression model. The discussion also includes an analysis of whether discounts always increase the willingness to buy and whether this depends on product type. The results show that Random Forest to a greater extent considers the many factors influencing demand and is superior as a predictor in this case. Furthermore, it was concluded that there is generally no clear linear relation, although this does depend on product type, as certain categories showed some linearity.
7

Mathis, Tyler Alan. « Predicting Hardness of Friction Stir Processed 304L Stainless Steel using a Finite Element Model and a Random Forest Algorithm ». BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7591.

Full text
Abstract:
Friction stir welding is an advanced welding process that is being investigated for use in many different industries. One area that has been investigated for its application is in healing critical nuclear reactor components that are developing cracks. However, friction stir welding is a complicated process and it is difficult to predict what the final properties produced by a set of welding parameters will be. This thesis sets forth a method using finite element analysis and a random forest model to accurately predict hardness in the welding nugget after processing. The finite element analysis code used an ALE formulation that enabled an Eulerian approach to modeling. Hardness is used as the property to estimate because of its relationship to tensile strength and grain size. The input parameters to the random forest model are temperature, cooling rate, strain rate, and RPM. Two welding parameter sets were used to train the model. The method was found to have a high level of accuracy as measured by R^2, but had greater difficulty in predicting the parameter set with higher RPM.
8

Victors, Mason Lemoyne. « A Classification Tool for Predictive Data Analysis in Healthcare ». BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/5639.

Full text
Abstract:
Hidden Markov Models (HMMs) have seen widespread use in a variety of applications ranging from speech recognition to gene prediction. While developed over forty years ago, they remain a standard tool for sequential data analysis. More recently, Latent Dirichlet Allocation (LDA) was developed and soon gained widespread popularity as a powerful topic analysis tool for text corpora. We thoroughly develop LDA and a generalization of HMMs and demonstrate the conjunctive use of both methods in predictive data analysis for health care problems. While these two tools (LDA and HMM) have been used in conjunction previously, we use LDA in a new way to reduce the dimensionality involved in the training of HMMs. With both LDA and our extension of HMM, we train classifiers to predict development of Chronic Kidney Disease (CKD) in the near future.
9

Ospina, Arango Juan David. « Predictive models for side effects following radiotherapy for prostate cancer ». Thesis, Rennes 1, 2014. http://www.theses.fr/2014REN1S046/document.

Full text
Abstract:
External beam radiotherapy (EBRT) is one of the cornerstones of prostate cancer treatment. The objectives of radiotherapy are, firstly, to deliver a high dose of radiation to the tumor (prostate and seminal vesicles) in order to achieve maximal local control and, secondly, to spare the neighboring organs (mainly the rectum and the bladder) to avoid normal tissue complications. Normal tissue complication probability (NTCP) models are therefore needed to assess the feasibility of the treatment, inform the patient about the risk of side effects, derive dose-volume constraints and compare different treatments. In the context of EBRT, the objectives of this thesis were to find predictors of bladder and rectal complications following treatment; to develop new NTCP models that allow for the integration of both dosimetric and patient parameters; to compare the predictive capabilities of these new models to the classic NTCP models; and to develop new methodologies to identify dose patterns correlated with normal tissue complications following EBRT for prostate cancer. A large cohort of patients treated by conformal EBRT for prostate cancer under several prospective French clinical trials was used for the study. In a first step, the incidence of the main genitourinary and gastrointestinal symptoms was described. With another classical approach, namely logistic regression, predictors of genitourinary and gastrointestinal complications were identified. The logistic regression models were then graphically represented to obtain nomograms, a graphical tool that enables clinicians to rapidly assess the complication risks associated with a treatment and to inform patients. This information can be used by patients and clinicians to select a treatment among several options (e.g. EBRT or radical prostatectomy). In a second step, we proposed the use of random forests, a machine learning technique, to predict the risk of complications following EBRT for prostate cancer. The superiority of the random forest NTCP model, assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, was established. In a third step, the 3D dose distribution was studied. A 2D population value decomposition (PVD) technique was extended to a tensorial framework to be applied to 3D volume image analysis. Using this tensorial PVD, a population analysis was carried out to find a dose pattern possibly correlated with a normal tissue complication following EBRT. Also in the context of 3D image population analysis, a spatio-temporal nonparametric mixed-effects model was developed. This model was applied to find an anatomical region where the dose could be correlated with normal tissue complications following EBRT.
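The model comparison by AUC described above can be illustrated on synthetic data (the clinical cohort is not public, and the class imbalance and feature counts below are assumptions):

# Illustrative comparison: logistic regression vs. random forest for a binary
# toxicity-style outcome, assessed by the area under the ROC curve (AUC).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=1)
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(n_estimators=300, random_state=1))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")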
10

Kabir, Mitra. « Prediction of mammalian essential genes based on sequence and functional features ». Thesis, University of Manchester, 2017. https://www.research.manchester.ac.uk/portal/en/theses/prediction-of-mammalian-essential-genes-based-on-sequence-and-functional-features(cf8eeed5-c2b3-47c3-9a8f-2cc290c90d56).html.

Full text
Abstract:
Essential genes are those whose presence is imperative for an organism's survival, whereas the functions of non-essential genes may be useful but not critical. Abnormal functionality of essential genes may lead to defects or death at an early stage of life. Knowledge of essential genes is therefore key to understanding development, maintenance of major cellular processes and tissue-specific functions that are crucial for life. Existing experimental techniques for identifying essential genes are accurate, but most of them are time consuming and expensive. Predicting essential genes using computational methods, therefore, would be of great value as they circumvent experimental constraints. Our research is based on the hypothesis that mammalian essential (lethal) and non-essential (viable) genes are distinguishable by various properties. We examined a wide range of features of Mus musculus genes, including sequence, protein-protein interactions, gene expression and function, and found 75 features that were statistically discriminative between lethal and viable genes. These features were used as inputs to create a novel machine learning classifier, allowing the prediction of a mouse gene as lethal or viable with the cross-validation and blind test accuracies of ∼91% and ∼93%, respectively. The prediction results are promising, indicating that our classifier is an effective mammalian essential gene prediction method. We further developed the mouse gene essentiality study by analysing the association between essentiality and gene duplication. Mouse genes were labelled as singletons or duplicates, and their expression patterns over 13 developmental stages were examined. We found that lethal genes originating from duplicates are considerably lower in proportion than singletons. At all developmental stages a significantly higher proportion of singletons and lethal genes are expressed than duplicates and viable genes. Lethal genes were also found to be more ancient than viable genes. In addition, we observed that duplicate pairs with similar patterns of developmental co-expression are more likely to be viable; lethal gene duplicate pairs do not have such a trend. Overall, these results suggest that duplicate genes in mouse are less likely to be essential than singletons. Finally, we investigated the evolutionary age of mouse genes across development to see if the morphological hourglass pattern exists in the mouse. We found that in mouse embryos, genes expressed in early and late stages are evolutionarily younger than those expressed in mid-embryogenesis, thus yielding an hourglass pattern. However, the oldest genes are not expressed at the phylotypic stage stated in prior studies, but instead at an earlier time point - the egg cylinder stage. These results question the application of the hourglass model to mouse development.
11

Mita, Mara. « Assessment of seismic displacements of existing landslides through numerical modelling and simplified methods ». Electronic Thesis or Diss., Université Gustave Eiffel, 2023. http://www.theses.fr/2023UEFL2075.

Full text
Abstract:
Landslides are common secondary effects of earthquakes and can be responsible for greater damage than the ground shaking alone. Predicting these phenomena is therefore essential for risk management in seismic regions. Permanent co-seismic landslide displacements are usually assessed by the traditional "rigid sliding block" method proposed by Newmark (1965). Despite its limitations, this method has two advantages: i) relatively short computation times, and ii) compatibility with GIS software for regional-scale analyses. Alternatively, more complex numerical analyses can be performed to simulate seismic wave propagation into slopes and the related effects; however, because of their longer computation times, their use is usually limited to slope-scale analyses. This study aims at better understanding under which conditions (i.e. combinations of the relevant parameters) analytical and numerical methods predict different earthquake-induced landslide displacements. To this end, 216 2D landslide prototypes were designed by combining geometrical and geotechnical parameters inferred from a statistical analysis of data collected in a literature review. The landslide prototypes were forced by 17 signals with constant Arias intensity (AI ~ 0.1 m/s) and variable mean period. The results allowed a preliminary Random Forest model to be defined to predict, a priori, the expected difference between the displacements given by the two methods. Analysis of the results allowed: i) identifying the parameters affecting the displacement variation according to the two methods, and ii) concluding that, at the Arias intensity level considered here, the differences between computed displacements are negligible in most cases.
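For reference, the Newmark rigid-block displacement mentioned above amounts to double-integrating the ground acceleration in excess of a critical (yield) acceleration; the sketch below uses a synthetic accelerogram, an assumed yield value, and the common simplification of one-directional sliding:

# Minimal Newmark sliding-block sketch: relative velocity accumulates only
# while ground acceleration exceeds the critical value, and its time integral
# gives the permanent displacement.
import numpy as np

dt = 0.01                                    # time step [s]
t = np.arange(0, 20, dt)
a_g = 1.2 * np.sin(2 * np.pi * 1.0 * t) * np.exp(-0.1 * t)   # synthetic ground acceleration [m/s^2]
a_c = 0.8                                    # assumed critical (yield) acceleration [m/s^2]

v = 0.0
disp = 0.0
for a in a_g:
    v = max(v + (a - a_c) * dt, 0.0)         # block only slides downslope; velocity stays >= 0
    disp += v * dt
print("Newmark displacement = %.3f m" % disp)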
12

Asritha, Kotha Sri Lakshmi Kamakshi. « Comparing Random forest and Kriging Methods for Surrogate Modeling ». Thesis, Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20230.

Full text
Abstract:
The issue with conducting real experiments in design engineering is the cost of finding an optimal design that fulfills all design requirements and constraints. An alternative to real experiments is computer-aided design modeling and computer-simulated experiments. These simulations are conducted to understand functional behavior and to predict possible failure modes in design concepts. However, these simulations may take minutes, hours or days to finish. In order to reduce the time consumption and the number of simulations required for design space exploration, surrogate modeling is used. The motive of surrogate modeling is to replace the original system with an approximation function of the simulations that can be computed quickly. The process of surrogate model generation includes sample selection, model generation, and model evaluation. Using surrogate models in design engineering can help reduce design cycle times and cost by enabling rapid analysis of alternative designs. Selecting a suitable surrogate modeling method for a given function with specific requirements is possible by comparing different surrogate modeling methods. These methods can be compared using different application problems and evaluation metrics. In this thesis, we compare the random forest model and the kriging model based on prediction accuracy. The comparison is performed using mathematical test functions. This thesis conducted quantitative experiments to investigate the performance of the methods. After experimental analysis, it is found that the kriging models have higher accuracy compared to random forests. Furthermore, the random forest models have less execution time compared to kriging for the studied mathematical test problems.
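Such a comparison can be sketched on a toy analytic test function; the function, sample sizes and the use of scikit-learn's Gaussian process regressor as the kriging surrogate are assumptions:

# Illustrative surrogate comparison: kriging (Gaussian process) vs. random
# forest fitted to a cheap analytic function and scored on held-out points.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x[:, 0]) + 0.5 * np.cos(5 * x[:, 1])   # toy test function
X_train, X_test = rng.uniform(-1, 1, (60, 2)), rng.uniform(-1, 1, (500, 2))
y_train, y_test = f(X_train), f(X_test)

surrogates = {"kriging": GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True),
              "random forest": RandomForestRegressor(n_estimators=300, random_state=0)}
for name, model in surrogates.items():
    rmse = np.sqrt(mean_squared_error(y_test, model.fit(X_train, y_train).predict(X_test)))
    print(f"{name}: RMSE = {rmse:.4f}")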
13

Ekeberg, Lukas, et Alexander Fahnehjelm. « Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240401.

Full text
Abstract:
The Swedish real estate market has been digitalized over the past decade, with the current practice being to post your real estate advertisement online. A question that has arisen is how a seller can optimize their public listing to maximize the selling premium. This paper analyzes the use of three machine learning methods to solve this problem: Linear Regression, Decision Tree Regressor and Random Forest Regressor. The aim is to retrieve information regarding how certain attributes contribute to the premium value. The dataset used contains apartments sold within the years 2014-2018 in the Östermalm / Djurgården district in Stockholm, Sweden. The resulting models returned an R2-value of approx. 0.26 and a Mean Absolute Error of approx. 0.06. While the models were not accurate regarding prediction of the premium, information could still be extracted from them. In conclusion, a high number of views and a publication made in April provide the best conditions for an advertisement to reach a high selling premium. The seller should try to keep the number of days since publication lower than 15.5 days and avoid publishing on a Tuesday.
14

Кичигіна, Анастасія Юріївна. « Прогнозування ІМТ за допомогою методів машинного навчання ». Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2020. https://ela.kpi.ua/handle/123456789/37413.

Full text
Abstract:
Thesis: 100 p., 17 tables, 16 figures, 2 appendices and 24 references. The object of the study is the human body mass index. The subject of the research is machine learning methods: regression models, the random forest ensemble model and a neural network. This work studies the dependence of the human body mass index, and of the presence of excess body weight, on eating and lifestyle habits. Machine learning and data analysis methods were used to build the study, work was done to identify opportunities to improve the performance of standard models, and the best model was identified for prediction and classification based on the given data. The work focuses on reducing the dimensionality of the feature space, selecting the best observations with valid data to improve model performance, and combining different learning methods to obtain more effective ensemble models.
15

Henriksson, Erik, et Kristopher Werlinder. « Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302535.

Full text
Abstract:
The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in terms of predictive performance on housing prices, with the help of two data sets. The comparison considers training time, inference time and the three evaluation metrics R2, RMSE and MAPE. The data sets are described in detail together with background about the regressor models that are used. The method involves substantial data cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation in order to achieve good performance estimates. The finding of this research project is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve similar results as the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior when used on larger sets of data.
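The timing-versus-accuracy comparison can be sketched as below; synthetic data is used, and scikit-learn's histogram gradient boosting stands in for XGBoost (an assumption, since XGBoost itself is a separate library):

# Illustrative sketch: compare training time, inference time and R^2 between a
# random forest and a gradient-boosting regressor on a larger synthetic set.
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=50_000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("random forest", RandomForestRegressor(n_estimators=300, n_jobs=-1, random_state=0)),
                    ("gradient boosting", HistGradientBoostingRegressor(random_state=0))]:
    t0 = time.perf_counter(); model.fit(X_tr, y_tr); fit_s = time.perf_counter() - t0
    t0 = time.perf_counter(); r2 = model.score(X_te, y_te); pred_s = time.perf_counter() - t0
    print(f"{name}: R2 = {r2:.3f}, training = {fit_s:.1f}s, inference = {pred_s:.2f}s")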
16

Lazic, Marko, et Felix Eder. « Using Random Forest model to predict image engagement rate ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229932.

Full text
Abstract:
The purpose of this research is to investigate whether the Google Cloud Vision API combined with the Random Forest machine learning algorithm is advanced enough to build software that would evaluate how much an Instagram photo contributes to the image of a brand. The data set contains images scraped from the public Instagram feed filtered by #Nike, together with the metadata of each post. Each image was processed by the Google Cloud Vision API in order to obtain a set of descriptive labels for the content of the image. The data set was fed to the Random Forest algorithm in order to train the predictor. The results of the research show that the predictor can only guess the correct score in about 4% of cases. The results are not very accurate, which is mostly because of the limiting factors of the Google Cloud Vision API. The conclusion drawn is that it is not possible to create software that can accurately predict the engagement rate of an image with the technology that is publicly available today.
17

Galleguillos, Aguilar Matías. « Desarrollo de un modelo predictivo de deserción de estudiantes de primer año en institución de educación superior ». Tesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/170006.

Full text
Abstract:
Thesis submitted to qualify for the professional degree of Electrical Civil Engineer (Ingeniero Civil Eléctrico)
In Chile, over the last 30 years there has been significant growth in access to higher education. This growth has been accompanied by an increase in university dropout, which is particularly high among first-year students. The problem carries large costs of various kinds both for students and for universities, and dropout has become one of the most important metrics used to accredit institutions. Universidad de las Américas has faced a high dropout rate, which in 2013 contributed substantially to the loss of its accreditation, so the issue became a priority to solve, and a plan was devised to help the students most likely to drop out. UDLA currently has no automated system that classifies students based on analysis of their behavioural data; it only has a rule-based system built from the dropout knowledge of university staff, and this system has a high error rate. In the latest study on first-year student retention published by the Higher Education Information Service, built with data from students who began their studies in 2016, Universidad de las Américas ranks 47th out of 58 universities. Developing a system able to identify students at risk of dropping out therefore remains a priority for the institution. The objective of this work is to develop a system capable of providing a dropout risk index for each first-year student. To this end, the process of assigning risk is posed as a classification problem and addressed with computational intelligence tools. To solve the problem, the semester was divided into segments and a model was trained for each of them. The accuracy of the first model, 70.1%, was lower than that of similar studies addressing the same problem at other universities around the world. The model for each segment gave better results than that of the previous segment, with the end-of-semester model achieving the best results at 82.5% accuracy, which is comparable to other works.
18

Lanka, Venkata Raghava Ravi Teja Lanka. « VEHICLE RESPONSE PREDICTION USING PHYSICAL AND MACHINE LEARNING MODELS ». The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1511891682062084.

Full text
19

Lundström, Love, et Oscar Öhman. « Machine Learning in credit risk : Evaluation of supervised machine learning models predicting credit risk in the financial sector ». Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-164101.

Full text
Abstract:
When banks lend money to another party they face a risk that the borrower will not fulfill its obligation towards the bank. This risk is called credit risk and it is the largest risk banks face. According to the Basel accord, banks need to hold a certain amount of capital to protect themselves against future financial crises. This amount is calculated for each loan with an attached risk-weighted asset, RWA. The main parameters in RWA are the probability of default and the loss given default. Banks are today allowed to use their own internal models to calculate these parameters. Since holding capital earns no interest and is thus a great cost, banks seek tools to better predict the probability of default in order to lower the capital requirement. Machine learning and supervised algorithms such as logistic regression, neural networks, decision trees and random forests can be used to assess credit risk. By training algorithms on historical data with known outcomes, the parameter probability of default (PD) can be determined with a higher degree of certainty than with traditional models, leading to a lower capital requirement. On the data set used in this thesis, logistic regression seems to be the algorithm with the highest accuracy in classifying customers into the right category. However, it classifies many customers as false positives, meaning the model believes a customer will honour its obligation when in fact the customer defaults, which comes at a great cost for the banks. By implementing a cost function to minimize this error, we found that the neural network has the lowest false positive rate and is therefore the model best suited for this specific classification task.
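The cost-sensitive evaluation described above, penalising missed defaults more heavily than rejected good customers, could be sketched as follows; the synthetic data and the relative cost figures are assumptions:

# Illustrative sketch: compare classifiers by an expected-cost criterion in
# which a missed default (a defaulter predicted to repay) is costlier than
# wrongly rejecting a customer who would have repaid.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

cost_missed_default, cost_rejected_good = 10.0, 1.0     # assumed relative costs
models = {"logistic regression": LogisticRegression(max_iter=1000),
          "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
          "neural network": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()   # label 1 = default
    cost = fn * cost_missed_default + fp * cost_rejected_good
    print(f"{name}: accuracy = {(tp + tn) / len(y_te):.3f}, expected cost = {cost:.0f}")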
20

De, Giorgi Marcello. « Tree ensemble methods for Predictive Maintenance : a case study ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22282/.

Full text
Abstract:
The work described in this thesis created models for the predictive maintenance of machine tools in an industrial setting. In particular, the models were trained using tree ensemble methods with the aims of: predicting the occurrence of a machine failure early enough to allow maintenance teams to be organised, and predicting the need for early replacement of the tool used by the machine, in order to keep quality standards high. After giving background on the industrial context under consideration, the thesis illustrates the processes followed to create and aggregate a dataset and to add information about machine events. After analysing the behaviour of some variables during machining and distinguishing between valid and invalid machining cycles, tree ensemble methods are introduced together with the reasons for choosing this class of algorithms. In detail, two possible candidates for the problem at hand are presented: Random Forest and XGBoost. After describing how they work, the results obtained by the models are presented, proposing an expected-cost function as an alternative to the accuracy score for estimating their effectiveness. Finally, the results of the models trained with the two proposed algorithms are compared.
21

Jiao, Weiwei. « Predictive Analysis for Trauma Patient Readmission Database ». The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492718909631318.

Full text
22

Geylan, Gökçe. « Training Machine Learning-based QSAR models with Conformal Prediction on Experimental Data from DNA-Encoded Chemical Libraries ». Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447354.

Full text
Abstract:
DNA-encoded chemical libraries (DELs) allow exhaustive sampling of chemical space, with large-scale data consisting of compounds produced through combinatorial synthesis. This novel technology is utilized in the early drug discovery stages for robust hit identification and lead optimization. In this project, the aim was to build a machine learning-based QSAR model with conformal prediction for hit identification on the two different target proteins the DEL was assayed on. An initial investigation was conducted on a pilot project with 1,000 compounds, and the analyses and conclusions drawn from this part were later applied to a larger dataset with 1.2 million compounds. With this classification model, the aim was to predict compound activity in the DEL as well as in an external dataset, identifying the top hits to evaluate the model's performance and applicability. Support Vector Machine (SVM) and Random Forest (RF) models were built on both the pilot and the main datasets with different descriptor sets: Signature Fingerprints, RDKit and CDK. In addition, an autoencoder was used to supply data-driven descriptors on the pilot data. The Libsvm and Liblinear implementations were explored and compared based on the models' performances. The comparisons were made by considering the key concepts of conformal prediction, such as the trade-off between validity and efficiency, observed fuzziness, and the calibration against a range of significance levels. The top hits were determined by two sorting methods: credibility and the p-value difference between the binary classes. The models were confirmed to assign correct single labels to the true actives over a wide range of significance levels, regardless of the similarity of the test compounds to the training set. Furthermore, an accumulation of these true actives in the models' top hit selections was observed with the latter sorting method, and additional investigations of the similarity and the building block enrichments in the top 50 and 100 compounds were conducted. The Tanimoto similarity demonstrated the model's predictive power in selecting structurally dissimilar compounds, while the building block enrichment analysis showed the selectivity of the binding pocket, where target protein B was determined to be more selective. All of these comparison methods enabled an extensive study of model evaluation and performance. In conclusion, the Liblinear model with Signature Fingerprints was found to give the best model performance for both the pilot and the main datasets, considering both the model performances and the computational power requirements. However, an external set prediction was not successful due to the low structural diversity of the DEL the model was trained on.
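A minimal inductive (Mondrian, class-conditional) conformal classifier built around a random forest is sketched below; the synthetic data, nonconformity score and split sizes are assumptions, and the thesis's Libsvm/Liblinear models and molecular descriptors are not reproduced:

# Sketch: nonconformity = 1 - predicted probability of the candidate class,
# class-conditional p-values from a calibration set, prediction sets at a
# chosen significance level, and empirical validity/efficiency checks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=30, weights=[0.8, 0.2], random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
cal_prob = rf.predict_proba(X_cal)
cal_scores = {c: 1 - cal_prob[y_cal == c, c] for c in rf.classes_}   # per-class nonconformity

def p_values(x):
    prob = rf.predict_proba(x.reshape(1, -1))[0]
    return {c: (np.sum(cal_scores[c] >= 1 - prob[c]) + 1) / (len(cal_scores[c]) + 1)
            for c in rf.classes_}

eps = 0.1                                                # significance level
sets = [[c for c, p in p_values(x).items() if p > eps] for x in X_te]
coverage = np.mean([y in s for y, s in zip(y_te, sets)])
single = np.mean([len(s) == 1 for s in sets])
print(f"empirical coverage = {coverage:.3f}, single-label predictions = {single:.3f}")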
23

Olofsson, Nina. « A Machine Learning Ensemble Approach to Churn Prediction : Developing and Comparing Local Explanation Models on Top of a Black-Box Classifier ». Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210565.

Full text
Abstract:
Churn prediction methods are widely used in Customer Relationship Management and have proven to be valuable for retaining customers. To obtain a high predictive performance, recent studies rely on increasingly complex machine learning methods, such as ensemble or hybrid models. However, the more complex a model is, the more difficult it becomes to understand how decisions are actually made. Previous studies on machine learning interpretability have used a global perspective for understanding black-box models. This study explores the use of local explanation models for explaining the individual predictions of a Random Forest ensemble model. The churn prediction was studied on the users of Tink – a finance app. This thesis aims to take local explanations one step further by making comparisons between churn indicators of different user groups. Three sets of groups were created based on differences in three user features. The importance scores of all globally found churn indicators were then computed for each group with the help of local explanation models. The results showed that the groups did not have any significant differences regarding the globally most important churn indicators. Instead, differences were found for globally less important churn indicators, concerning the type of information that users stored in the app. In addition to comparing churn indicators between user groups, the result of this study was a well-performing Random Forest ensemble model with the ability of explaining the reason behind churn predictions for individual users. The model proved to be significantly better than a number of simpler models, with an average AUC of 0.93.
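A LIME-style local surrogate is one common way to build the kind of local explanation model discussed above; the sketch below uses synthetic data and does not reproduce the thesis's own construction or the Tink data:

# Toy sketch: perturb one instance, weight the neighbours by proximity, and
# fit a linear model to the black-box random forest's predicted probability.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x0 = X[0]
Z = x0 + rng.normal(scale=X.std(axis=0) * 0.3, size=(500, X.shape[1]))  # local perturbations
pz = rf.predict_proba(Z)[:, 1]                                          # black-box output
w = np.exp(-np.linalg.norm((Z - x0) / X.std(axis=0), axis=1) ** 2)      # proximity weights

local = Ridge(alpha=1.0).fit(Z, pz, sample_weight=w)
top = np.argsort(np.abs(local.coef_))[::-1][:3]
print("locally most influential features:", top, "coefficients:", local.coef_[top])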
24

Forsblom, Findlay, et Lars Petter Ulvatne. « Snow depth measurements and predictions : Reducing environmental impact for artificial grass pitches at snowfall ». Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-96395.

Full text
Abstract:
Rubber granulates, used at artificial grass pitches, pose a threat to the environment when leaking into nature. As the granulates leak into the environment through rainwater and snow clearances, they can be transported by rivers and eventually end up in marine life. Therefore, reducing snow clearances to a minimum is important. If the snow clearance problem is minimized or even eliminated, this will have a positive impact on the surrounding nature. The objective of this project is to propose a method for deciding when to remove snow and to automate the dissemination of information when a pitch is cleared or closed. This includes finding low-powered sensors to measure snow depth, finding a machine learning model to predict upcoming snow levels, and creating an application with a clear and easy-to-use interface to present weather information and disseminate information to the responsible persons. Controlled experiments are used to find the models and sensors that are suitable for solving this problem. The sensors are tested on a single snow quality, where ultrasonic and infrared sensors are found suitable. However, fabricated tests for newly fallen snow questioned the possibility of measuring snow depth using the ultrasonic sensor in the general case. Random Forest is presented as the machine learning model that predicts future snow levels with the highest accuracy. From a survey, indications are found that the web application fulfills the intended functionalities, with some improvements suggested.
Styles APA, Harvard, Vancouver, ISO, etc.
25

Auret, Lidia. « Process monitoring and fault diagnosis using random forests ». Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/5360.

Texte intégral
Résumé :
Thesis (PhD (Process Engineering))--University of Stellenbosch, 2010.
Dissertation presented for the Degree of DOCTOR OF PHILOSOPHY (Extractive Metallurgical Engineering) in the Department of Process Engineering at the University of Stellenbosch
ENGLISH ABSTRACT: Fault diagnosis is an important component of process monitoring, relevant in the greater context of developing safer, cleaner and more cost efficient processes. Data-driven unsupervised (or feature extractive) approaches to fault diagnosis exploit the many measurements available on modern plants. Certain current unsupervised approaches are hampered by their linearity assumptions, motivating the investigation of nonlinear methods. The diversity of data structures also motivates the investigation of novel feature extraction methodologies in process monitoring. Random forests are recently proposed statistical inference tools, deriving their predictive accuracy from the nonlinear nature of their constituent decision tree members and the power of ensembles. Random forest committees provide more than just predictions; model information on data proximities can be exploited to provide random forest features. Variable importance measures show which variables are closely associated with a chosen response variable, while partial dependencies indicate the relation of important variables to said response variable. The purpose of this study was therefore to investigate the feasibility of a new unsupervised method based on random forests as a potentially viable contender in the process monitoring statistical tool family. The hypothesis investigated was that unsupervised process monitoring and fault diagnosis can be improved by using features extracted from data with random forests, with further interpretation of fault conditions aided by random forest tools. The experimental results presented in this work support this hypothesis. An initial study was performed to assess the quality of random forest features. Random forest features were shown to be generally difficult to interpret in terms of geometry present in the original variable space. Random forest mapping and demapping models were shown to be very accurate on training data, and to extrapolate weakly to unseen data that do not fall within regions populated by training data. Random forest feature extraction was applied to unsupervised fault diagnosis for process data, and compared to linear and nonlinear methods. Random forest results were comparable to existing techniques, with the majority of random forest detections due to variable reconstruction errors. Further investigation revealed that the residual detection success of random forests originates from the constrained responses and poor generalization artifacts of decision trees. Random forest variable importance measures and partial dependencies were incorporated in a visualization tool to allow for the interpretation of fault conditions. A dynamic change point detection application with random forests proved more successful than an existing principal component analysis-based approach, with the success of the random forest method again residing in reconstruction errors. The addition of random forest fault diagnosis and change point detection algorithms to a suite of abnormal event detection techniques is recommended. The distance-to-model diagnostic based on random forest mapping and demapping proved successful in this work, and the theoretical understanding gained supports the application of this method to further data sets.
AFRIKAANSE OPSOMMING: Foutdiagnose is ’n belangrike komponent van prosesmonitering, en is relevant binne die groter konteks van die ontwikkeling van veiliger, skoner en meer koste-effektiewe prosesse. Data-gedrewe toesigvrye of kenmerkekstraksie-benaderings tot foutdiagnose benut die vele metings wat op moderne prosesaanlegte beskikbaar is. Party van die huidige toesigvrye benaderings word deur aannames rakende liniariteit belemmer, wat as motivering dien om nie-liniêre metodes te ondersoek. Die diversiteit van datastrukture is ook verdere motivering vir ondersoek na nuwe kenmerkekstraksiemetodes in prosesmonitering. Lukrake-woude is ’n nuwe statistiese inferensie-tegniek, waarvan die akkuraatheid toegeskryf kan word aan die nie-liniêre aard van besluitnemingsboomlede en die bekwaamheid van ensembles. Lukrake-woudkomitees verskaf meer as net voorspellings; modelinligting oor datapuntnabyheid kan benut word om lukrakewoudkenmerke te verskaf. Metingbelangrikheidsaanduiers wys watter metings in ’n noue verhouding met ’n gekose uitsetveranderlike verkeer, terwyl parsiële afhanklikhede aandui wat die verhouding van ’n belangrike meting tot die gekose uitsetveranderlike is. Die doel van hierdie studie was dus om die uitvoerbaarheid van ’n nuwe toesigvrye metode vir prosesmonitering gebaseer op lukrake-woude te ondersoek. Die ondersoekte hipotese lui: toesigvrye prosesmonitering en foutdiagnose kan verbeter word deur kenmerke te gebruik wat met lukrake-woude geëkstraheer is, waar die verdere interpretasie van foutkondisies deur addisionele lukrake-woude-tegnieke bygestaan word. Eksperimentele resultate wat in hierdie werkstuk voorgelê is, ondersteun hierdie hipotese. ’n Intreestudie is gedoen om die gehalte van lukrake-woudkenmerke te assesseer. Daar is bevind dat dit moeilik is om lukrake-woudkenmerke in terme van die geometrie van die oorspronklike metingspasie te interpreteer. Verder is daar bevind dat lukrake-woudkartering en -dekartering baie akkuraat is vir opleidingsdata, maar dat dit swak ekstrapolasie-eienskappe toon vir ongesiene data wat in gebiede buite dié van die opleidingsdata val. Lukrake-woudkenmerkekstraksie is in toesigvrye-foutdiagnose vir gestadigde-toestandprosesse toegepas, en is met liniêre en nie-liniêre metodes vergelyk. Resultate met lukrake-woude is vergelykbaar met dié van bestaande metodes, en die meerderheid lukrake-woudopsporings is aan metingrekonstruksiefoute toe te skryf. Verdere ondersoek het getoon dat die sukses van res-opsporing op die beperkte uitsetwaardes en swak veralgemenende eienskappe van besluitnemingsbome berus. Lukrake-woude-metingbelangrikheidsaanduiers en parsiële afhanklikhede is ingelyf in ’n visualiseringstegniek wat vir die interpretasie van foutkondisies voorsiening maak. ’n Dinamiese aanwending van veranderingspuntopsporing met lukrake-woude is as meer suksesvol bewys as ’n bestaande metode gebaseer op hoofkomponentanalise. Die sukses van die lukrake-woudmetode is weereens aan rekonstruksie-reswaardes toe te skryf. ’n Voorstel wat na aanleiding van hierde studie gemaak is, is dat die lukrake-woudveranderingspunt- en foutopsporingsmetodes by ’n soortgelyke stel metodes gevoeg kan word. Daar is in hierdie werk bevind dat die afstand-vanaf-modeldiagnostiek gebaseer op lukrake-woudkartering en -dekartering suksesvol is vir foutopsporing. Die teoretiese begrippe wat ontsluier is, ondersteun die toepassing van hierdie metodes op verdere datastelle.
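To make the notion of random forest proximities and derived features mentioned in the abstract concrete, here is a minimal sketch on synthetic data (not the process data used in the thesis, and using a supervised forest rather than the unsupervised construction the thesis investigates): two observations are considered close when they frequently fall in the same leaves, and the resulting proximity or distance matrix is the raw material from which random forest features can be extracted, for example via multidimensional scaling.

```python
# Minimal sketch: random forest proximity matrix from leaf co-occurrence.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

leaves = forest.apply(X)                      # (n_samples, n_trees) leaf indices
# Proximity of two samples = fraction of trees in which they share a leaf.
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
distance = 1.0 - proximity                    # could feed e.g. MDS to obtain features
print(proximity.shape, proximity[0, :5].round(2))
```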
Styles APA, Harvard, Vancouver, ISO, etc.
26

Goodwin, Christopher C. H. « The Influence of Cost-sharing Programs on Southern Non-industrial Private Forests ». Thesis, Virginia Tech, 2001. http://hdl.handle.net/10919/30895.

Texte intégral
Résumé :
This study was undertaken in response to concerns that decreasing levels of funding for government tree planting cost-share programs will result in significant reductions in non-industrial private tree planting efforts in the South. The purpose of this study is to quantify how the funding of various cost-share programs and market signals interact and affect the level of private tree planting. The results indicate that the ACP, CRP, and Soil Bank programs have been more influential than the FIP, FRM, FSP, SIP, and state-run subsidy programs. Reductions in CRP funding will result in less tree planting, while it is not clear that funding reductions in FIP, or other programs targeted toward reforestation after harvest, will have a negative impact on tree planting levels.
Master of Science
Styles APA, Harvard, Vancouver, ISO, etc.
27

Dunja, Vrbaški. « Primena mašinskog učenja u problemu nedostajućih podataka pri razvoju prediktivnih modela ». Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2020. https://www.cris.uns.ac.rs/record.jsf?recordId=114270&source=NDLTD&language=en.

Texte intégral
Résumé :
Problem nedostajućih podataka je često prisutan prilikom razvoja prediktivnih modela. Umesto uklanjanja podataka koji sadrže vrednosti koje nedostaju mogu se primeniti metode za njihovu imputaciju. Disertacija predlaže metodologiju za pristup analizi uspešnosti imputacija prilikom razvoja prediktivnih modela. Na osnovu iznete metodologije prikazuju se rezultati primene algoritama mašinskog učenja, kao metoda imputacije, prilikom razvoja određenih, konkretnih prediktivnih modela.
The problem of missing data is often present when developing predictive models. Instead of removing data containing missing values, methods for imputation can be applied. The dissertation proposes a methodology for analysis of imputation performance in the development of predictive models. Based on the proposed methodology, results of the application of machine learning algorithms, as an imputation method in the development of specific models, are presented.
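As an illustration of machine-learning-based imputation in general (a hedged sketch on synthetic data; the dissertation's specific algorithms, datasets, and evaluation methodology are not reproduced here), one simple strategy is to train a random forest on the rows where a variable is observed and use it to predict the missing entries:

```python
# Minimal sketch: impute missing values in one column with a random forest
# trained on the rows where that column is observed.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[:, 3] = X[:, 0] * 2 + X[:, 1] + rng.normal(scale=0.3, size=500)
X[rng.random(500) < 0.2, 3] = np.nan          # introduce 20% missingness

observed = ~np.isnan(X[:, 3])
imputer = RandomForestRegressor(n_estimators=200, random_state=0)
imputer.fit(X[observed][:, :3], X[observed, 3])
X[~observed, 3] = imputer.predict(X[~observed][:, :3])
print("values imputed:", (~observed).sum())
```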
Styles APA, Harvard, Vancouver, ISO, etc.
28

Chery, Joseph Erol. « Adjusting to random demands of patient care : a predictive model for nursing staff scheduling at Naval Medical Center San Diego / ». Thesis, Monterey, Calif. : Naval Postgraduate School, 2008. http://edocs.nps.edu/npspubs/scholarly/theses/2008/Sept/08Sep%5FChery.pdf.

Texte intégral
Résumé :
Thesis (M.S. in Operations Research)--Naval Postgraduate School, September 2008.
Thesis Advisor(s): Fricker, Ronald D. "September 2008." Description based on title screen as viewed on November 5, 2008. Includes bibliographical references (p. 43-46). Also available in print.
Styles APA, Harvard, Vancouver, ISO, etc.
29

Tran, khac Viet. « Le rôle des facteurs environnementaux sur la concentration des métaux-traces dans les lacs urbains - Lac de Pampulha, Lac de Créteil et 49 lacs péri-urbains d’Ile de France ». Thesis, Paris Est, 2016. http://www.theses.fr/2016PESC1160/document.

Texte intégral
Résumé :
Les lacs jouent un rôle particulier dans le cycle de l’eau dans les bassins versants urbains. La stratification thermique et le temps de séjour de l’eau élevé favorisent le développement phytoplanctonique. La plupart des métaux sont naturellement présents dans l’environnement à l’état de traces. Ils sont essentiels pour les organismes vivants. Néanmoins, certains métaux sont connus pour leurs effets toxiques sur les animaux et les humains. La concentration totale des métaux ne reflète pas leur toxicité. Elle dépend de leurs propriétés et de leur spéciation (fractions particulaires, dissoutes: labiles ou biodisponibles et inertes). Dans les systèmes aquatiques, les métaux peuvent être absorbés par des ligands organiques ou minéraux. Leur capacité à se complexer avec la matière organique dissoute (MOD), particulièrement les substances humiques, a été largement étudiée. Dans les lacs, le développement phytoplanctonique peut produire de la MOD non-humique, connue pour sa capacité complexante des métaux. Pourtant, peu de recherche sur la spéciation des métaux dans la colonne d’eau des lacs urbains a été réalisée jusqu’à présent.Les objectifs principaux de cette thèse sont (1) d’obtenir une base de données fiables des concentrations en métaux traces dans la colonne d’eau de lacs urbains représentatifs; (2) d’évaluer leur biodisponibilité via une technique de spéciation adéquate ; (3) d’analyser leur évolution saisonnière et spatiale et leur spéciation; (4) d’étudier l’impact des variables environnementales, en particulier de la MOD autochtone sur leur biodisponibilité; (5) de lier la concentration des métaux au mode d’occupation du sol du bassin versant.Notre méthodologie est basée sur un suivi in-situ des lacs en complément d’analyses spécifiques en laboratoire. L’étude a été conduite sur trois sites: le lac de Créteil (France), le lac de Pampulha (Brésil) et 49 lacs péri-urbains (Ile de France). Sur le lac de Créteil, plusieurs dispositifs de mesure en continu nous ont fourni une partie de la base de données limnologiques. Dans le bassin versant du lac de Pampulha, la pression anthropique est très importante. Le climat et le régime hydrologique des 2 lacs sont très différents. Les 49 lacs de la région d’Ile de France ont été échantillonnés une fois pendant trois étés successifs (2011-2013). Ces lacs nous ont fourni une base de données synoptique, représentative de la contamination métallique à l’échelle d’une région fortement anthropisée.Afin d’expliquer le rôle des variables environnementales sur la concentration métallique, le modèle Random Forest a été appliqué sur les bases de données du lac de Pampulha et des 49 lacs urbains avec 2 objectifs spécifiques: (1) dans le lac de Pampulha, comprendre le rôle des variables environnementales sur la fraction labile des métaux traces, potentiellement biodisponible et (2) dans les 49 lacs, comprendre la relation des variables environnementales, particulièrement au niveau du bassin versant, sur la concentration dissoute des métaux. L’analyse des relations entre métaux et variables environnementales constitue l’un des principaux résultats de cette thèse. Dans le lac de Pampulha, environ 80% de la variance du cobalt labile est expliqué par des variables limnologiques: Chla, O2, pH et P total. Pour les autres métaux, le modèle n’a pas réussi à expliquer plus de 50 % de la relation entre fraction labile et variables limnologiques. 
Dans les 49 lacs, le modèle Random Forest a donné un bon résultat pour le cobalt (60% de la variance expliquée) et un très bon résultat pour le nickel (86% de la variance expliquée). Pour Ni les variables explicatives sont liées au mode d’occupation du sol : « Activités » (Equipements pour l’eau et l’assainissement, entrepôts logistiques, bureaux…) et « Décharge ». Ce résultat est en accord avec le cas du lac de Créteil où la concentration en Ni dissous est très élevée et où les catégories d’occupation du sol « Activités » et « Décharges » sont dominantes
Lakes have a particular influence on the water cycle in urban catchments. Thermal stratification and a longer water residence time in the lake boost phytoplankton production. Most metals are naturally found in the environment in trace amounts. Trace metals are essential to the growth and reproduction of organisms. However, some are also well known for their toxic effects on animals and humans. Total metal concentrations do not reflect their ecotoxicity, which depends on their properties and speciation (particulate and dissolved: labile or bioavailable and inert fractions). Trace metals can be adsorbed to various components in aquatic systems, including inorganic and organic ligands. The ability of metals to bind to dissolved organic matter (DOM), in particular humic substances, has been widely studied. In urban lakes, phytoplankton development can produce autochthonous DOM, non-humic substances that can also bind metals. However, there are few studies of trace metal speciation in the lake water column. The main objectives of this thesis are (1) to obtain a consistent database of trace metal concentrations in the water column of representative urban lakes; (2) to assess their bioavailability through an adapted speciation technique; (3) to analyze the seasonal and spatial evolution of the metals and their speciation; (4) to study the potential impact of environmental variables, particularly of dissolved organic matter related to phytoplankton production, on metal bioavailability; and (5) to link the metal concentrations to the land use in the lake watershed. Our methodology is based on a dense field survey of the water bodies in addition to specific laboratory analyses. The research has been conducted on three study sites: Lake Créteil (France), Lake Pampulha (Brazil) and a panel of 49 peri-urban lakes (Ile de France). Lake Créteil is an urban lake impacted by anthropogenic pollution. It benefits from a large amount of monitoring equipment, which allowed us to collect part of the data set. In the Lake Pampulha catchment, the anthropogenic pressure is high and the lake faces many point and non-point pollution sources. The climate and limnological characteristics of the lakes are also very different. The panel of 49 lakes of Ile de France was sampled once during three successive summers (2011-2013); they provided us with a synoptic, representative data set of the regional metal contamination in a densely anthropized region. In order to explain the role of the environmental variables on the metal concentrations, we applied the Random Forest model to the Lake Pampulha dataset and to the 49 urban lake dataset with two specific objectives: (1) in Lake Pampulha, understanding the role of environmental variables on the labile trace metal concentration, considered as potentially bioavailable; and (2) in the 49 lakes, understanding the relationship of the environmental variables, more particularly the watershed variables, with the dissolved metal concentrations. The analysis of the relationships between trace metal speciation and the environmental variables provided the following key results of this thesis. In Lake Pampulha, around 80% of the variance of the labile cobalt is explained by some limnological variables: Chl a, O2, pH, and total phosphorus.
For the other metals, the RF model did not succeed in explaining more than 50% of the relationship between the metals and the limnological variables. In the 49 urban lakes in Ile de France, the RF model gave a good result for Co (66% of explained variance) and a very good result for Ni (86% of explained variance). For Ni, the best explanatory variables are land-use variables such as “activities” (facilities for water, sanitation and energy, logistical warehouses, shops, offices…) and “landfill”. This result is consistent with Lake Créteil, where the dissolved Ni concentration is particularly high and where the “activities” and “landfill” land-use categories are the most prevalent.
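For readers unfamiliar with the modelling approach, the following sketch shows the general pattern of the analysis described above: a Random Forest regression of a response (here a synthetic stand-in for a labile metal fraction) on limnological variables, reporting cross-validated explained variance and feature importances. The variable names, data, and effect sizes are invented and do not come from the thesis.

```python
# Minimal sketch: random forest regression of a labile metal fraction on
# limnological variables, with explained variance and feature importances.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "chla": rng.lognormal(size=300), "o2": rng.normal(8, 2, 300),
    "ph": rng.normal(7.5, 0.5, 300), "total_p": rng.lognormal(size=300),
})
labile_co = 0.5 * df["chla"] - 0.3 * df["ph"] + rng.normal(scale=0.5, size=300)

model = RandomForestRegressor(n_estimators=500, random_state=0)
r2 = cross_val_score(model, df, labile_co, cv=5, scoring="r2")
print("explained variance (CV R2):", r2.mean().round(2))
print(dict(zip(df.columns, model.fit(df, labile_co).feature_importances_.round(2))))
```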
Styles APA, Harvard, Vancouver, ISO, etc.
30

Mistry, Pritesh. « A Knowledge Based Approach of Toxicity Prediction for Drug Formulation. Modelling Drug Vehicle Relationships Using Soft Computing Techniques ». Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14440.

Texte intégral
Résumé :
This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are utilised to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion on all aspects of drug formulation and toxicity. Discussions are focused around the principal mechanisms of drug toxicity and how drug toxicity is studied and reported in the literature. Furthermore, a review of the current technologies available for formulating drugs for toxicity reduction is provided. Examples of studies reported in the literature that have used these technologies to reduce drug toxicity are also reported. The thesis also provides an overview of the computational approaches currently employed in the field of in silico predictive toxicology. This overview focuses on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature provided. Two methodologies have been developed as part of the main work of this thesis. The first is focused on the use of directed bipartite graphs and Venn diagrams for the visualisation and extraction of drug-vehicle relationships from large un-curated datasets which show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second proposed methodology involves mining large datasets for the extraction of drug-vehicle toxicity data. The methodology uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer, from which predictive classification models based on random forests and decision trees are built. The results of this methodology are reported in chapter 6.
Styles APA, Harvard, Vancouver, ISO, etc.
31

Heinken, Thilo, et Eckart Winkler. « Non-random dispersal by ants : long-term field data versus model predictions of population spread of a forest herb ». Universität Potsdam, 2009. http://opus.kobv.de/ubp/volltexte/2010/4648/.

Texte intégral
Résumé :
Myrmecochory, i.e. dispersal of seeds by ants towards and around their nests, plays an important role in temperate forests. Yet hardly any study has examined plant population spread over several years and the underlying joint contribution of a hierarchy of dispersal modes and plant demography. We used a seed-sowing approach with three replicates to examine colonization patterns of Melampyrum pratense, an annual myrmecochorous herb, in a mixed Scots pine forest in northeastern Germany. Using a spatially explicit individual-based (SEIB) model, population patterns over 4 years were explained by short-distance transport of seeds by small ant species with high nest densities, resulting in random spread. However, plant distributions in the field after another 4 years clearly deviated from model predictions. The mean annual spread rate increased from 0.9 m to 5.1 m per year, with a clear inhomogeneous component. Obviously, after a lag-phase of several years, non-random seed dispersal by large red wood ants (Formica rufa) was determining the species’ spread, thus resulting in stratified dispersal due to interactions with different-sized ant species. Hypotheses on stratified dispersal, on dispersal lag, and on non-random dispersal were verified using an extended SEIB model, by comparison of model outputs with field patterns (individual numbers, population areas, and maximum distances). Dispersal towards red wood ant nests together with seed loss during transport and redistribution around nests were essential features of the model extension. The observed lag-phase in the initiation of non-random, medium-distance transport was probably due to a change of ant behaviour towards a new food source of increasing importance, a meaningful example of a lag-phase in local plant species invasion. The results demonstrate that field studies should check model predictions wherever possible. Future research will show whether or not the M. pratense–ant system is representative of migration patterns of similar animal dispersal systems after having crossed range edges by long-distance dispersal events.
Styles APA, Harvard, Vancouver, ISO, etc.
32

Rico-Fontalvo, Florentino Antonio. « A Decision Support Model for Personalized Cancer Treatment ». Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5621.

Texte intégral
Résumé :
This work is motivated by the need of providing patients with a decision support system that facilitates the selection of the most appropriate treatment strategy in cancer treatment. Treatment options are currently subject to predetermined clinical pathways and medical expertise, but generally do not consider the individual patient characteristics or preferences. Although genomic patient data are available, this information is rarely used in the clinical setting for real-life patient care. In the area of personalized medicine, the advancement in the fundamental understanding of cancer biology and clinical oncology can promote the prevention, detection, and treatment of cancer diseases. The objectives of this research are twofold. 1) To develop a patient-centered decision support model that can determine the most appropriate cancer treatment strategy based on subjective medical decision criteria and the patient's characteristics concerning the treatment options available and desired clinical outcomes; and 2) to develop a methodology to organize and analyze gene expression data and validate its accuracy as a predictive model for a patient's response to radiation therapy (tumor radiosensitivity). The complexity and dimensionality of the data generated from gene expression microarrays requires advanced computational approaches. The microarray gene expression data processing and prediction model is built in four steps: response variable transformation to emphasize the lower and upper extremes (related to radiosensitive and radioresistant cell lines); dimensionality reduction to select candidate gene expression probesets; model development using a Random Forest algorithm; and validation of the model in two clinical cohorts for colorectal and esophagus cancer patients. Subjective human decision-making plays a significant role in defining the treatment strategy. Thus, the decision model developed in this research uses language and mechanisms suitable for human interpretation and understanding through fuzzy sets and degrees of membership. This treatment selection strategy is modeled using a fuzzy logic framework to account for the subjectivity associated with the medical strategy and the patient's characteristics and preferences. The decision model considers criteria associated with survival rate, adverse events and efficacy (measured by radiosensitivity) for treatment recommendation. Finally, a sensitivity analysis evaluates the impact of introducing radiosensitivity in the decision-making process. The intellectual merit of this research stems from the fact that it advances the science of decision-making by integrating concepts from the fields of artificial intelligence, medicine, biology and biostatistics to develop a decision aid approach that considers conflicting objectives and has high practical value. The model focuses on criteria relevant to cancer treatment selection but it can be modified and extended to other scenarios beyond the healthcare environment.
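The microarray modelling pipeline summarised above (dimensionality reduction followed by a Random Forest and validation) can be sketched as follows on synthetic data; this is only an illustrative filter-then-forest pattern, and the thesis's response transformation, probeset selection method, and clinical-cohort validation are not reproduced here.

```python
# Minimal sketch: probeset filtering followed by a random forest, validated
# with cross-validated AUC on a synthetic expression matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=120, n_features=2000, n_informative=20,
                           random_state=0)   # stand-in for probeset expression data
pipe = Pipeline([
    ("filter", SelectKBest(f_classif, k=50)),               # dimensionality reduction
    ("forest", RandomForestClassifier(n_estimators=500, random_state=0)),
])
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC:", auc.mean().round(2))
```

Placing the feature filter inside the pipeline means the selection is re-fitted within each cross-validation fold, which avoids leaking information from the validation folds into the model.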
Styles APA, Harvard, Vancouver, ISO, etc.
33

Brokamp, Richard C. « Land Use Random Forests for Estimation of Exposure to Elemental Components of Particulate Matter ». University of Cincinnati / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1463130851.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
34

Maginnity, Joseph D. « Comparing the Uses and Classification Accuracy of Logistic and Random Forest Models on an Adolescent Tobacco Use Dataset ». The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1586997693789325.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
35

Al, Tobi Amjad Mohamed. « Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models ». Thesis, University of St Andrews, 2018. http://hdl.handle.net/10023/17050.

Texte intégral
Résumé :
Network traffic exhibits a high level of variability over short periods of time. This variability impacts negatively on the performance (accuracy) of anomaly-based network Intrusion Detection Systems (IDS) that are built using predictive models in a batch-learning setup. This thesis investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these Intrusion Detection models. Specifically, this thesis studied the adaptability features of three well known Machine Learning algorithms: C5.0, Random Forest, and Support Vector Machine. The ability of these algorithms to adapt their prediction thresholds was assessed and analysed under different scenarios that simulated real world settings using the prospective sampling approach. A new dataset (STA2018) was generated for this thesis and used for the analysis. This thesis has demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation (test) traffic have different statistical properties. Further investigation was undertaken to analyse the effects of feature selection and data balancing processes on a model's accuracy when evaluation traffic with different significant features was used. The effects of threshold adaptation on reducing the accuracy degradation of these models were statistically analysed. The results showed that, of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates. This thesis then extended the analysis to apply threshold adaptation on sampled traffic subsets, by using different sample sizes, sampling strategies and label error rates. This investigation showed the robustness of the Random Forest algorithm in identifying the best threshold. The Random Forest algorithm only needed a sample that was 0.05% of the original evaluation traffic to identify a discriminating threshold whose overall accuracy reached nearly 90% of that of the optimal threshold.
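The core idea of prediction-threshold adaptation can be illustrated with a small, self-contained sketch (synthetic data and parameters, not the STA2018 dataset or the thesis's procedure): a classifier trained under one traffic distribution is applied to drifted evaluation traffic, and its discriminating threshold is re-tuned using only a small labelled sample of that traffic.

```python
# Minimal sketch: adapt the decision threshold of a trained binary classifier
# to drifted evaluation data using a small labelled sample.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 10))
y_train = (X_train[:, 0] > 0).astype(int)
X_eval = rng.normal(loc=0.4, size=(5000, 10))       # drifted evaluation traffic
y_eval = (X_eval[:, 0] > 0.4).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
scores = model.predict_proba(X_eval)[:, 1]

sample = rng.choice(len(X_eval), size=50, replace=False)   # small labelled sample
thresholds = np.linspace(0.05, 0.95, 19)
best = max(thresholds, key=lambda t: accuracy_score(y_eval[sample], scores[sample] >= t))
print("adapted threshold:", best,
      "accuracy:", accuracy_score(y_eval, scores >= best).round(3))
```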
Styles APA, Harvard, Vancouver, ISO, etc.
36

Ozturk, Mehmet. « The Factors Affecting Wind Erosion in Southern Utah ». DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7610.

Texte intégral
Résumé :
Wind erosion is a global issue affecting millions of people in drylands, causing environmental problems (acceleration of snow melting), public health concerns (respiratory diseases), and socioeconomic costs (damage and cleanup of public property after dust storms). Disturbances in drylands can be irreversible, leading to natural disasters such as the 1930s Dust Bowl. With increasing attention on aeolian processes, many studies have been conducted using ground-based measurements or wind tunnel experiments. Ground-based measurements are important for validating model predictions and for testing the effects and interactions of the factors known to drive wind erosion. Here, a machine-learning model (random forest) was used to describe sediment flux as a function of wind speed, soil moisture, precipitation, soil roughness, soil crusts, and soil texture. Model performance was compared to previous results before analyzing four new years of sediment flux data and adding estimates of soil moisture to the model. The random forest model outperformed a regression tree, explaining more variance (a 7.5% improvement). With the additional soil moisture data, model performance increased by 13.13%, and with the full dataset the model improved total performance by 30.50% compared to the previous study. This research is one of the few studies to combine a large-scale network of BSNEs with a long time series of data to quantify seasonal sediment flux under different soil covers in southern Utah. The results will help managers control wind erosion, scientists choose variables for further modeling, and local communities raise public awareness of wind erosion's effects.
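The model-comparison step described above (quantifying how much an added predictor improves explained variance) can be sketched as follows; the data, variable names, and effects are synthetic placeholders rather than the study's BSNE measurements.

```python
# Minimal sketch: compare out-of-bag explained variance of random forests
# fitted with and without an extra predictor ("soil moisture").
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "wind_speed": rng.gamma(2.0, 3.0, 1000),
    "soil_moisture": rng.uniform(0.02, 0.3, 1000),
    "roughness": rng.uniform(0.1, 2.0, 1000),
})
flux = df["wind_speed"] ** 2 / (1 + 20 * df["soil_moisture"]) + rng.normal(0, 2, 1000)

for cols in (["wind_speed", "roughness"], list(df.columns)):
    rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(df[cols], flux)
    print(cols, "OOB R2 =", round(rf.oob_score_, 3))
```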
Styles APA, Harvard, Vancouver, ISO, etc.
37

Landmér, Pedersen Jesper. « Weighing Machine Learning Algorithms for Accounting RWISs Characteristics in METRo : A comparison of Random Forest, Deep Learning & ; kNN ». Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-85586.

Texte intégral
Résumé :
The numerical model for forecasting road conditions, the Model of the Environment and Temperature of Roads (METRo), laid the foundation for solving the energy balance and calculating the temperature evolution of roads. METRo does this by providing a numerical modelling system that makes use of Road Weather Information Stations (RWIS) and meteorological projections. While METRo accommodates tools for correcting errors at each station, such as regional differences or microclimates, this thesis proposes machine learning as a supplement to the METRo prognostications for accounting for station characteristics. Controlled experiments were conducted comparing four regression algorithms, namely recurrent and dense neural networks, random forest, and k-nearest neighbour, to predict the squared deviation of METRo-forecasted road surface temperatures. The results reveal that the models utilising the random forest algorithm yielded the most reliable predictions of METRo deviations. However, the study also points to the promise of neural networks and the possible advantage of the seasonal adjustments they could offer.
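A hedged sketch of the kind of algorithm comparison described above, using synthetic data and invented feature names: three regressors are compared on cross-validated error when predicting a squared forecast deviation. Note that scikit-learn provides no recurrent network, so a dense network stands in for both neural models here.

```python
# Minimal sketch: compare regressors for predicting a squared forecast deviation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 5))          # placeholder weather features
sq_dev = (0.8 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=800)) ** 2

models = {
    "random forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "kNN": KNeighborsRegressor(n_neighbors=10),
    "dense NN": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
}
for name, model in models.items():
    mae = -cross_val_score(model, X, sq_dev, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name:14s} MAE = {mae.mean():.3f}")
```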
Styles APA, Harvard, Vancouver, ISO, etc.
38

Fagua, José Camilo. « Geospatial Modeling of Land Cover Change in the Chocó-Darien Global Ecoregion of South America : Assessing Proximate Causes and Underlying Drivers of Deforestation and Reforestation ». DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7362.

Texte intégral
Résumé :
The Chocó-Darien Global Ecoregion (CGE) in South America is one of 25 global biodiversity hotspots prioritized for conservation. In this dissertation I performed the first land-use and land-cover (LULC) change analysis for the entire CGE. There were three main objectives: 1) Select the best available imagery to build annual land-use and land-cover maps from 2001 to 2015 across the CGE. 2) Model LULC across the CGE to assess forest change trends from 2002 to 2015 and identify the effect of proximate causes of deforestation and reforestation. 3) Estimate the effects of underlying drivers on deforestation and reforestation across the CGE between 2002 and 2015. I developed annual LULC maps across the CGE from 2002 to 2015 using MODIS (Moderate Resolution Imaging Spectroradiometer) vegetation index products and random forest classification. The LULC maps resulted in high accuracies (Kappa = 0.87; SD = 0.008). We detected a gradual replacement of forested areas with agriculture and secondary vegetation (agriculture reverting to early regeneration of natural vegetation) across the CGE. Forest loss was higher in 2010-2015 than in 2002-2010. LULC change trends, proximate causes, and reforestation transitions varied according to administrative authority (countries: Panamanian CGE, Colombian CGE, and Ecuadorian CGE). Population growth and road density were underlying drivers of deforestation. Armed conflicts, Gross Domestic Product, and average annual rainfall were proximate causes and underlying drivers related to reforestation.
Styles APA, Harvard, Vancouver, ISO, etc.
39

Frost, Scott M. « Fire Environment Analysis at Army Garrison Camp Williams in Relation to Fire Behavior Potential for Gauging Fuel Modification Needs ». DigitalCommons@USU, 2015. https://digitalcommons.usu.edu/etd/4560.

Texte intégral
Résumé :
Large fires (400+ ha) occur about every seven to ten years in the vegetation types found on the US Army Garrison Camp Williams (AGCW) practice range near South Jordan, Utah. In 2010 and 2012, wildfires burned beyond the Camp’s boundaries into the wildland-urban interface. The political and public reaction to these fire escapes was intense. Researchers at Utah State University were asked to organize a system of fuel treatments that could be developed to prevent future escapes. The first step of the evaluation was to spatially predict fuel model types using a random forest classification approach. Fuel types were mapped according to fire behavior fuel models with an overall validation accuracy of 72.3% at 0.5 m resolution. Next, using a combination of empirical and semi-empirical methods, potential fire behavior was analyzed for the dominant vegetation types at AGCW on a climatological basis. Results suggest the need to remove woody vegetation within 20 m of firebreaks and a minimum firebreak width of 8 m in grassland fuels. In Utah juniper (Juniperus osteosperma (Torr.) Little), results suggest a canopy coverage of 25% or less, while in Gambel oak (Quercus gambelii Nutt.) stands along the northern boundary of the installation, a fuelbreak width of 60 m for secondary breaks and 90 m for primary breaks is recommended.
Styles APA, Harvard, Vancouver, ISO, etc.
40

Lood, Olof. « Prediktering av grundvattennivåi område utan grundvattenrör : Modellering i ArcGIS Pro och undersökningav olika miljövariablers betydelse ». Thesis, Uppsala universitet, Institutionen för geovetenskaper, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-448020.

Texte intégral
Résumé :
Myndigheten Sveriges Geologiska Undersökning (SGU) har ett nationellt ansvar för att övervaka Sveriges grundvattennivåer. Eftersom det inte är möjligt att få ett heltäckande mätstationssystem måste grundvattennivån beräknas på vissa platser. Därför är det intressant att undersöka sambandet mellan grundvattennivån och utvald geografisk information, så kallade miljövariabler. På sikt kan maskininlärning komma att användas inom SGU för att beräkna grundvattennivån och då kan en förstudie vara till stor hjälp. Examensarbetets syfte är att genomföra en sådan förstudie genom att undersöka vilka miljövariabler som har störst betydelse för grundvattennivån och kartlägga modellosäkerheter vid grundvattenprediktering. Förstudien genomförs på sju områden inom SGUs grundvattennät där mätstationerna finns i grupper likt kluster. I förstudien används övervakad maskininlärning som i detta examensarbete innebär att medianvärden på grundvattennivån och miljövariablerna används för att träna modellerna. Med hjälp av statistisk data från modellerna kan prestandan utvärderas och justeringar göras. Algoritmen som används heter Random Forest som skapar ett klassifikations- och regressionsträd, vilket lär modellen att utifrån given indata fatta beslut som liknar människans beslutsfattande. Modellerna ställs upp i ArcGIS Pros verktyg Forest-based Classification and Regression. På grund av områdenas geografiska spridning sätts flera separata modeller upp. Resultatet visar att det är möjligt att prediktera grundvattennivån men betydelsen av de olika miljövariablerna varierar mellan de sju undersökta områdena. Orsaken till detta lär vara geografiska skillnader. Oftast har den absoluta höjden och markens lutningsriktning mycket stor betydelse. Höjd- och avståndsskillnad till låg och hög genomsläpplig jord har större betydelse än vad höjd- och avståndsskillnad har till medelhög genomsläpplig jord. Höjd- och avståndsskillnad har större betydelse till större vattendrag än till mindre vattendrag. Modellernas r2-värde är något låga men inom rimliga gränser för att vara hydrologiska modeller. Standardfelen är oftast inom rimliga gränser. Osäkerheten har visats genom ett 90 %-igt konfidensintervall. Osäkerheterna ökar med ökat avstånd till mätstationerna och är som högst vid hög altitud. Orsaken lär vara för få ingående observationer och för få observationer på hög höjd. Nära mätstationer, bebyggelse och i dalgångar är osäkerheterna i de flesta fallen inom rimliga gränser.
The Swedish authority Geological Survey of Sweden (SGU) has a national responsibility to monitor groundwater levels. A national network of measurement stations has been established to facilitate this, but the density of measurement stations varies considerably. Since it will never be feasible to cover the entire country with measurement stations, groundwater levels need to be computed in areas that are not in the near vicinity of a measurement station. For that reason, it is of interest to investigate the correlation between groundwater levels and selected geographical information, so-called environmental variables. In the future, SGU may use machine learning to compute groundwater levels. The focus of this master's thesis is to study the importance of the environmental variables and the model uncertainties in order to determine whether this is a feasible option for implementation on a national basis. The study uses data from seven areas of SGU's groundwater network, where the measuring stations lie in clusters. The pilot study uses a supervised machine learning method, which in this case means that median groundwater levels and the environmental variables are used to train the models. By evaluating the models' statistical output, the performance can gradually be improved. The algorithm used is called “Random Forest” and builds a classification and regression tree that learns to make decisions through a network of nodes, branches and leaves based on the input data. The models are set up with the prediction tool “Forest-based Classification and Regression” in ArcGIS Pro. Because the areas are geographically spread out, eight unique models are set up. The results show that it is possible to predict groundwater levels with this method, but that the importance of the environmental variables varies between the areas used in this study. This may be due to geographical and topographical differences. Most often, absolute elevation above mean sea level and slope direction are the most important variables. Horizontal and vertical distances to soils of low and high permeability are of medium-high importance, while distances to soils of medium permeability are less important. Horizontal and vertical distances to lakes and large watercourses matter more than distances to small watercourses and ditches. The models' r2-values are somewhat low but within reasonable limits for hydrological models. The standard errors of the estimates are also in most cases within reasonable limits. The uncertainty is displayed as a 90 % confidence interval. The uncertainties increase with increased distance to the measuring stations and are greatest at high altitude. This may be due to having too few observations, especially at high altitude. The uncertainties are smaller close to the stations and in valleys.
SGUs grundvattennät
Styles APA, Harvard, Vancouver, ISO, etc.
41

Carter, Kristina A. « A Comparison of Variable Selection Methods for Modeling Human Judgment ». Ohio University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1552494031580848.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
42

Kennedy, Brian Michael Kennedy. « Leveraging Multimodal Tumor mRNA Expression Data from Colon Cancer : Prospective Observational Studies for Hypothesis Generating and Predictive Modeling ». The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1498742562364379.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
43

Badayos, Noah Garcia. « Machine Learning-Based Parameter Validation ». Diss., Virginia Tech, 2014. http://hdl.handle.net/10919/47675.

Texte intégral
Résumé :
As power system grids continue to grow in order to support an increasing energy demand, the system's behavior accordingly evolves, continuing to challenge designs for maintaining security. It has become apparent in the past few years that accurate simulations are as critical as discovering vulnerabilities in the power network. This study explores a classification method for validating simulation models, using disturbance measurements from phasor measurement units (PMU). The technique employs the Random Forest learning algorithm to find a correlation between specific model parameter changes and the variations in the dynamic response. Also, the measurements used for building and evaluating the classifiers were characterized using Prony decomposition. The generator model, consisting of an exciter, a governor, and their standard parameters, has been validated using short-circuit faults. Single-error classifiers were tested first, and the accuracies of the classifiers built using positive, negative, and zero sequence measurements were compared. The negative sequence measurements consistently produced the best classifiers, with the majority of parameter classes attaining F-measure accuracies greater than 90%. A multiple-parameter-error technique for validation has also been developed and tested on standard generator parameters. Only a few target parameter classes had good accuracies in the presence of multiple parameter errors, but the results were enough to permit a sequential process of validation, in which eliminating a highly detectable error can improve the accuracy for suspect errors that depend on its removal, continuing the procedure until all corrections are covered.
Ph. D.
Styles APA, Harvard, Vancouver, ISO, etc.
44

Vaughan, Angus A. « Discharge-Suspended Sediment Relations : Near-channel Environment Controls Shape and Steepness, Land Use Controls Median and Low Flow Conditions ». DigitalCommons@USU, 2016. https://digitalcommons.usu.edu/etd/5191.

Texte intégral
Résumé :
We analyzed recent total suspended solids (TSS) data from 45 gages on 36 rivers throughout the state of Minnesota. Watersheds range from 32 to 14,600 km2 and represent a variety of distinct settings in terms of topography, land cover, and geologic history. Our study rivers exhibited three distinct patterns in the relationship between discharge and TSS: simple power functions, threshold power functions, and peaked or negative power functions. Differentiating rising and falling limb samples, we generated sediment rating curves (SRC) of the form TSS = aQ^b, where Q is normalized discharge. Rating parameters a and b describe the vertical offset and steepness of the relationships. We also used the fitted SRCs to estimate TSS values at low flows and to quantify event-scale hysteresis. In addition to quantifying the watershed-average topographic, climatic/hydrologic, geologic, soil and land cover conditions, we used high-resolution lidar topography data to characterize the near-channel environment upstream of gages. We used Random Forest statistical models to analyze the relationship between basin and channel features and the rating parameters. The models enabled us to identify morphometric variables that provided the greatest explanatory power and to examine the direction, form, and strength of the partial dependence of the response variables on individual predictor variables. The models explained between 43% and 60% of the variance in the rating curve parameters and determined that Q-TSS relation steepness (exponent) was most related to near-channel morphological characteristics including near-channel local relief, channel gradient, and proportion of lakes along the channel network. Land use within the watershed explained most variation in the vertical offset (coefficient) of the SRCs and in TSS concentrations at low flows.
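As a worked example of the rating-curve form used above (with synthetic data and made-up parameter values, not the Minnesota gage records), the parameters a and b of TSS = aQ^b can be recovered by a linear fit in log-log space:

```python
# Minimal sketch: fit a sediment rating curve TSS = a * Q**b by regression in
# log-log space, yielding the offset (a) and steepness (b) parameters.
import numpy as np

rng = np.random.default_rng(0)
Q = rng.lognormal(mean=0.0, sigma=1.0, size=300)             # normalized discharge
TSS = 15.0 * Q ** 1.4 * rng.lognormal(sigma=0.4, size=300)   # noisy power-law response

b, log_a = np.polyfit(np.log(Q), np.log(TSS), deg=1)         # slope, intercept
a = np.exp(log_a)
print(f"fitted rating curve: TSS = {a:.1f} * Q^{b:.2f}")
```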
Styles APA, Harvard, Vancouver, ISO, etc.
45

Straková, Kristýna. « Datamining a využití rozhodovacích stromů při tvorbě Scorecards ». Master's thesis, Vysoká škola ekonomická v Praze, 2014. http://www.nusl.cz/ntk/nusl-201627.

Texte intégral
Résumé :
The thesis presents a comparison of several selected modeling methods used by financial institutions for decision-making processes (though not exclusively). The first, theoretical part describes well-known modeling methods such as logistic regression, decision trees, neural networks, alternating decision trees and the relatively new method called "Random forest". The practical part of the thesis outlines some processes within financial institutions in which the selected modeling methods are used. On real data from two financial institutions, logistic regression, decision trees and decision forests are compared with each other. The neural network method is not included due to its poor interpretability. In conclusion, based on the resulting models, the thesis attempts to answer whether logistic regression (the method most widely used by financial institutions) remains the most suitable.
Styles APA, Harvard, Vancouver, ISO, etc.
46

Jacobsson, Marcus, et Viktor Inkapööl. « Prediktion av optimal tidpunkt för köp av flygbiljetter med hjälp av maskininlärning ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281767.

Texte intégral
Résumé :
The work presented in this study is motivated by the goal of cutting consumer costs related to the purchase of airfare tickets. Specifically, the study has investigated whether it is possible to classify optimal purchase decisions for specific flight routes with high accuracy using machine learning models trained on basic data containing only the price and search date for a given date of departure. The models were based on the Random Forest Classifier and trained on search data up to 90 days ahead of every departure date in July 2016-2018, and tested on the same kind of data for 2019. After preparation of the data and tuning of hyperparameters, the final models managed to correctly classify optimal purchases with an accuracy of 88% for the trip Stockholm-Mallorca and 84% for the trip Stockholm-Bangkok. Based on the assumption that the number of searches correlates with demand and, in turn, actual purchases, the study calculated the average expected savings per ticket from using the model on these routes to be 21% and 17% respectively. Furthermore, the study examined how a business model for price comparison could be reshaped to incorporate these findings. The framework was set up using the Business Model Canvas and resulted in the recommendation to implement a premium service in which users would be told whether to buy or wait based on a search.
Arbetet presenterat i studien är baserat på målet att sänka konsumentkostnader relaterat till köp av flygresor. Mer specifikt har studien undersökt huruvida det är möjligt att predicera optimala köpbeslut för specifika flygrutter med hjälp av maskininlärningsmodeller tränade på grundläggande data innehållande endast information om pris och sökdatum för varje givet avresedatum. Modellerna baserades på Random Forest Classifier och tränades på sökdata upp till 90 dagar före avresa för varje avresedag i juli 2016–2018, och testades på likadan data för 2019. Efter förberedelse av data och tuning av hyperparametrar lyckades modellerna med en träffsäkerhet på 88% respektive 84% predicera optimalt köp för rutterna Stockholm-Mallorca respektive Stockholm-Bangkok. Baserat på antagandet att antalet sökningar korrelerar med efterfrågan och vidare faktiska köp, beräknade studien den genomsnittliga förväntade besparingen per biljett vid användning av modellerna på de undersökta rutterna till 21% respektive 17%. Vidare undersökte studien hur en affärsmodell för prisjämförelse kan omformas för att inkorporera resultaten. Ramverket som användes för detta var Business Model Canvas och mynnade ut i en rekommendation om implementering av en premiumtjänst genom vilken användare ges information om huruvida en biljett ska köpas eller ej vid en given sökning.
Styles APA, Harvard, Vancouver, ISO, etc.
47

Yang, Kaolee. « A Statistical Analysis of Medical Data for Breast Cancer and Chronic Kidney Disease ». Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1587052897029939.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
48

Karlsson, Daniel, et Alex Lindström. « Automated Learning and Decision : Making of a Smart Home System ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-234313.

Texte intégral
Résumé :
Smart homes are custom-fitted systems that let users manage their home environments. A smart home consists of devices which can communicate with each other. In a smart home system, this communication is used by a central control unit to manage the environment and the devices in it. Setting up a smart home today involves a lot of manual customization to make it function as the user wishes. What smart homes lack is the ability to learn from users' behaviour and habits in order to provide a customized environment autonomously. The purpose of this thesis is to examine whether environmental data can be collected and used in a small smart home system to learn about the user's behaviour. To collect data and attempt this learning process, a system is set up. The system uses a central control unit to mediate between wireless electrical outlets and sensors. The sensors track motion, light, temperature and humidity. The devices and sensors, along with user interactions in the environment, make up the collected data. By studying the collected data, the system is able to create rules. These rules are used by the system to make decisions within its environment to suit the user's needs. The performance of the system varies depending on how the data collection is handled. The results show that it is important to collect data both at regular intervals and when the user performs an action.
Smarta hem är system avsedda för att hjälpa användare styra sin hemmiljö. Ett smart hem är uppbyggt av enheter med möjlighet att kommunicera med varandra. För att kontrollera enheterna i ett smart hem, används en central styrenhet. Att få ett smart hem att vara anpassat till användare är ansträngande och tidskrävande. Smarta hemsystem saknar i stor utsträckning möjligheten att lära sig av användarens beteende. Vad ett sådant lärande skulle kunna möjliggöra är ett skräddarsytt system utan användarens involvering. Syftet med denna avhandling är att undersöka hur användardata från en hemmiljö kan användas i ett smart hemsystem för att lära sig av användarens beteende. Ett litet smart hemsystem har skapats för att studera ifall denna inlärningsmetod är applicerbar. Systemet består av sensorer, trådlösa eluttag och en central styrenhet. Den centrala styrenheten används för att kontrollera de olika enheterna i miljön. Sensordata som sparas av systemet består av rörelse, ljusstyrka, temperatur och luftfuktighet. Systemet sparar även användarens beteende i miljön. Systemet skapar regler utifrån sparad data med målet att kunna styra enheterna i miljön på ett sätt som passar användaren. Systemets agerande varierade beroende på hur data samlades in. Resultatet visar vikten av att samla in data både i intervaller och när användare tar ett beslut i miljön.
Styles APA, Harvard, Vancouver, ISO, etc.
49

Zarebanadkoki, Samane. « Essays on Health Economics Using Big Data ». UKnowledge, 2019. https://uknowledge.uky.edu/agecon_etds/82.

Texte intégral
Résumé :
This dissertation consists of three essays addressing different topics in health economics. In the first essay, we perform a systematic review of peer-reviewed articles examining consumer preference for the main electronic cigarette (e-cigarette) attributes, namely flavor, nicotine strength, and type. The search resulted in a pool of 12,933 articles; 66 articles met the inclusion criteria for this review. The current literature suggests consumers prefer flavored e-cigarettes, and such preference varies with age group and smoking status. Consumer preference for nicotine strength and type depends on smoking status, e-cigarette use history, and gender. Adolescents consider flavor the most important factor when trying e-cigarettes and are more likely to initiate vaping through flavored e-cigarettes. Young adults prefer sweet, menthol, and cherry flavors, while non-smokers, in particular, prefer coffee and menthol flavors. Adults in general also prefer sweet flavors (though smokers like tobacco flavor the most) and dislike flavors that elicit bitterness or harshness. Non-smokers and inexperienced e-cigarette users tend to prefer no-nicotine or low-nicotine e-cigarettes, while smokers and experienced e-cigarette users prefer medium- and high-nicotine e-cigarettes. Weak evidence exists regarding a positive interaction between menthol flavor and nicotine strength. In the second essay, we investigate U.S. adult consumer preference for three key e-cigarette attributes (flavor, nicotine strength, and type) by applying a discrete choice model to the Nielsen scanner data (Consumer Panel data combined with retail data) for 2013 through 2017, generating novel findings as well as complementing the large literature on the topic based on focus groups, surveys, and experiments. We found that (adult) vapers prefer tobacco flavor, medium nicotine strength, and disposables, and that such preferences can vary with cigarette smoking status, purchase frequency, gender, race, and age. In particular, smokers prefer tobacco flavor, non-smokers or female vapers prefer medium strength, and infrequent vapers prefer disposables. Vapers also display loyalty (inertia) to e-cigarette brands, flavors, and nicotine strengths. One key policy implication is that a flavor ban will likely have a relatively larger impact on adolescents and young adults than on adults. The third essay employs a machine learning algorithm, specifically a random forest, to identify the importance of BMI information from kindergarten for predicting which children are most likely to be obese by the 4th grade. We use the Arkansas BMI screening program dataset. The potential value of BMI information during early childhood for predicting the likelihood of obesity later in life is one of the main benefits of a BMI screening program. This study identifies the value of this information by comparing the results of two random forests trained with and without kindergarten BMI information, to assess the ability of BMI screening to improve a predictive model beyond the personal, demographic, and socioeconomic measures that are typically used to identify children at high risk of excess weight gain. The BMI z-score from kindergarten is the most important variable and increases the accuracy of the prediction by 14%. The ability of BMI screening programs to identify children at greatest risk of becoming obese is an important but neglected dimension that should be considered when evaluating their overall utility.
In the last essay, we use the Nielsen retail scanner dataset and apply a difference-in-differences (DID) approach and the synthetic control method to test whether consumers in Utah reduced beef purchases after the 2009 Salmonella outbreak in ground beef products. The DID results indicate that the Salmonella event reduced ground beef purchases in Utah by 17% in the four weeks after the recall. The price elasticity of demand is estimated to be -2.04; therefore, the reduction in ground beef purchases as a result of the recall is comparable to an almost 8.3% increase in the price of this product. Using the synthetic control method, which allows us to use all of the control states to produce a synthetic Utah, we found the effect of this event to be minimal compared to the DID estimate.
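As a back-of-the-envelope check on the elasticity comparison above (using only the figures reported in the abstract): with a price elasticity of demand of -2.04, defined as the percentage change in quantity divided by the percentage change in price, a 17% drop in quantity corresponds to the change that a price increase of roughly 17 / 2.04 ≈ 8.3% would produce, which is how the recall's effect is translated into a price-equivalent term.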
Styles APA, Harvard, Vancouver, ISO, etc.
50

Staberg, Pontus, Emil Häglund et Jakob Claesson. « Injury Prediction in Elite Ice Hockey using Machine Learning ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235959.

Texte intégral
Résumé :
Sports clubs are always searching for innovative ways to improve performance and obtain a competitive edge. Sports analytics today focuses primarily on evaluating metrics thought to be directly tied to performance. Injuries indirectly decrease performance and are costly in terms of wasted salaries. Existing sports injury research mainly focuses on correlating one specific feature at a time with the risk of injury. This paper provides a multidimensional approach to non-contact injury prediction in Swedish professional ice hockey by applying machine learning to historical data. Several features are correlated simultaneously with injury probability. The project's aim is to create an injury-predicting algorithm which ranks the different features based on how they affect the risk of injury. The paper also discusses the business potential and strategy of a start-up aiming to provide a solution for predicting injury risk through statistical analysis.
Idrottsklubbar letar ständigt efter innovativa sätt att förbättra prestation och erhålla konkurrensfördelar. Idag fokuserar data- analys inom idrott främst på att utvärdera mätvärden som tros vara direkt korrelerade med prestation. Skador sänker indirekt prestationen och kostar markant i bortslösade spelarlöner. Tidigare studier på skador inom idrotten fokuserar huvudsakligen på att korrelera ett mätvärde till en skada i taget. Den här rapporten ger ett multidimensionellt angreppssätt till att förutse skador inom svensk elitishockey genom att applicera maskininlärning på historisk data. Flera attribut korreleras samtidigt för att få fram en skadesannolikhet. Målet med den här rapporten är att skapa en algoritm för att förutse skador och även ranka olika attribut baserat på hur de påverkar skaderisken. I rapporten diskuteras även affärsmöjligheterna för en sådan lösning och hur en potentiell start-up ska positionera sig på marknaden.
Styles APA, Harvard, Vancouver, ISO, etc.
