Dissertations on the topic "Modèle « Random Forest »"
Format your source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the top 50 dissertations for research on the topic "Modèle « Random Forest »".
Next to every entry in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the publication as a PDF and read its abstract online, if these are available in the metadata.
Browse dissertations across a wide range of disciplines and compile an accurate bibliography.
Mita, Mara. "Assessment of seismic displacements of existing landslides through numerical modelling and simplified methods." Electronic Thesis or Diss., Université Gustave Eiffel, 2023. http://www.theses.fr/2023UEFL2075.
Landslides are common secondary effects of earthquakes and can cause greater damage than the ground shaking alone. Predicting these phenomena is therefore essential for risk management in seismic regions. Permanent co-seismic landslide displacements are currently assessed with the traditional "rigid sliding block" method proposed by Newmark (1965). Despite its limitations, this method has two advantages: i) relatively short computation times, and ii) compatibility with GIS software for regional-scale analyses. Alternatively, more complex numerical analyses can be performed to simulate seismic wave propagation through slopes and the related effects; however, because of their longer computation times, their use is usually limited to slope-scale analyses. This study aims to better understand under which conditions (i.e., combinations of the relevant parameters introduced) analytical and numerical methods predict different earthquake-induced landslide displacements. To this end, 216 2D landslide prototypes were designed by combining geometrical and geotechnical parameters inferred from a statistical analysis of data collected through a literature review. The landslide prototypes were excited by 17 signals with constant Arias intensity (AI ~ 0.1 m/s) and variable mean period. The results allowed a preliminary Random Forest model to be defined that predicts, a priori, the expected difference between the displacements given by the two methods. Analysis of the results made it possible to: i) identify the parameters affecting the displacement variation between the two methods, and ii) conclude that, at the AI level considered here, the differences between computed displacements are negligible in most cases.
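The traditional rigid-block method mentioned in this abstract is compact enough to sketch. The following is a minimal, illustrative Newmark-type integration under a synthetic sinusoidal pulse; the yield acceleration and input signal are invented for illustration and are not the thesis prototypes or input motions.

```python
import numpy as np

def newmark_displacement(accel, dt, a_yield):
    """Rigid sliding-block (Newmark, 1965) permanent displacement.

    accel: ground acceleration time history [m/s^2]
    dt: time step [s]
    a_yield: yield acceleration of the slope [m/s^2]
    """
    v = 0.0  # relative sliding velocity
    d = 0.0  # accumulated permanent displacement
    for a in accel:
        if v > 0.0 or a > a_yield:
            # block slides: excess acceleration drives relative motion
            v += (a - a_yield) * dt
            v = max(v, 0.0)  # sliding stops when relative velocity vanishes
            d += v * dt
    return d

# synthetic pulse that exceeds the yield acceleration
dt = 0.01
t = np.arange(0, 2, dt)
accel = 3.0 * np.sin(2 * np.pi * 1.0 * t)  # m/s^2
disp = newmark_displacement(accel, dt, a_yield=1.0)
```

If the yield acceleration is never exceeded, the block never slides and the permanent displacement is zero, which is the sanity check behind regional-scale screening uses of the method.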
Walschaerts, Marie. "La santé reproductive de l'homme : méthodologie et statistique." Toulouse 3, 2011. http://thesesups.ups-tlse.fr/1470/.
Male reproductive health is an indicator of a man's overall health. It is also closely linked to environmental exposures and living habits. Surveillance of male fertility nowadays shows a secular decline in sperm quality and an increase in diseases and malformations of the male reproductive tract. The objective of this work is to study male reproductive health from an epidemiological perspective and through various statistical tools. Initially, we were interested in testicular cancer, its incidence and its risk factors. Then, we studied the population of men consulting for male infertility, their andrological examination, their therapeutic care and their parenthood project. Finally, the birth event was analyzed through survival models: the Cox model and survival trees. We compared different methods of stable variable selection (bootstrapped stepwise selection and bootstrapped L1-penalised selection based on the Cox model, as well as the bootstrap node-level stabilization method and random survival forests) in order to obtain a final model that is easy to interpret and improves prediction. In the south of France, the incidence of testicular cancer has doubled over the past 20 years. The birth cohort effect, i.e. the generational effect, suggests a deleterious effect of environmental exposure on male reproductive health. However, a man's living environment during his adult life does not seem to be a potential risk factor for testicular cancer, suggesting exposure to endocrine disruptors in utero. Male factors account for 50% of cases of infertility, making the management of male infertility essential. In our cohort, 85% of male partners presented an abnormal clinical examination (either a medical history or an anomaly found on andrological examination). Finally, one in two couples who consulted for male infertility successfully had a child.
Male age over 35 appears to be a major risk factor, which should encourage couples to start their parenthood project earlier. When survival time is taken into account in the reproductive outcome of these infertile couples, including large numbers of covariates often yields unstable models. We therefore combined the bootstrap method with variable-selection approaches. Although random survival forests give the best prediction performance, their results are not easily interpretable, and results differ with sample size. Based on the Cox model, the stepwise algorithm is inappropriate when the number of events is too small. The bootstrap node-level stabilization method does not seem to predict better than a simple survival tree (the tree is difficult to prune). Finally, the Cox model with variables selected by L1 penalisation seems a good compromise between interpretation and prediction.
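The bootstrap stability-selection idea described above can be sketched generically. In this hedged example, scikit-learn's L1-penalised logistic regression stands in for the L1-penalised Cox model (a real Cox fit would need a survival library such as lifelines or scikit-survival); the data, penalty strength, and the 80% selection-frequency threshold are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# toy data: 2 informative covariates out of 10 (stand-in for survival covariates)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)

# bootstrap stability selection: refit an L1-penalised model on resamples
# and keep the covariates selected in most resamples
B = 50
counts = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)                       # bootstrap resample
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    model.fit(X[idx], y[idx])
    counts += (np.abs(model.coef_[0]) > 1e-8)         # was the covariate kept?

stable = np.where(counts / B >= 0.8)[0]               # selection-frequency cut
```

Covariates whose nonzero-coefficient frequency clears the threshold form the stable model, which is the interpretability-oriented compromise the abstract argues for.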
Asritha, Kotha Sri Lakshmi Kamakshi. "Comparing Random forest and Kriging Methods for Surrogate Modeling." Thesis, Blekinge Tekniska Högskola, Fakulteten för datavetenskaper, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20230.
Pettersson, Anders. "High-Dimensional Classification Models with Applications to Email Targeting." Thesis, KTH, Matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-168203.
Companies can use email to easily spread important information, advertise new products or offers, and much more, but too many emails can make customers lose interest in the content, generate badwill and preclude future communication. Being able to identify which customers are interested in specific content would make it possible to significantly improve a company's use of email as a communication channel. This study focuses on identifying such customers using statistical learning applied to historical data provided by the music streaming company Spotify. A binary classification model was chosen, with a response variable describing whether or not the customer opened the email. Two different methods were used to identify the customers most likely to open the emails: logistic regression, both with and without regularization, and a random forest classifier, chosen for their ability to handle high-dimensional data. The methods were then evaluated on both a training set and a test set, using several statistical validation techniques such as cross-validation and ROC curves. The models were studied under both large-sample and high-dimensional scenarios, where the high-dimensional scenario is represented by the number of observations, N, being of similar size to the number of explanatory variables, p, and the large-sample scenario by N ≫ p. Lasso-based variable selection was performed in both scenarios to study the information value of the explanatory variables. This study shows that it is possible to significantly improve email open rates by selecting customers, even when only small amounts of data are used. The results show that an enormous increase in the number of training observations will only marginally improve the models' ability to identify customers.
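As a sketch of the comparison this thesis describes, the following hedged example pits logistic regression against a random forest on synthetic data standing in for the proprietary Spotify data; the sample sizes, feature counts, and models' hyperparameters are invented for illustration, and the binary target plays the role of "opened the email or not".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic stand-in: many features, few of them informative (a mildly
# high-dimensional setting relative to the sample size)
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
# cross-validated ROC AUC, one of the validation tools named in the abstract
auc = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
       for name, m in models.items()}
```

Shrinking or growing `n_samples` relative to `n_features` reproduces, in miniature, the large-sample versus high-dimensional scenarios the study contrasts.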
Henriksson, Erik, and Kristopher Werlinder. "Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302535.
The goal of this study is to compare and investigate how an XGBoost regressor and a Random Forest regressor perform in predicting house prices, using two datasets. The comparison considers the models' training time, inference time, and the three evaluation metrics R2, RMSE and MAPE. The datasets are described in detail, together with background on the regression models. The method comprises cleaning the datasets, searching for optimal hyperparameters for the models, and 5-fold cross-validation to achieve good predictions. The result of the study is that the XGBoost regressor performs better on both small and large datasets, but is clearly superior on large ones. While the Random Forest model can achieve results similar to the XGBoost model, its training takes around 250 times as long and its inference time is roughly 40 times longer. This makes XGBoost especially superior when working with large datasets.
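A minimal version of this benchmark can be sketched as follows. Since the thesis datasets are not public, synthetic regression data stands in for the housing data, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so the sketch needs nothing beyond scikit-learn; all sizes and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the housing data
X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for name, model in [
    ("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gradient boosting", GradientBoostingRegressor(random_state=0)),
]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    # R2 and RMSE as in the thesis; MAPE would additionally require
    # strictly positive targets such as actual prices
    scores[name] = {"R2": r2_score(y_te, pred),
                    "RMSE": float(np.sqrt(mean_squared_error(y_te, pred)))}
```

Timing the two `fit` calls (e.g. with `time.perf_counter`) would reproduce the training-time comparison that drives the thesis's conclusion.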
Hawkins, Susan. "The stability of host-pathogen multi-strain models." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:c324b259-57ee-4cc4-b68c-21b4d98414da.
Ferrat, L. "Machine learning and statistical analysis of complex mathematical models : an application to epilepsy." Thesis, University of Exeter, 2019. http://hdl.handle.net/10871/36090.
Castillo, Beldaño Ana Isabel. "Modelo de fuga y políticas de retención en una empresa de mejoramiento del hogar." Tesis, Universidad de Chile, 2014. http://repositorio.uchile.cl/handle/2250/130827.
The dynamism shown by the home improvement industry in recent times has forced the companies involved to understand the purchasing behavior of their consumers, since they must focus their resources and strategies not only on capturing new customers but also on retaining them. The objective of this work is to estimate customer churn in a home improvement company in order to generate retention strategies. To this end, churn criteria are defined and probabilities are estimated so that actions can be taken on the fraction of customers most likely to leave. To achieve these objectives, only customers belonging to a salesperson's portfolio are considered, and the following tools are used: descriptive statistics, the RFM technique, and a comparison of two predictive models, a decision tree and a Random Forest, whose main difference lies in the number of variables and trees built to predict the churn probabilities. The results yield three churn criteria, such that a customer is classified as churned when exceeding any of the maximum thresholds, that is, 180 days of recency, 20 for R/F, or a variation in purchase amount below -80%; the sample is thus defined as 53.9% churned customers versus 46.1% active customers. Regarding the predictive models, the decision tree delivers better accuracy, 84.1% versus 74.7% for the Random Forest, so the former was chosen; based on the churn probabilities, four customer types were obtained: loyal (37.9%), normal (7.8%), likely to churn (15.6%) and churned (38.7%).
The causes of churn correspond to long periods of inactivity, delays in purchase cycles, and a decrease in the amounts and number of transactions, together with an increase in negative transactions attributable directly to returns and credit notes. The main retention actions would therefore be promotions, a loyalty club, personalized discounts, and improved management of deliveries and stock levels so that customers make their next purchase sooner. Finally, this work concludes that retaining the 5% of customers with churn probabilities between [0.5 and 0.75] and with the 50% highest transaction amounts would yield revenues of USD 205 thousand in 6 months, representing 5.5% of the customers. It is proposed to validate this work on new customers, run a satisfaction survey, and improve salesperson performance through portfolio optimization.
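The tree-versus-forest comparison at the core of this thesis can be sketched generically. The features below are a synthetic stand-in for the RFM-style churn variables (recency, frequency, monetary variation, and so on), and the target marks churned versus active customers; everything here is invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for the RFM churn features
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=1)

# single pruned tree vs. a forest of 100 trees, 5-fold cross-validated accuracy
acc_tree = cross_val_score(DecisionTreeClassifier(max_depth=5, random_state=1),
                           X, y, cv=5).mean()
acc_forest = cross_val_score(RandomForestClassifier(n_estimators=100,
                                                    random_state=1),
                             X, y, cv=5).mean()
```

On most datasets the forest outperforms the single tree; the thesis reports the opposite on its data, a reminder that the comparison is worth running rather than assuming.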
Teang, Kanha, and Yiran Lu. "Property Valuation by Machine Learning and Hedonic Pricing Models : A Case study on Swedish Residential Property." Thesis, KTH, Fastigheter och byggande, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-298307.
Property valuation is a critical concept for a variety of applications in the real estate market, such as transactions, taxes, investments and mortgages. There is, however, little consensus on which method is best for estimating property value. This thesis aims to examine and compare the differences in Stockholm property valuation results between parametric hedonic pricing models (HPM), including linear and log-linear regression models, and Random Forest (RF) as a machine learning algorithm. The data consist of 114,293 arm's-length transactions for tenant-owned apartments from January 2005 to December 2014. The same variables are applied to both the HPM regression models and RF. Two techniques are adopted for splitting the data into training and test sets: random splitting and splitting based on transaction year. These datasets are used to train and test all the models. The performance of each model is evaluated with four indicators: R-squared, MSE, RMSE and MAPE. The results from both splitting approaches show that Random Forest achieves the highest accuracy of the models compared. The discussion addresses the reasons for the models' performance changes when applied to the different datasets obtained from the two data-splitting techniques. Limitations are also pointed out at the end of the study for future improvements.
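The HPM-versus-RF contrast can be sketched on synthetic data. Below, price is generated multiplicatively from two invented attributes (size and age), so a log-linear hedonic regression is correctly specified by construction; the functional form, coefficients, and noise level are all assumptions for illustration, not the thesis data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# synthetic hedonic data: price depends multiplicatively on size and age
n = 2000
size = rng.uniform(30, 150, n)   # living area, m^2
age = rng.uniform(0, 60, n)      # building age, years
price = 50_000 * size**0.9 * np.exp(-0.005 * age + rng.normal(0, 0.1, n))
X = np.column_stack([size, age])

X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)

# log-linear HPM: regress log(price) on log(size) and age, then exponentiate
def design(M):
    return np.column_stack([np.log(M[:, 0]), M[:, 1]])

hpm = LinearRegression().fit(design(X_tr), np.log(y_tr))
pred_hpm = np.exp(hpm.predict(design(X_te)))

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred_rf = rf.predict(X_te)

def mape(y, p):
    return float(np.mean(np.abs((y - p) / y)))
```

Comparing `mape(y_te, pred_hpm)` and `mape(y_te, pred_rf)` mirrors one of the four indicators used in the thesis; a year-based rather than random split would reproduce its second evaluation setting.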
Ramosaj, Burim [Verfasser], Markus [Akademischer Betreuer] Pauly, and Jörg [Gutachter] Rahnenführer. "Analyzing consistency and statistical inference in Random Forest models / Burim Ramosaj ; Gutachter: Jörg Rahnenführer ; Betreuer: Markus Pauly." Dortmund : Universitätsbibliothek Dortmund, 2020. http://d-nb.info/1218781378/34.
Kalmár, Marcus, and Joel Nilsson. "The art of forecasting – an analysis of predictive precision of machine learning models." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-280675.
Wu, Shuang. "Algebraic area distribution of two-dimensional random walks and the Hofstadter model." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS459/document.
This thesis is about the Hofstadter model, i.e., a single electron moving on a two-dimensional lattice coupled to a perpendicular homogeneous magnetic field. Its spectrum is one of the famous fractals in quantum mechanics, known as Hofstadter's butterfly. There are two main subjects in this thesis: the first is the study of the deep connection between the Hofstadter model and the distribution of the algebraic area enclosed by two-dimensional random walks. The second focuses on the distinctive features of Hofstadter's butterfly and the study of the bandwidth of the spectrum. We found an exact expression for the trace of the Hofstadter Hamiltonian in terms of the Kreft coefficients, and for the higher moments of the bandwidth. This thesis is organized as follows. In chapter 1, we begin with the motivation of our work and give a general introduction to the Hofstadter model as well as to random walks. In chapter 2, we show how to use the connection between random walks and the Hofstadter model; a method to calculate the generating function of the distribution of the algebraic area enclosed by planar random walks is explained in detail. In chapter 3, we present another method to study these issues, using the point spectrum traces to recover the full Hofstadter trace; the advantage of this construction is that it can be generalized to the almost Mathieu operator. In chapter 4, we introduce the method initially developed by D. J. Thouless to calculate the bandwidth of the Hofstadter spectrum and, following the same logic, show how to generalize the Thouless bandwidth formula to its n-th moment, defined more precisely later.
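The algebraic-area distribution mentioned in the abstract can be computed by brute force for short walks. The sketch below enumerates all closed lattice walks of length 4 from the origin and tallies their signed (shoelace) areas; the resulting polynomial 28·q⁰ + 4·q¹ + 4·q⁻¹ is the kind of generating object the thesis relates to the trace of the Hofstadter Hamiltonian. This is an independent illustration, not the thesis's method.

```python
from collections import Counter
from itertools import product

STEPS = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}

def algebraic_area(walk):
    """Signed (shoelace) area enclosed by a closed lattice walk."""
    x, y, area2 = 0, 0, 0
    for s in walk:
        dx, dy = STEPS[s]
        area2 += x * dy - y * dx  # shoelace increment: twice the signed area
        x, y = x + dx, y + dy
    return area2 // 2

# enumerate all closed walks of length 4 starting (and ending) at the origin
dist = Counter()
for walk in product("RULD", repeat=4):
    end_x = sum(STEPS[s][0] for s in walk)
    end_y = sum(STEPS[s][1] for s in walk)
    if (end_x, end_y) == (0, 0):
        dist[algebraic_area(walk)] += 1
```

Of the 36 closed length-4 walks, only the 8 unit-square traversals enclose nonzero area (+1 counterclockwise, -1 clockwise); longer walks quickly make the exact generating-function machinery studied in the thesis indispensable.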
Maginnity, Joseph D. "Comparing the Uses and Classification Accuracy of Logistic and Random Forest Models on an Adolescent Tobacco Use Dataset." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1586997693789325.
Poitevin, Caroline Myriam. "Non-random inter-specific encounters between Amazon understory forest birds : what are they and how do they change." Biblioteca Digital de Teses e Dissertações da UFRGS, 2016. http://hdl.handle.net/10183/150626.
Inter-specific associations of birds are complex social phenomena, frequently detected and often stable over time and space. So far, the social structure of these associations has been largely deduced from subjective assessments in the field or by counting the number of inter-specific encounters at the whole-group level, without considering changes to individual pairwise interactions. Here, we look for evidence of non-random association between pairs of bird species, delimit groups of more strongly associated species, and examine differences in social structure between old-growth and secondary forest habitat. We used records of bird species detection from mist-netting captures and from acoustic recordings to identify pairwise associations detected more frequently than expected under a null distribution, and compared the strength of these associations between old-growth and secondary Amazonian tropical forest. We also used the pairwise association strengths to visualize the social network structure and its changes between habitat types. We found many strongly positive associations between species, but no evidence of repulsion. Network analyses revealed several modules of species that broadly agree with the subjective groupings described in the ornithological literature. Furthermore, both network structure and association strength changed drastically with habitat disturbance, with the formation of a few new associations but a general trend towards the breaking of associations between species. Our results show that social grouping in birds is real and may be strongly affected by habitat degradation, suggesting that the stability of these associations is threatened by anthropogenic disturbance.
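The "more frequent than expected under a null distribution" test at the heart of this design can be sketched with a simple permutation null for one species pair. The detection matrix, detection rates, and the induced co-occurrence are all synthetic assumptions; the thesis's actual null model and data are richer.

```python
import numpy as np

rng = np.random.default_rng(0)

# detection matrix: rows = sampling occasions, columns = two species
det = rng.random((200, 2)) < 0.4
# make species 2 tend to co-occur with species 1 (an induced association)
det[:, 1] |= det[:, 0] & (rng.random(200) < 0.5)

# observed number of joint detections
observed = int(np.sum(det[:, 0] & det[:, 1]))

# null distribution: shuffle one species' detections across occasions,
# breaking any pairwise association while preserving detection rates
null = [int(np.sum(rng.permutation(det[:, 0]) & det[:, 1]))
        for _ in range(2000)]
p_value = float(np.mean(np.array(null) >= observed))
```

A small `p_value` flags the pair as positively associated; running the same test over all pairs, with appropriate multiple-testing control, yields the network whose modules the study then compares across habitats.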
Ospina, Arango Juan David. "Predictive models for side effects following radiotherapy for prostate cancer." Thesis, Rennes 1, 2014. http://www.theses.fr/2014REN1S046/document.
External beam radiotherapy (EBRT) is one of the cornerstones of prostate cancer treatment. The objectives of radiotherapy are, firstly, to deliver a high dose of radiation to the tumor (prostate and seminal vesicles) in order to achieve maximal local control and, secondly, to spare the neighboring organs (mainly the rectum and the bladder) to avoid normal tissue complications. Normal tissue complication probability (NTCP) models are therefore needed to assess the feasibility of the treatment and inform the patient about the risk of side effects, to derive dose-volume constraints, and to compare different treatments. In the context of EBRT, the objectives of this thesis were to find predictors of bladder and rectal complications following treatment; to develop new NTCP models that allow for the integration of both dosimetric and patient parameters; to compare the predictive capabilities of these new models to the classic NTCP models; and to develop new methodologies to identify dose patterns correlated with normal tissue complications following EBRT for prostate cancer. A large cohort of patients treated by conformal EBRT for prostate cancer in several prospective French clinical trials was used for the study. In a first step, the incidence of the main genitourinary and gastrointestinal symptoms was described. Using a classical approach, namely logistic regression, predictors of genitourinary and gastrointestinal complications were identified. The logistic regression models were then represented graphically as nomograms, a tool that enables clinicians to rapidly assess the complication risks associated with a treatment and to inform patients. This information can be used by patients and clinicians to select a treatment among several options (e.g. EBRT or radical prostatectomy). In a second step, we proposed the use of random forest, a machine-learning technique, to predict the risk of complications following EBRT for prostate cancer.
The superiority of the random forest NTCP model, assessed by the area under the receiver operating characteristic (ROC) curve (AUC), was established. In a third step, the 3D dose distribution was studied. A 2D population value decomposition (PVD) technique was extended to a tensorial framework applicable to 3D volume image analysis. Using this tensorial PVD, a population analysis was carried out to find a dose pattern possibly correlated with normal tissue complications following EBRT. Also in the context of 3D image population analysis, a spatio-temporal nonparametric mixed-effects model was developed and applied to find anatomical regions where the dose could be correlated with normal tissue complications following EBRT.
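For readers unfamiliar with the "classic NTCP models" the random forest is benchmarked against, the Lyman-Kutcher-Burman (LKB) model is the standard example: a probit function of the generalized equivalent uniform dose (gEUD) of an organ's dose-volume histogram. The DVH bins and parameter values below are of the order quoted in the rectal-toxicity literature but are used here purely for illustration, not as the thesis's fitted values.

```python
import math

def gEUD(doses, volumes, a):
    """Generalized equivalent uniform dose from a differential DVH."""
    total = sum(volumes)
    return sum(v / total * d**a for d, v in zip(doses, volumes)) ** (1 / a)

def lkb_ntcp(doses, volumes, td50, m, a):
    """Lyman-Kutcher-Burman NTCP: probit model on the normalized gEUD."""
    t = (gEUD(doses, volumes, a) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# illustrative rectal DVH (dose bins in Gy, relative volumes) and parameters
doses = [20, 40, 60, 70]
volumes = [0.4, 0.3, 0.2, 0.1]
p_complication = lkb_ntcp(doses, volumes, td50=76.9, m=0.13, a=11.1)
```

By construction, a uniform dose equal to `td50` yields an NTCP of exactly 0.5; the thesis's point is that such dose-only models ignore patient covariates that a random forest can absorb.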
Ichard, Cécile. "Random media and processes estimation using non-linear filtering techniques : application to ensemble weather forecast and aircraft trajectories." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30153/document.
Aircraft trajectory prediction error can be explained by different factors, one of which is weather forecast uncertainty. For example, wind forecast error has a non-negligible impact on the along-track accuracy of the predicted aircraft position. Seen from a different perspective, this means that aircraft can be used as local sensors to estimate the weather forecast error. In this work we describe the estimation problem as several acquisition processes of the same random field. When the field is homogeneous, we prove that they are equivalent to random processes evolving in a random medium, for which a Feynman-Kac formulation is derived. We then give a particle-based approximation and provide convergence results for the ensuing estimators. When the random field is not homogeneous but can be decomposed into homogeneous sub-domains, a different model is proposed, based on the coupling of the different acquisition processes; a Feynman-Kac formulation is again derived and its particle-based approximation is suggested. Furthermore, we develop an aircraft trajectory prediction model. Finally, we demonstrate on a simulation set-up that our algorithms can estimate wind forecast errors using the aircraft observations delivered along their trajectories.
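The particle-based approximation of a Feynman-Kac flow is, in its simplest form, a bootstrap particle filter. The toy sketch below estimates a hidden constant wind-forecast bias from noisy observations along a trajectory; the dynamics, noise levels, and particle counts are invented for illustration and are far simpler than the random-field setting of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setting: a hidden wind forecast bias, observed noisily along a trajectory
true_bias = 2.0
T, N = 50, 1000                       # time steps, particles
obs = true_bias + rng.normal(0, 1.0, T)

particles = rng.normal(0, 5.0, N)     # prior over the bias
for y in obs:
    particles += rng.normal(0, 0.05, N)          # small random-walk drift
    w = np.exp(-0.5 * (y - particles) ** 2)      # Gaussian likelihood weights
    w /= w.sum()
    idx = rng.choice(N, N, p=w)                  # multinomial resampling
    particles = particles[idx]

estimate = float(particles.mean())
```

Each iteration is one predict/weight/resample step of the Feynman-Kac particle approximation; the particle cloud concentrates on the posterior of the bias as observations accumulate.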
Rusu, Corneliu. "Risk Factors for Suicidal Behaviour Among Canadian Civilians and Military Personnel: A Recursive Partitioning Approach." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37371.
Abud, Luciana de Melo e. "Modelos computacionais prognósticos de lesões traumáticas do plexo braquial em adultos." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-20082018-140641/.
Studies of prognosis concern the prediction of the course of a disease in patients and are employed by health professionals in order to improve the chances and quality of patients' recovery. From a computational perspective, the creation of a prognostic model is a classification task that aims to identify to which class (within a predefined set of classes) a new sample belongs. The goal of this project is the creation of prognostic models for traumatic injuries of the brachial plexus, the network of nerves that innervates the upper limbs, using data from adult patients with this kind of injury. The data come from the Deolindo Couto Neurology Institute (INDC) of the Rio de Janeiro Federal University (UFRJ) and are characterized by dozens of clinical features collected by means of electronic questionnaires. With these prognostic models we intended to automatically identify possible predictors of the course of brachial plexus injuries. Decision trees are classifiers frequently used for the creation of prognostic models, since they are a transparent technique whose results can be clinically examined and interpreted. Random forests use a set of decision trees to determine the final classification and can significantly improve a model's accuracy and generalization, yet they are still not commonly used for prognostic models. In this project we explored the use of random forests for that purpose, as well as interpretation methods for the resulting models, since model transparency is an important aspect in clinical domains. Models were assessed by means of methods suitable for small sample sets, since the available prognostic data refer to only 44 patients from the INDC. Additionally, we adapted the random forests technique to handle missing data, which are frequent in the data used in this project.
Four prognostic models were created, one for each recovery goal: absence of pain, and satisfactory strength evaluated over shoulder abduction, elbow flexion and external shoulder rotation. The models' accuracies were estimated at between 77% and 88%, calculated through leave-one-out cross-validation. These models will evolve with the inclusion of data from new patients arriving at the INDC, and they will be used as part of a clinical decision support system with the purpose of predicting a patient's recovery given his or her clinical characteristics.
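Leave-one-out cross-validation is the natural choice for a cohort of only 44 patients, as used above. The following hedged sketch shows the mechanics on a synthetic 44-sample dataset (the clinical features are not public, so everything here is invented); each patient is held out once and predicted by a forest trained on the other 43.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

# tiny synthetic cohort, mirroring the 44-patient sample size
X, y = make_classification(n_samples=44, n_features=10, n_informative=4,
                           random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# 44 fits, each leaving one "patient" out; the mean is the LOO accuracy
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
```

LOO makes the most of every sample but yields a high-variance estimate, which is why small-cohort accuracies such as the 77-88% above come with wide uncertainty.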
Lundström, Love, and Oscar Öhman. "Machine Learning in credit risk : Evaluation of supervised machine learning models predicting credit risk in the financial sector." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-164101.
When a bank lends money to another party, there is a risk that the borrower will not meet its obligations to the bank. This risk is called credit risk and is the largest risk a bank faces. According to the Basel regulations, a bank must set aside a certain amount of capital for every loan it issues in order to protect itself against future financial crises. This amount is calculated for each individual loan from its associated risk weight, RWA. The main parameters in RWA are the probability that a customer cannot repay the loan and the amount the bank then loses. Today, banks may use internal models to estimate these parameters. Since tied-up capital entails large costs, banks strive to find better tools for estimating the probability of customer default in order to reduce their capital requirements, and have therefore started to look at using machine learning algorithms to estimate these parameters. Machine learning algorithms such as logistic regression, neural networks, decision trees and random forests can be used to determine credit risk. By training the algorithms on historical data with known outcomes, the probability of default (PD) can be determined with greater certainty than with traditional methods. On the data underlying this thesis, logistic regression turns out to be the algorithm that most accurately classifies customers into the right category. However, it classifies many customers as false positives, meaning it predicts that many customers will repay their loans who in fact do not, which entails a large cost for the banks.
By instead evaluating the models with a cost function designed to reduce this error, we find that the neural network has the lowest false positive rate and is therefore the model best suited to this specific classification task.
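Swapping plain accuracy for a cost function, as the thesis does, can be sketched as follows. Here y = 1 means "repays the loan", so a false positive is a predicted repayer who actually defaults; the 10:1 cost ratio, data, and models are all invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# synthetic credit data: class 1 ("repays") dominates, defaults are rare
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.1, 0.9],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

COST_FP, COST_FN = 10.0, 1.0   # invented: missed defaults cost 10x

def expected_cost(model):
    """Average misclassification cost per test customer."""
    pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    return (COST_FP * fp + COST_FN * fn) / len(y_te)

cost_lr = expected_cost(LogisticRegression(max_iter=1000))
cost_rf = expected_cost(RandomForestClassifier(n_estimators=200, random_state=0))
```

Ranking models by `expected_cost` instead of accuracy is exactly how a high-accuracy model with many expensive false positives can lose to a rival, which is the thesis's finding.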
Svensson, William. "CAN STATISTICAL MODELS BEAT BENCHMARK PREDICTIONS BASED ON RANKINGS IN TENNIS?" Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447384.
Barth, Danielle. "To HAVE and to BE: Function Word Reduction in Child Speech, Child Directed Speech and Inter-adult Speech." Thesis, University of Oregon, 2016. http://hdl.handle.net/1794/19687.
Liu, Xiaoyang. "Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/93737.
Machine learning models can be applied in a wide range of areas, including scientific research. In this thesis, machine learning models are applied to predict the chromatography behavior of fullerenes based on their molecular structures. Chromatography is a common technique for separating mixtures, the separation arising from differences in the interactions between molecules and a stationary phase. In real experiments, a mixture usually contains a large family of different compounds, and isolating the target compound requires substantial work and resources. Models are therefore extremely important for chromatography studies. Traditional models are built on physical principles and involve several parameters, which are measured experimentally or computed theoretically; both approaches are time consuming and not easy to carry out. For fullerenes, my previous studies have shown that the chromatography model can be simplified so that only one parameter, the polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes from their structures, where the structure of a fullerene is represented by several local structures. Several types of machine learning models were built and tested on our data set, and the results show that a neural network gives the best predictions.
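The structure-to-polarizability regression described above can be sketched generically. Below, each "fullerene" is summarized by invented counts of a few local structural motifs, and polarizability is generated as roughly additive in those counts; the data, network size, and additive assumption are all illustrative, not the thesis's descriptors or model.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# synthetic stand-in: motif counts per molecule, additive polarizability
n = 400
motifs = rng.integers(0, 20, size=(n, 5)).astype(float)
polarizability = (motifs @ np.array([1.5, 0.8, 2.1, 0.3, 1.1])
                  + rng.normal(0, 1.0, n))

X_tr, X_te, y_tr, y_te = train_test_split(motifs, polarizability,
                                          random_state=0)
net = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000,
                                 random_state=0))
net.fit(X_tr, y_tr)
r2 = r2_score(y_te, net.predict(X_te))
```

Standardizing the inputs before the network, as in the pipeline above, is what lets a small MLP recover the additive structure reliably.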
Galleguillos, Aguilar Matías. "Desarrollo de un modelo predictivo de deserción de estudiantes de primer año en institución de educación superior." Tesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/170006.
In Chile, over the last 30 years there has been significant growth in access to higher education. This growth has been accompanied by an increase in university dropout, which is particularly high among first-year students. The problem imposes large costs of various kinds on both students and universities, and dropout has become one of the most important metrics used to accredit institutions. Universidad de las Américas has faced a high dropout rate, which in 2013 contributed significantly to the loss of its accreditation, so addressing it became a priority; a plan was devised to help the students most likely to drop out. UDLA currently has no automated system that classifies students based on analysis of behavioral data; it only has a rule-based system built from the dropout knowledge of university staff, which has a high error rate. In the latest study published by the Higher Education Information Service on first-year retention, built with data on students who entered in 2016, Universidad de las Américas ranks 47th out of 58 universities. Developing a system able to identify students at risk of dropping out therefore remains a priority for the institution. The objective of this work is to develop a system capable of delivering a dropout risk index for each first-year student. To this end, risk assignment is framed as a classification problem and tackled with computational intelligence tools. To solve the problem, the semester was divided into segments and a model was trained for each of them.
La precisión del primer modelo fue más baja que la de estudios similares que afrontaron el mismo problema en otras universidades del mundo, teniendo un 70,1% de aciertos. El modelo de cada tramo entregó mejores resultados que los del tramo anterior, siendo el del final del semestre el de mejores resultados llegando a un 82,5% de precisión, lo que se asemeja a otros trabajos.
Sun, Wangru. "Modèle de forêts enracinées sur des cycles et modèle de perles via les dimères." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS007/document.
The dimer model, also known as the perfect matching model, is a probabilistic model originally introduced in statistical mechanics. A dimer configuration of a graph is a subset of the edges such that every vertex is incident to exactly one edge of the subset. A weight is assigned to every edge, and the probability of a configuration is proportional to the product of the weights of the edges present. In this thesis we mainly study two related models and in particular their limiting behavior. The first is the model of cycle-rooted spanning forests (CRSF) on tori, which is in bijection with toroidal dimer configurations via Temperley's bijection. This gives rise to a measure on CRSFs. In the limit where the size of the torus tends to infinity, the CRSF measure tends to an ergodic Gibbs measure on the whole plane. We study the connectivity of the limiting object, prove that it is determined by the average height change of the limiting ergodic Gibbs measure, and give a phase diagram. The second is the bead model, a random point field on $\mathbb{Z}\times\mathbb{R}$ which can be viewed as a scaling limit of the dimer model on the hexagonal lattice. We formulate and prove a variational principle similar to that of the dimer model \cite{CKP01}, which states that in the scaling limit, the normalized height function of a uniformly chosen random bead configuration lies in an arbitrarily small neighborhood of a surface $h_0$ that maximizes a functional which we call the entropy. We also prove that the limit shape $h_0$ is a scaling limit of the limit shapes of a properly chosen sequence of dimer models. There is a map from bead configurations to standard tableaux of a (skew) Young diagram, and this map is measure-preserving when both sides are endowed with the uniform measure. The variational principle of the bead model thus yields the existence of the limit shape of a random standard Young tableau, which generalizes a result of \cite{PR}.
We also derive the existence of an arctic curve for a discrete point process encoding the standard tableaux, introduced in \cite{Rom}.
Dinger, Steven. "Essays on Reinforcement Learning with Decision Trees and Accelerated Boosting of Partially Linear Additive Models." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1562923541849035.
Zhang, Qing Frankowski Ralph. "An empirical evaluation of the random forests classifier models for variable selection in a large-scale lung cancer case-control study /." See options below, 2006. http://proquest.umi.com/pqdweb?did=1324365481&sid=1&Fmt=2&clientId=68716&RQT=309&VName=PQD.
Palczewska, Anna Maria. "Interpretation, Identification and Reuse of Models. Theory and algorithms with applications in predictive toxicology." Thesis, University of Bradford, 2014. http://hdl.handle.net/10454/7349.
Lanka, Venkata Raghava Ravi Teja Lanka. "VEHICLE RESPONSE PREDICTION USING PHYSICAL AND MACHINE LEARNING MODELS." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1511891682062084.
Appelquist, Niklas, and Emelia Karlsson. "Kan en bättre prediktion uppnås genom en kategorispecifik modell? : Teknologiprojekt på Kickstarter och maskininlärning." Thesis, Uppsala universitet, Institutionen för informatik och media, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413736.
Crowdfunding is used to collect money via the internet for potential projects from a large number of backers, each contributing a small pledge. Kickstarter is one of the largest crowdfunding platforms today. Despite the great interest in crowdfunding, many launched campaigns fail to reach their goal, and projects in the technology category show the highest failure rate on Kickstarter. It is therefore important to be able to predict which campaigns are likely to succeed or fail. This thesis explores the possibility of reaching a higher accuracy when predicting the success of launched projects with machine learning, using a smaller amount of category-specific data. The data consist of 192,548 launched Kickstarter projects collected through Kaggle.com. Two Random Forest models were developed: one trained on general data over all projects, and one trained on category-specific data over technology projects. The results show that the technology model reaches a higher accuracy, 68.37%, compared to 68.00% for the reference model.
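As an illustration of the general-versus-category-specific comparison described above, the following sketch trains two Random Forest classifiers, one on all projects and one on a single category, and evaluates both on the same category-specific test set. All feature names, sizes and the success rule are invented for the example, not taken from the thesis:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
# synthetic campaign features: log funding goal, duration, category id (0 = "technology")
goal = rng.normal(9, 1.5, n)
duration = rng.integers(10, 60, n)
category = rng.integers(0, 5, n)
# success depends on goal and duration, with a category-specific twist
logit = -0.8 * (goal - 9) - 0.02 * (duration - 30) + 0.5 * (category == 0) * (goal < 9)
y = (logit + rng.normal(0, 1, n) > 0).astype(int)
X = np.column_stack([goal, duration, category])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# general model: trained on every category
general = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# category-specific model: trained only on category 0
tech_tr = X_tr[:, 2] == 0
tech_te = X_te[:, 2] == 0
specific = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    X_tr[tech_tr], y_tr[tech_tr])

acc_general = general.score(X_te[tech_te], y_te[tech_te])
acc_specific = specific.score(X_te[tech_te], y_te[tech_te])
print(f"general model on category-0 projects:  {acc_general:.3f}")
print(f"specific model on category-0 projects: {acc_specific:.3f}")
```

The methodological point is that restricting the training set to one category is a one-line filter on the feature matrix; whether it pays off depends on how much category-specific signal outweighs the loss of training data.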
Raynal, Louis. "Bayesian statistical inference for intractable likelihood models." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS035/document.
In a statistical inference process, when the likelihood function cannot be computed, approximations must be used. This is a fairly common situation in some application fields, especially for population genetics models. To address this issue, we are interested in approximate Bayesian computation (ABC) methods. These rely solely on simulated data, which are summarised and compared to the observed ones. The comparisons are performed using a distance, a similarity threshold and a set of low-dimensional summary statistics, which must be carefully chosen. In a parameter inference framework, we propose an approach combining ABC simulations with the random forest machine learning algorithm. We use different strategies depending on the posterior quantity we wish to approximate. Our proposal avoids the usual ABC tuning difficulties while providing good results and interpretation tools for practitioners. In addition, we introduce posterior measures of error (i.e., conditional on the observed data of interest) computed by means of forests. In a model choice setting, we present a strategy based on groups of models to determine, in population genetics, which events of an evolutionary scenario are more or less well identified. All these approaches are implemented in the R package abcrf. We also investigate how to build local random forests that take the observation to predict into account during their learning phase, so as to improve prediction accuracy. Finally, using these developments, we present two case studies dealing with the reconstruction of the evolutionary history of Pygmy populations and of two subspecies of the desert locust Schistocerca gregaria.
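The ABC random forest idea summarised above can be illustrated with a toy example. The thesis works with the R package abcrf; the sketch below only mimics the principle in Python, with an invented model (a normal mean under a uniform prior) and invented summary statistics: parameters are simulated from the prior, data sets are reduced to summaries, and a forest regresses the parameter on the summaries.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_sim, n_obs = 5000, 50

# prior on the parameter of interest, here the mean of a normal sample
theta = rng.uniform(-5, 5, n_sim)
# simulated data sets, each reduced to low-dimensional summary statistics
sims = rng.normal(theta[:, None], 1.0, (n_sim, n_obs))
summaries = np.column_stack([sims.mean(1), sims.std(1), np.median(sims, 1)])

# random forest regression of the parameter on the summaries
rf = RandomForestRegressor(n_estimators=300, random_state=1).fit(summaries, theta)

# an "observed" data set generated with a known parameter value
obs = rng.normal(2.0, 1.0, n_obs)
obs_summary = np.array([[obs.mean(), obs.std(), np.median(obs)]])
est = rf.predict(obs_summary)[0]
print(f"posterior mean estimate: {est:.2f}")  # close to the true value 2.0
```

Unlike classical rejection ABC, no distance threshold has to be tuned here: the forest itself selects the informative summaries.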
Olofsson, Nina. "A Machine Learning Ensemble Approach to Churn Prediction : Developing and Comparing Local Explanation Models on Top of a Black-Box Classifier." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210565.
Churn prediction methods are common in Customer Relationship Management and have proved valuable for customer retention. To predict churn as accurately as possible, recent research has focused on increasingly complex machine learning models, such as ensembles and hybrid models. A consequence of this complexity, however, is that it becomes harder and harder to understand how a given model reached a given decision. Previous studies in machine learning interpretability have taken a global perspective when explaining hard-to-interpret models. This study explores local explanation models for explaining individual decisions of an ensemble model known as Random Forest. Churn prediction is studied on the users of Tink, a finance app. The purpose of this study is to take local explanation models one step further by comparing churn indicators across different user groups. In total, three pairs of groups differing in three different variables were examined. Local explanation models were then used to compute how important each of the globally identified churn indicators was for each group. The results showed no significant differences between the groups with respect to the main churn indicators. Instead, differences appeared in less important indicators related to the type of information users store in the app. Besides examining differences in churn indicators, this study produced a well-performing churn prediction model able to explain individual decisions. The Random Forest model proved significantly better than a number of simpler models, with an AUC value of 0.93.
Duroux, Roxane. "Inférence pour les modèles statistiques mal spécifiés, application à une étude sur les facteurs pronostiques dans le cancer du sein." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066224/document.
This thesis focuses on inference for misspecified statistical models. Every result finds its application in a study of prognostic factors for breast cancer, thanks to the data collected by the Institut Curie. We first consider non-proportional hazards models and make use of the marginal survival of the failure time. This model allows a time-varying regression coefficient and therefore generalizes the proportional hazards model. Second, we study step regression models: we propose an inference method for the change-point of a two-step regression model and an estimation method for a multiple-step regression model. Then, we study the influence of the subsampling rate on the performance of median forests and extend the results to random survival forests through an application. Finally, we present a new dose-finding method for phase I clinical trials in the case of partial ordering.
Geylan, Gökçe. "Training Machine Learning-based QSAR models with Conformal Prediction on Experimental Data from DNA-Encoded Chemical Libraries." Thesis, Uppsala universitet, Institutionen för farmaceutisk biovetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447354.
Zhang, Yi. "Strategies for Combining Tree-Based Ensemble Models." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1021.
Tuulaikhuu, Baigal-Amar. "Influences of toxicants on freshwater biofilms and fish: from experimental approaches to statistical models." Doctoral thesis, Universitat de Girona, 2016. http://hdl.handle.net/10803/392157.
The main objectives of this doctoral thesis are: i) to assess arsenic toxicity in two key interacting components of the aquatic ecosystem, biofilm and fish, providing information on the effects of environmentally realistic contamination levels and on their interactions with other factors modulating toxicity, such as nutrient recycling; and ii) to rank predictors of fish toxicity and to quantify sensitivity differences among species. Our results highlight the interest and applicability of incorporating some of the complexities of natural systems into ecotoxicology, and indicate that the current continuous concentration criterion for arsenic should be updated. The factors that best predict toxicity in a large set of fish were examined using the Random Forests technique, and the importance of differential sensitivity among fish species was assessed using analysis of covariance. Our results indicate that caution should be exercised when extrapolating toxicological results, since fish species differ in sensitivity and respond differently to different chemicals.
Dunja, Vrbaški. "Primena mašinskog učenja u problemu nedostajućih podataka pri razvoju prediktivnih modela." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2020. https://www.cris.uns.ac.rs/record.jsf?recordId=114270&source=NDLTD&language=en.
The problem of missing data is often present when developing predictive models. Instead of removing data containing missing values, imputation methods can be applied. The dissertation proposes a methodology for analysing imputation performance in the development of predictive models. Based on the proposed methodology, results of the application of machine learning algorithms as imputation methods in the development of specific models are presented.
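A minimal sketch of the kind of comparison such a methodology enables: two imputation methods are judged by the downstream cross-validated accuracy of the same classifier. The data set and missingness mechanism below are synthetic and purely illustrative, not the dissertation's experiments:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 1500
X = rng.normal(size=(n, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=n)   # a correlated column helps imputation
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# knock out 20% of the values completely at random
X_missing = X.copy()
mask = rng.random((n, 4)) < 0.2
X_missing[mask] = np.nan

results = {}
for name, imputer in [("mean", SimpleImputer()),
                      ("iterative", IterativeImputer(random_state=2))]:
    X_imp = imputer.fit_transform(X_missing)
    acc = cross_val_score(RandomForestClassifier(random_state=2),
                          X_imp, y, cv=3).mean()
    results[name] = acc
    print(f"{name:9s} imputation -> accuracy {acc:.3f}")
```

The same loop generalizes to any imputer/estimator pair, which is the essence of evaluating imputation by its effect on the predictive model rather than in isolation.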
Brüls, Maxim. "FAULT DETECTION FOR SMALL-SCALE PHOTOVOLTAIC POWER INSTALLATIONS : A Case Study of a Residential Solar Power System." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-35965.
Chalupa, Daniel. "Rozšiřující modul platformy 3D Slicer pro segmentaci tomografických obrazů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-316852.
Mercadier, Mathieu. "Banking risk indicators, machine learning and one-sided concentration inequalities." Thesis, Limoges, 2020. http://aurore.unilim.fr/theses/nxfile/default/a5bdd121-a1a2-434e-b7f9-598508c52104/blobholder:0/2020LIMO0001.pdf.
This doctoral thesis is a collection of three essays aiming to implement, and where necessary improve, financial risk measures and to assess banking risks using machine learning methods. The first chapter offers an elementary formula, inspired by CreditGrades and called E2C, for estimating CDS spreads, whose accuracy is improved by a random forest algorithm. Our results emphasize the E2C's key role and the additional contribution of a company's debt rating and size. The second chapter derives a one-sided version of the inequality bounding the deviation probability of a unimodal random variable. Our results show that the unimodality assumption for stock returns is generally acceptable, allowing us to refine the bounds of individual risk measures, to discuss implications for tail risk multipliers, and to infer simple versions of bounds of systemic measures. The third chapter provides a decision support tool that clusters listed banks by riskiness using an adjusted version of the k-means algorithm. This entirely automatic process is based on a very large set of stand-alone and systemic risk indicators reduced to representative factors. The results are aggregated per country and region, offering the opportunity to study zones of fragility. They underline the importance of paying particular attention to the ambiguous impact of banks' size on systemic measures.
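The clustering pipeline of the third chapter (many risk indicators reduced to representative factors, then clustered) can be sketched as follows. Everything here is illustrative: the indicator values are synthetic, and the thesis's adjusted k-means is replaced by plain scikit-learn KMeans:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
# 60 "banks" described by 12 stand-alone and systemic risk indicators
indicators = rng.normal(size=(60, 12))
indicators[:20] += 2.0        # a deliberately riskier group of banks

# standardize, reduce to a few representative factors, then cluster
z = StandardScaler().fit_transform(indicators)
factors = PCA(n_components=3, random_state=6).fit_transform(z)
labels = KMeans(n_clusters=3, n_init=10, random_state=6).fit_predict(factors)
print(np.bincount(labels))    # cluster sizes
```

Aggregating the resulting labels per country or region, as the thesis does, is then a simple group-by on the cluster assignments.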
Al, Tobi Amjad Mohamed. "Anomaly-based network intrusion detection enhancement by prediction threshold adaptation of binary classification models." Thesis, University of St Andrews, 2018. http://hdl.handle.net/10023/17050.
Victors, Mason Lemoyne. "A Classification Tool for Predictive Data Analysis in Healthcare." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/5639.
Olaya, Marín Esther Julia. "Ecological models at fish community and species level to support effective river restoration." Doctoral thesis, Universitat Politècnica de València, 2013. http://hdl.handle.net/10251/28853.
Olaya Marín, EJ. (2013). Ecological models at fish community and species level to support effective river restoration [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/28853
Jobe, Ndey Isatou. "Nonlinearity In Exchange Rates : Evidence From African Economies." Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-297055.
Fouemkeu, Norbert. "Modélisation de l’incertitude sur les trajectoires d’avions." Thesis, Lyon 1, 2010. http://www.theses.fr/2010LYO10217/document.
In this thesis we propose probabilistic and statistical models based on multidimensional data for forecasting the uncertainty on aircraft trajectories. Assuming that during the flight an aircraft follows the 3D trajectory contained in its initial flight plan, we use all the characteristics of the flight environment as predictors to explain the crossing times of aircraft at given points of their planned trajectory. These characteristics are: weather and atmospheric conditions, current flight parameters, information contained in the flight plans, and air traffic complexity. In this study, the dependent variable is the difference between the actual time observed during the flight and the planned time of crossing the planned trajectory points; this variable is called the temporal difference. We built four models using methods based on recursive partitioning of the sample. The first, called classical CART, is based on Breiman's CART method. Here, we use regression trees to build a typology of points of aircraft trajectories based on the previous characteristics and to forecast the crossing times of aircraft at these points. The second model, called amended CART, improves the previous one: the forecast given by the mean of the dependent variable inside each terminal node of classical CART is replaced by a forecast given by a multiple regression fitted inside that node. This new model, developed using a stepwise algorithm, is parsimonious because for each terminal node it explains the flight time with the most relevant predictors inside the node. The third model is built with the MARS (multivariate adaptive regression splines) method. Besides the continuity of the estimator of the dependent variable, this model allows assessing the direct and interaction effects of the explanatory variables on the crossing times at the trajectory points. The fourth model uses the bootstrap sampling method.
It is a random forest: for each bootstrap sample drawn from the initial data, a regression tree is built as in the CART method, and the overall forecast is obtained by aggregating the forecasts over the set of trees. Despite the overfitting observed for this model, it is robust and offers a remedy to the instability of the regression trees obtained by the CART method. The models we built were assessed and validated on test data. Using them to forecast sector load, in terms of the number of aircraft entering a sector, showed that a forecast horizon of about 20 minutes, with a time window larger than 20 minutes, allowed forecasts with relative errors below 10%. Among these models, classical CART and random forests are the most powerful. For the regulatory authority, these models can therefore be a very good aid for managing the sector load of controlled airspace.
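The bootstrap-and-aggregate principle of this fourth model can be sketched in a few lines. This is the generic bagging idea on synthetic data, not the thesis code:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 600
X = rng.uniform(-3, 3, (n, 2))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + rng.normal(0, 0.2, n)

# fit one CART regression tree per bootstrap sample
trees = []
for _ in range(100):
    idx = rng.integers(0, n, n)                 # bootstrap: draw n rows with replacement
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# the aggregated forecast is the average over the trees
X_new = np.array([[1.0, 0.5]])
single = trees[0].predict(X_new)[0]
bagged = np.mean([t.predict(X_new)[0] for t in trees])
print(f"single tree: {single:.2f}, bagged forecast: {bagged:.2f}")
```

A single fully grown tree overfits the noise; averaging over bootstrap trees smooths this out, which is exactly the remedy to CART instability mentioned above.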
Laqrichi, Safae. "Approche pour la construction de modèles d'estimation réaliste de l'effort/coût de projet dans un environnement incertain : application au domaine du développement logiciel." Thesis, Ecole nationale des Mines d'Albi-Carmaux, 2015. http://www.theses.fr/2015EMAC0013/document.
Software effort estimation is one of the most important tasks in the management of software projects. It is the basis for planning, control and decision making. Achieving reliable estimates in a project's upstream phases is a complex and difficult activity because of, among other things, the lack of information about the project and its future, the rapid changes in the methods and technologies of the software field, and the lack of experience with similar projects. Many estimation models exist, but it is difficult to identify a model that is successful for all types of projects and applicable to all companies (with their different levels of experience, mastered technologies and project management practices). Overall, all of these models make the strong assumptions that (1) the collected data are complete and sufficient, (2) the laws linking the parameters characterizing the projects are fully identifiable, and (3) the information on the new project is certain and deterministic. In practice, however, this is difficult to ensure. Two problems then emerge from these observations: how to select an estimation model for a specific company, and how to produce an estimate for a new project that presents uncertainties? This thesis addresses these questions by proposing a general estimation framework. The framework covers two phases: the construction of the estimation system and its use for estimating new projects. The construction phase consists of two processes: 1) reliable evaluation and comparison of the candidate estimation models, followed by the selection of the most suitable one, and 2) construction of a realistic estimation system from the selected model. The usage phase then applies the estimation system to estimate the effort of new projects that are characterized by uncertainties.
This approach acts as a decision-making aid for project managers, supporting realistic estimates of the effort, cost and duration of their software projects. The implementation of all the processes and practices developed in this work has given rise to an open-source software prototype. The results of this thesis fall within the context of the ProjEstimate FUI13 project.
Ekeberg, Lukas, and Alexander Fahnehjelm. "Maskininlärning som verktyg för att extrahera information om attribut kring bostadsannonser i syfte att maximera försäljningspris." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240401.
The Swedish housing market has become increasingly digitalized over the past decade, with the current practice being that the seller publishes the housing advertisement online. A question that arises is how a seller can optimize the advertisement to maximize the bid premium. This study analyzes three machine learning methods for this problem: Linear Regression, Decision Tree Regressor and Random Forest Regressor. The purpose is to extract information about the significant attributes that affect the bid premium. The data set used contains apartments sold during 2014-2018 in the Stockholm area Östermalm/Djurgården. The resulting models achieved an R² value of approximately 0.26 and a Mean Absolute Error of approximately 0.06. Significant information could be extracted from the models even though they were not accurate in predicting the bid premium. In summary, a large number of viewings and publication in April create the best conditions for achieving a high bid premium. The seller should try to keep the number of days since publication under 15.5 days and avoid publishing on Tuesdays.
Romão, Joana Mendonça Vasconcelos. "Modelos para estimar taxas de retenção de clientes : aplicação a uma carteira de seguro automóvel." Master's thesis, Instituto Superior de Economia e Gestão, 2019. http://hdl.handle.net/10400.5/19740.
Access to information has become increasingly easy. Comparing tariff conditions across insurers is now more frequent, with an effect on customer retention rates and on the corresponding insurance contracts. The importance given to this topic keeps growing, and building tools to estimate these rates makes it possible to take measures to retain profitable business and to increase the premiums of less profitable contracts. The aim of this work was to estimate the probability of retention at the renewal date of a policy in a motor insurance portfolio. Given the class imbalance in the response variable, the choice of methodologies was essentially driven by the search for a more accurate final model while working around that problem.
With increasingly easy access to information, there is a growing concern about customer retention rates. Insurers are placing more importance on having accurate tools to monitor the policy renewal process, allowing them to keep the profitable business and increase premiums on the less profitable one. The objective of this study was to estimate the probability of renewing a policy in a motor insurance portfolio. Working with an imbalanced data set led us to try different modelling methodologies, all chosen based on the need to increase the predictive performance of the model.
Jouganous, Julien. "Modélisation et simulation de la croissance de métastases pulmonaires." Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0154/document.
This thesis deals with the mathematical modeling and simulation of lung metastasis growth. We first present a partial differential equation model to simulate the growth, and possibly the response to certain types of treatment, of metastases to the lung. This model must be personalized to be used on individual clinical cases. We therefore developed a calibration technique based on medical images of the tumor. Several applications to clinical cases are presented. We then introduce a simplification of the first model and of the calibration algorithm. This new method, which is more robust, is tested on 36 clinical cases. The results are presented in the third chapter. Finally, a machine learning algorithm
Taillardat, Maxime. "Méthodes Non-Paramétriques de Post-Traitement des Prévisions d'Ensemble." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLV072/document.
In numerical weather prediction, ensemble forecast systems have become an essential tool to quantify forecast uncertainty and to provide probabilistic forecasts. Unfortunately, these models are not perfect and a simultaneous correction of their bias and their dispersion is needed. This thesis presents new statistical post-processing methods for ensemble forecasting, based on random forest algorithms, which are non-parametric. Contrary to state-of-the-art procedures, random forests can take into account non-linear features of atmospheric states. They easily allow the addition of covariates (such as other weather variables, or seasonal or geographic predictors) through a self-selection of the predictors most useful for the regression. Moreover, we make no assumption on the distribution of the variable of interest. This new approach outperforms the existing methods for variables such as surface temperature and wind speed. For variables well known to be tricky to calibrate, such as six-hour accumulated rainfall, hybrid versions of our techniques have been created. We show that these versions (and our original methods) are better than the existing ones; in particular, they provide added value for extreme precipitation. The last part of this thesis deals with the verification of ensemble forecasts for extreme events. We show several properties of the Continuous Ranked Probability Score (CRPS) for extreme values, and we define a new index combining the CRPS with extreme value theory, whose consistency is investigated on both simulations and real cases. The contributions of this work are intended to be inserted into the forecasting and verification chain at Météo-France.
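A rough sketch of the forest-based post-processing idea, under strong simplifications: a forest is trained to map raw ensemble statistics to observations, and the spread of per-tree predictions is read as an empirical predictive distribution. The thesis uses quantile regression forests; this per-tree approximation, and the synthetic predictors and observations below, are only illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 3000
ens_mean = rng.normal(15, 5, n)            # raw ensemble mean (e.g. temperature, °C)
ens_spread = rng.uniform(0.5, 3.0, n)      # raw ensemble spread
# observations: biased and under-dispersed relative to the raw ensemble
obs = ens_mean + 1.0 + rng.normal(0, 1.5 * ens_spread)

X = np.column_stack([ens_mean, ens_spread])
rf = RandomForestRegressor(n_estimators=300, random_state=5).fit(X, obs)

# calibrated probabilistic forecast for a new raw ensemble output
x_new = np.array([[15.0, 2.0]])
tree_preds = np.array([t.predict(x_new)[0] for t in rf.estimators_])
q10, q50, q90 = np.quantile(tree_preds, [0.1, 0.5, 0.9])
print(f"calibrated forecast: median {q50:.1f}, 80% interval [{q10:.1f}, {q90:.1f}]")
```

Note how the forest learns the +1.0 bias directly from the data; no parametric form for the predictive distribution is assumed, which is the non-parametric aspect emphasized above.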
Le, Faou Yohann. "Contributions à la modélisation des données de durée en présence de censure : application à l'étude des résiliations de contrats d'assurance santé." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS527.
In this thesis, we study duration models in the context of the analysis of contract termination times in health insurance. Identified as early as the 17th century in the original work of Graunt (1662) on mortality, the bias induced by the censoring of the duration data observed in this context must be corrected by the statistical models used. Through the problem of measuring the dependence between successive durations, and the problem of predicting contract termination times in insurance, we study the theoretical and practical properties of different estimators that rely on a proper weighting of the observations (the so-called IPCW method) designed to compensate for this bias. The application of these methods to customer value estimation is also carefully discussed.
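The IPCW weighting mentioned above can be illustrated with a small numpy sketch: censored observations receive weight zero, and uncensored ones are reweighted by the inverse of a Kaplan-Meier estimate of the censoring survival function. The durations are synthetic and this is a generic illustration of the principle, not the estimators studied in the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
t_event = rng.exponential(1.0, n)           # true durations (mean 1.0)
t_cens = rng.exponential(2.0, n)            # independent censoring times
t_obs = np.minimum(t_event, t_cens)         # observed durations
delta = (t_event <= t_cens).astype(int)     # 1 = uncensored

def km_censoring_survival(t, d, times):
    """Kaplan-Meier estimate G(t) of the censoring survival function."""
    m = len(t)
    order = np.argsort(t)
    t_sorted, c_sorted = t[order], 1 - d[order]   # censoring acts as the "event"
    at_risk = np.arange(m, 0, -1)
    surv = np.cumprod(1.0 - c_sorted / at_risk)
    idx = np.searchsorted(t_sorted, times, side="right") - 1
    return np.where(idx < 0, 1.0, surv[np.clip(idx, 0, m - 1)])

# IPCW weights: delta_i / G(T_i) for uncensored observations, 0 otherwise
G = km_censoring_survival(t_obs, delta, t_obs)
weights = np.where(delta == 1, 1.0 / np.maximum(G, 1e-8), 0.0)

# the weighted mean of the observed durations recovers the mean duration
est = np.sum(weights * t_obs) / np.sum(weights)
print(f"IPCW-weighted mean duration: {est:.2f} (true mean: 1.0)")
```

The naive mean of `t_obs` would be biased downward because long durations are censored more often; the inverse weights compensate for exactly this under-representation.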