Dissertations / Theses on the topic 'XGBOOST MODEL'
Consult the top 18 dissertations / theses for your research on the topic 'XGBOOST MODEL.'
Matos, Sara Madeira. "Interpretable models of loss given default." Master's thesis, Instituto Superior de Economia e Gestão, 2021. http://hdl.handle.net/10400.5/20981.
Credit risk management is an area where regulators expect banks to have transparent and auditable risk models, which would preclude the use of more accurate black-box models. Furthermore, the opaqueness of these models may hide unknown biases that may lead to unfair lending decisions. In this study, we show that banks do not have to sacrifice predictive accuracy for model transparency in order to comply with regulatory requirements. We illustrate this by showing that the predictions of credit losses given by a black-box model can be easily explained in terms of their inputs. Because black-box models fit the data better, banks should consider the determinants of credit losses suggested by these models in lending decisions and in the pricing of credit exposures.
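As a concrete illustration of the general technique this abstract describes, the sketch below trains a gradient-boosted loss-given-default regressor and attributes each prediction to its inputs with SHAP values. It is only an illustration under invented assumptions: the feature names and synthetic data are hypothetical and do not come from the thesis.

```python
# Illustrative sketch only: explaining a black-box LGD model with SHAP values.
# The feature names and synthetic data are hypothetical, not from the thesis.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "loan_to_value": rng.uniform(0.2, 1.2, 1000),
    "collateral_type": rng.integers(0, 4, 1000),
    "months_in_default": rng.integers(1, 60, 1000),
    "exposure_at_default": rng.lognormal(10, 1, 1000),
})
# Synthetic loss-given-default target in [0, 1].
y = np.clip(0.3 * X["loan_to_value"] + 0.005 * X["months_in_default"]
            + rng.normal(0, 0.1, 1000), 0, 1)

model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
model.fit(X, y)

# TreeExplainer decomposes each prediction into additive feature contributions,
# so a single LGD forecast can be read as "base value + per-feature effects".
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(pd.DataFrame(shap_values, columns=X.columns).head())
```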
Wigren, Richard, and Filip Cornell. "Marketing Mix Modelling: A comparative study of statistical models." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160082.
Pettersson, Gustav, and John Almqvist. "Lavinprognoser och maskininlärning : Att prediktera lavinprognoser med maskininlärning och väderdata." Thesis, Uppsala universitet, Institutionen för informatik och media, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-387205.
This research project examines the feasibility of using machine learning to predict avalanche danger by using XGBoost and openly available weather data. Avalanche forecasts and meteorological modelled weather data have been gathered for the six areas in Sweden where Naturvårdsverket, through lavinprognoser.se, issues avalanche forecasts. The avalanche forecasts are collected from lavinprognoser.se and the modelled weather data is collected from the MESAN model, which is produced and provided by the Swedish Meteorological and Hydrological Institute. 40 machine learning models, in the form of XGBoost, have been trained on this data set, with the goal of assessing the main aspects of an avalanche forecast and the overall avalanche danger. The results show it is possible to predict the day-to-day avalanche danger for the 2018/19 season in Södra Jämtlandsfjällen with an accuracy of 71% and a Mean Average Error of 0.256, by applying machine learning to the weather data for that region. The contribution of XGBoost in this context is demonstrated by applying the simpler method of Logistic Regression to the data set and comparing the results. The logistic regression performs worse, with an accuracy of 56% and a Mean Average Error of 0.459. The contribution of this research is a proof of concept, showing feasibility in predicting avalanche danger in Sweden with the help of machine learning and weather data.
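To make the comparison above concrete, here is a minimal sketch of the same kind of experiment: an XGBoost classifier and a logistic regression fitted to tabular weather features and scored on accuracy and mean absolute error of the predicted danger level. The features and data are invented placeholders, not the MESAN data used in the thesis.

```python
# Illustrative sketch: comparing XGBoost and logistic regression on tabular
# weather features for an ordinal avalanche-danger label.
# Feature meanings and data are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([
    rng.normal(-5, 8, n),      # air temperature
    rng.uniform(0, 25, n),     # wind speed
    rng.uniform(0, 40, n),     # new snow depth
    rng.uniform(0, 100, n),    # relative humidity
])
# Synthetic danger level 0-3 (XGBClassifier expects labels starting at 0).
y = np.clip((X[:, 2] / 12 + rng.normal(0, 0.7, n)).round(), 0, 3).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

xgb = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
xgb.fit(X_tr, y_tr)
logreg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("XGBoost", xgb), ("Logistic regression", logreg)]:
    pred = model.predict(X_te)
    print(name, "accuracy:", accuracy_score(y_te, pred),
          "MAE:", mean_absolute_error(y_te, pred))
```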
Karlsson, Henrik. "Uplift Modeling : Identifying Optimal Treatment Group Allocation and Whom to Contact to Maximize Return on Investment." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157962.
Henriksson, Erik, and Kristopher Werlinder. "Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302535.
The goal of this study is to compare and examine how an XGBoost regressor and a Random Forest regressor perform in predicting housing prices. This is done using two datasets. The comparison takes into account the models' training time, inference time, and the three evaluation metrics R2, RMSE and MAPE. The datasets are described in detail together with background on the regression models. The method involves cleaning the datasets, searching for optimal hyperparameters for the models, and 5-fold cross-validation to achieve good predictions. The result of the study is that the XGBoost regressor performs better on both small and large datasets, but that it is superior when it comes to large datasets. While the Random Forest model can achieve results similar to the XGBoost model, its training takes about 250 times as long and its inference time is roughly 40 times longer. This makes XGBoost especially superior when working with large datasets.
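The sketch below shows, on synthetic data, how a comparison of this kind can be set up: both regressors are timed for training and inference and scored with R2, RMSE and MAPE. The model settings and data are assumptions for illustration only, not the thesis pipeline.

```python
# Illustrative sketch: timing an XGBoost regressor against a Random Forest
# regressor and scoring both with R2, RMSE and MAPE on synthetic data.
import time
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=20_000, n_features=15, noise=10.0, random_state=0)
y = y - y.min() + 50_000            # shift so the "prices" are positive (MAPE-safe)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "XGBoost": XGBRegressor(n_estimators=400, max_depth=6, learning_rate=0.05),
    "Random Forest": RandomForestRegressor(n_estimators=400, n_jobs=-1, random_state=0),
}
for name, model in models.items():
    t0 = time.perf_counter(); model.fit(X_tr, y_tr); train_s = time.perf_counter() - t0
    t0 = time.perf_counter(); pred = model.predict(X_te); infer_s = time.perf_counter() - t0
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: train {train_s:.1f}s, inference {infer_s:.3f}s, "
          f"R2 {r2_score(y_te, pred):.3f}, RMSE {rmse:.0f}, "
          f"MAPE {mean_absolute_percentage_error(y_te, pred):.2%}")
```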
Kinnander, Mathias. "Predicting profitability of new customers using gradient boosting tree models : Evaluating the predictive capabilities of the XGBoost, LightGBM and CatBoost algorithms." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19171.
Full textSvensson, William. "CAN STATISTICAL MODELS BEAT BENCHMARK PREDICTIONS BASED ON RANKINGS IN TENNIS?" Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447384.
Full textLiu, Xiaoyang. "Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/93737.
Machine learning models can be applied in a wide range of areas, including scientific research. In this thesis, machine learning models are applied to predict the chromatography behavior of fullerenes based on their molecular structures. Chromatography is a common technique for separating mixtures, where the separation arises from differences in the interactions between molecules and a stationary phase. In real experiments, a mixture usually contains a large family of different compounds, and it requires a lot of work and resources to identify the target compound. Therefore, models are extremely important for chromatography studies. Traditional models are built on physical rules and involve several parameters. These physical parameters are either measured experimentally or computed theoretically, but both approaches are time-consuming and difficult to carry out. For fullerenes, my previous studies have shown that the chromatography model can be simplified so that only one parameter, polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes from their structures. The structure of a fullerene is represented by several local structures. Several types of machine learning models are built and tested on our data set, and the results show that a neural network gives the best predictions.
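As a rough illustration of the modelling step described above, predicting a molecular property from numeric structural descriptors, here is a minimal sketch using a small neural network; the descriptors and data are hypothetical stand-ins for the local-structure features used in the thesis.

```python
# Illustrative sketch: regressing a molecular property (polarizability) on
# numeric structural descriptors with a small neural network.
# Descriptor names and data are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([
    rng.integers(60, 100, n),        # number of carbon atoms
    rng.integers(0, 3, n),           # encapsulated metal atoms
    rng.uniform(0.0, 1.0, n),        # fraction of pentagon-adjacent carbons
])
y = 0.9 * X[:, 0] + 4.0 * X[:, 1] + rng.normal(0, 2, n)   # synthetic polarizability

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000,
                                   random_state=0))
model.fit(X_tr, y_tr)
print("held-out R2:", model.score(X_te, y_te))
```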
Sharma, Vibhor. "Early Stratification of Gestational Diabetes Mellitus (GDM) by building and evaluating machine learning models." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281398.
Gestational Diabetes Mellitus (GDM), a condition involving abnormal levels of glucose in the blood plasma, has seen a rapid and sharp increase among affected mothers across regions and ethnicities around the world. The current approach to screening and diagnosing GDM is limited to the Oral Glucose Tolerance Test (OGTT). With the advent of machine learning algorithms, healthcare has seen a rise in machine learning methods for disease diagnosis that are increasingly used in clinical settings. Yet in the GDM field these algorithms have not been widely applied to generate multiparametric diagnostic models that could assist clinicians in diagnosing the condition. In the literature there is an evident lack of machine learning applications for GDM diagnosis; work has been limited to the proposed use of a few very simple algorithms such as logistic regression. We have therefore tried to address this research gap by applying a broad range of machine learning algorithms, known to be effective for binary classification, to the early classification of GDM among expectant mothers. This can help clinicians diagnose GDM early and offers opportunities to mitigate the adverse outcomes related to GDM among expectant mothers and their offspring. We set up an empirical study to investigate the performance of different machine learning algorithms used specifically for the task of classifying GDM. These algorithms were trained on a set of predictor variables selected by experts. The results were then compared with existing machine learning approaches for GDM classification in the literature, based on a set of performance metrics. Our model could not outperform the machine learning models already proposed for GDM classification. We attribute this to the chosen set of predictor variables and to the under-reporting of performance metrics such as precision in the existing literature, which leads to a lack of informed comparison.
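For readers who want a concrete picture of the empirical setup described above, the following sketch benchmarks several binary classifiers with cross-validated metrics on simulated data; the predictors, models and metrics are illustrative assumptions, not the thesis protocol.

```python
# Illustrative sketch: benchmarking several binary classifiers with
# cross-validated metrics. The clinical predictors are simulated.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1500, n_features=10, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1),
}
# Report several metrics, since single-number comparisons can hide trade-offs.
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=5,
                            scoring=("accuracy", "precision", "recall", "roc_auc"))
    summary = {m: round(scores[f"test_{m}"].mean(), 3)
               for m in ("accuracy", "precision", "recall", "roc_auc")}
    print(name, summary)
```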
Gregório, Rafael Leite. "Modelo híbrido de avaliação de risco de crédito para corporações brasileiras com base em algoritmos de aprendizado de máquina." Universidade Católica de Brasília, 2018. https://bdtd.ucb.br:8443/jspui/handle/tede/2432.
Credit risk assessment plays a relevant role for financial institutions because it is associated with possible losses and has a large impact on balance sheets. Although there is considerable research on applications of machine learning models in finance, there is still no study that integrates the available knowledge on credit risk assessment. This work aims to specify a machine learning model of the probability of default of publicly traded companies in the Bovespa Index (corporations) and, from the model's estimates, to obtain a risk assessment metric based on rating letters. We brought together methodologies found in the literature and estimated models that comprise fundamentalist (balance-sheet) and corporate governance data, macroeconomic variables, and variables produced by applying the proprietary KMV credit risk assessment model. We tested the XGBoost and LinearSVM algorithms, which have very different characteristics but are both potentially useful for the problem. Parameter grid searches were performed to identify the most representative variables and to specify the best-performing model. The selected model was XGBoost, whose performance was very similar to the results obtained for the North American stock market in analogous research. The estimated credit ratings appear to be more sensitive to the economic and financial situation of the companies than those issued by traditional rating agencies.
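A minimal sketch of the kind of parameter-grid comparison the abstract describes, tuning an XGBoost classifier and a linear SVM with cross-validated grid search; the grids, metric and synthetic data are assumptions for illustration, not the study's actual variables.

```python
# Illustrative sketch: tuning XGBoost and a linear SVM with grid search for a
# default-probability task. Grids, scoring and data are hypothetical.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

searches = {
    "XGBoost": GridSearchCV(
        XGBClassifier(eval_metric="logloss"),
        {"max_depth": [3, 5], "n_estimators": [200, 400],
         "learning_rate": [0.05, 0.1]},
        scoring="roc_auc", cv=5),
    "LinearSVM": GridSearchCV(
        Pipeline([("scale", StandardScaler()),
                  ("svm", LinearSVC(dual=False, max_iter=5000))]),
        {"svm__C": [0.01, 0.1, 1.0]},
        scoring="roc_auc", cv=5),
}
for name, search in searches.items():
    search.fit(X, y)
    print(name, "best AUC:", round(search.best_score_, 3),
          "params:", search.best_params_)
```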
D'Amato, Vincenzo Stefano. "Deep Multi Temporal Scale Networks for Human Motion Analysis." Doctoral thesis, Università degli studi di Genova, 2023. https://hdl.handle.net/11567/1104759.
Liu, Hsin-Yu (劉欣諭). "Constructing the Conservative Equity Portfolio by the XGBoost Model." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6dx4h9.
National Sun Yat-sen University, Department of Finance, academic year 107.
This study uses data on Taiwan-listed companies from 1997 to 2018 and applies the conservative investment formula proposed by Blitz and Vliet (2018) to the Taiwan market, using three simple factors, low volatility, high dividends and positive momentum, to form a portfolio. We then use the machine learning algorithm (the XGBoost model) proposed by Chen and Guestrin (2016) with the same three factors, but computed over different calculation periods, to build a return model. To test the effectiveness of the model, we use the original conservative portfolio as the benchmark and adjust the stock weights according to the model's predicted returns. After applying the model, we not only increase the CAGR by 3% but also reduce the volatility by 1%. Finally, we combine the conservative formula, the machine learning return model and the factor weighting approach proposed by Ghayur, Heaney and Platt (2018) to construct a value-added index portfolio, ultimately obtaining an information ratio of 0.71.
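As an illustration of the workflow sketched in this abstract, the code below fits an XGBoost return model on the three named factors and tilts portfolio weights toward higher predicted returns; the factor definitions, data and weighting rule are simplified assumptions, not the study's exact procedure.

```python
# Illustrative sketch: predicting next-period stock returns from three factors
# (volatility, dividend yield, momentum) with XGBoost, then tilting weights
# toward higher predicted returns. All numbers are simulated.
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(7)
n_stocks = 200
factors = pd.DataFrame({
    "volatility_36m": rng.uniform(0.1, 0.6, n_stocks),
    "dividend_yield": rng.uniform(0.0, 0.08, n_stocks),
    "momentum_12m": rng.normal(0.05, 0.2, n_stocks),
})
# Synthetic next-month returns loosely tied to the factors.
next_return = (2.0 * factors["dividend_yield"]
               + 0.1 * factors["momentum_12m"]
               - 0.05 * factors["volatility_36m"]
               + rng.normal(0, 0.03, n_stocks))

model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
model.fit(factors, next_return)
predicted = model.predict(factors)

# Simple tilt: rank stocks by predicted return and renormalise the ranks
# so the resulting weights sum to one.
rank = pd.Series(predicted).rank()
weights = rank / rank.sum()
print(weights.describe())
```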
Herrmann, Vojtěch. "Moderní predikční metody pro finanční časové řady." Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-437908.
Yeh, Jih-Yang (葉日揚). "A Phishing Website Detection Service Mechanism Utilizing XGBoost Classification Model and Key-term Extraction Method." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/pan8hn.
National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 107.
This research proposes a phishing website detection mechanism that combines an XGBoost-based phishing website classifier with a key-term extraction method. Some pre-processing techniques are also developed to enhance performance. XGBoost is well known for its high efficiency and accuracy, and the key-term based detection method helps to minimize the false positive rate of the phishing website classification model. The key-term extraction method is based on two observations: phishers usually try to make phishing websites look similar to their imitation targets, so there must be clues, or key terms, in website-related sources that reveal the imitation target; on the other hand, legitimate websites are ranked high in search engines, so the ranking of search results for key terms serves as a good reference. The main function of this method is to identify the specific target of a phishing website, if it has one, and to correct legitimate websites that are misclassified. In addition, the proposed mechanism introduces a sliding window technique to reduce training costs, so as to reach the same performance with less training data. The framework proposed in this research uses data crawled from PhishTank and Alexa, and experiments are conducted after labeling. Without the key-term detection method, the accuracy rate is about 98%. After enabling the key-term method, the number of misclassified legitimate websites is further reduced, raising the accuracy rate to 99%.
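The sketch below illustrates the classifier side of a mechanism like the one described above: an XGBoost model over simple URL/page features scored on accuracy, precision and recall. The features and data are invented for illustration, and the key-term re-checking step is only indicated in a comment.

```python
# Illustrative sketch: an XGBoost phishing-page classifier over simple
# URL/page features. Feature meanings and data are invented.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(3)
n = 5000
X = np.column_stack([
    rng.integers(10, 120, n),    # URL length
    rng.integers(0, 6, n),       # number of subdomains
    rng.integers(0, 2, n),       # uses raw IP address instead of hostname
    rng.integers(0, 2, n),       # has HTTPS certificate
    rng.integers(0, 40, n),      # count of external resource links
])
# Synthetic label: phishing pages tend to have long URLs, raw IPs, no HTTPS.
score = 0.02 * X[:, 0] + 1.5 * X[:, 2] - 1.0 * X[:, 3] + rng.normal(0, 1, n)
y = (score > np.median(score)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = XGBClassifier(n_estimators=300, max_depth=5, learning_rate=0.1)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "precision:", precision_score(y_te, pred),
      "recall:", recall_score(y_te, pred))

# In the mechanism described above, flagged pages would then be re-checked by
# extracting key terms and querying a search engine to see whether the page
# ranks for its own brand terms (not implemented in this sketch).
```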
Salvaire, Pierre Antony Jean Marie. "Explaining the predictions of a boosted tree algorithm : application to credit scoring." Master's thesis, 2019. http://hdl.handle.net/10362/85991.
The main goal of this report is to contribute to the adoption of complex "black box" machine learning models in the field of credit scoring for retail credit. Although numerous investigations have shown the potential benefits of using complex models, we identified the lack of interpretability as one of the main vectors preventing a full and trustworthy adoption of these new modeling techniques. Intrinsically linked with recent data concerns such as the individual right to explanation and fairness (introduced in the GDPR) or model reliability, we believe this kind of research is crucial for easing their adoption among credit risk practitioners. We build a standard linear scorecard model along with a more advanced algorithm called Extreme Gradient Boosting (XGBoost) on an open-source retail credit dataset. The modeling scenario is a binary classification task consisting in identifying clients that will experience a delinquency state of 90 days past due or worse. The interpretation of the scorecard model is performed using the raw output of the algorithm, while more complex data perturbation techniques, namely Partial Dependence Plots and Shapley Additive Explanations, are computed for the XGBoost algorithm. As a result, we observe that the XGBoost algorithm is statistically more performant at distinguishing "bad" from "good" clients. Additionally, we show that the global interpretation of the XGBoost model is not as accurate as that of the scorecard algorithm. At an individual level, however (for each instance of the dataset), we show that the level of interpretability is very similar, as both are able to quantify the contribution of each variable to the predicted risk of a specific application.
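As a concrete example of the post-hoc explanation methods mentioned above, the sketch below computes a brute-force partial-dependence curve for one feature of an XGBoost default classifier; the data and feature are synthetic placeholders rather than the thesis dataset.

```python
# Illustrative sketch: a brute-force partial-dependence curve for one feature
# of an XGBoost default classifier. Data and feature meaning are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=4000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X, y)

feature = 0                      # e.g. a utilisation-rate style variable
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)
pdp = []
for value in grid:
    X_mod = X.copy()
    X_mod[:, feature] = value    # force every client to this feature value
    # Average predicted default probability over the whole sample.
    pdp.append(clf.predict_proba(X_mod)[:, 1].mean())

for v, p in zip(grid, pdp):
    print(f"feature value {v:6.2f} -> mean predicted default risk {p:.3f}")
```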
Keller, Aishwarya. "Hybrid Resampling and XGBoost Prediction Model Using Patient's Information and Drawing as Features for Parkinson's Disease Detection." Thesis, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19442.
Full textKUMAR, SUNIL. "COMBINATORIAL THERAPY FOR TUMOR TREATMENT." Thesis, 2023. http://dspace.dtu.ac.in:8080/jspui/handle/repository/20430.
Full text(5930375), Junhui Wang. "SYSTEMATICALLY LEARNING OF INTERNAL RIBOSOME ENTRY SITE AND PREDICTION BY MACHINE LEARNING." Thesis, 2019.