Theses on the topic "XGBOOST MODEL"


Consult the 18 best theses for your research on the topic "XGBOOST MODEL".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Explore theses on a wide variety of disciplines and organize your bibliography correctly.

1

Matos, Sara Madeira. "Interpretable models of loss given default". Master's thesis, Instituto Superior de Economia e Gestão, 2021. http://hdl.handle.net/10400.5/20981.

Abstract
Master's degree in Applied Econometrics and Forecasting
Credit risk management is an area where regulators expect banks to have transparent and auditable risk models, which would preclude the use of more accurate black-box models. Furthermore, the opaqueness of these models may hide unknown biases that may lead to unfair lending decisions. In this study, we show that banks do not have to sacrifice predictive accuracy at the cost of model transparency to be compliant with regulatory requirements. We illustrate this by showing that the predictions of credit losses given by a black-box model can be easily explained in terms of their inputs. Because black-box models fit the data better, banks should consider the determinants of credit losses suggested by these models in lending decisions and the pricing of credit exposures.
2

Wigren, Richard and Filip Cornell. "Marketing Mix Modelling: A comparative study of statistical models". Thesis, Linköpings universitet, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-160082.

Abstract
Deciding the optimal media advertisement spending is a complex issue that many companies today are facing. With the rise of new ways to market products, the choices can appear infinite. One methodical way to do this is to use Marketing Mix Modelling (MMM), in which statistical modelling is used to attribute sales to media spending. However, many problems arise during the modelling. Modelling and mitigation of uncertainty, time-dependencies of sales, incorporation of expert information and interpretation of models are all issues that need to be addressed. This thesis investigates the effectiveness of eight different statistical and machine learning methods in terms of prediction accuracy and certainty, each one addressing one of the previously mentioned issues. It is concluded that while Shapley Value Regression has the highest certainty in terms of coefficient estimation, it sacrifices some prediction accuracy. The overall highest-performing model is the Bayesian hierarchical model, achieving both high prediction accuracy and high certainty.
3

Pettersson, Gustav and John Almqvist. "Lavinprognoser och maskininlärning : Att prediktera lavinprognoser med maskininlärning och väderdata". Thesis, Uppsala universitet, Institutionen för informatik och media, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-387205.

Abstract
This research project examines the feasibility of using machine learning to predict avalanche danger by using XGBoost and openly available weather data. Avalanche forecasts and meteorological modelled weather data have been gathered for the six areas in Sweden where Naturvårdsverket, through lavinprognoser.se, issues avalanche forecasts. The avalanche forecasts are collected from lavinprognoser.se and the modelled weather data is collected from the MESAN model, which is produced and provided by the Swedish Meteorological and Hydrological Institute. 40 machine learning models, in the form of XGBoost, have been trained on this data set, with the goal of assessing the main aspects of an avalanche forecast and the overall avalanche danger. The results show it is possible to predict the day-to-day avalanche danger for the 2018/19 season in Södra Jämtlandsfjällen with an accuracy of 71% and a Mean Average Error of 0.256, by applying machine learning to the weather data for that region. The contribution of XGBoost in this context is demonstrated by applying the simpler method of Logistic Regression on the data set and comparing the results. The logistic regression performs worse, with an accuracy of 56% and a Mean Average Error of 0.459. The contribution of this research is a proof of concept, showing feasibility in predicting avalanche danger in Sweden with the help of machine learning and weather data.
4

Karlsson, Henrik. "Uplift Modeling : Identifying Optimal Treatment Group Allocation and Whom to Contact to Maximize Return on Investment". Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157962.

Abstract
This report investigates the possibilities to model the causal effect of treatment within the insurance domain to increase return on investment of sales through telemarketing. In order to capture the causal effect, two or more subgroups are required, where one group receives control treatment. Two different uplift models model the causal effect of treatment: the Class Transformation Method, and Modeling Uplift Directly with Random Forests. Both methods are evaluated by the Qini curve and the Qini coefficient. To model the causal effect of treatment, the comparison with a control group is a necessity. The report attempts to find the optimal treatment group allocation in order to maximize the precision in the difference between the treatment group and the control group. Further, the report provides a rule of thumb that ensures that the control group is of sufficient size to be able to model the causal effect. The insurer If has provided the data material used to model uplift; it consists of approximately 630,000 customer interactions and 60 features. The total uplift in the data set, the difference in purchase rate between the treatment group and control group, is approximately 3%. Uplift by random forest with a Euclidean distance splitting criterion that tries to maximize the distributional divergence between treatment group and control group performs best, capturing 15% of the theoretical best model. The same model manages to capture 77% of the total amount of purchases in the treatment group by only giving treatment to half of the treatment group. With the purchase rates in the data set, the optimal treatment group allocation is approximately 58%-70%, but the study could be performed with as much as approximately 97% treatment group allocation.
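For readers unfamiliar with the Qini curve used above to evaluate uplift models, the following is a minimal sketch of how such a curve can be computed from ranked predictions; the array names (y, t, uplift_scores) and the toy data are hypothetical, and the snippet is not taken from the thesis.

```python
import numpy as np

def qini_curve(y, t, uplift_scores):
    """Incremental responders at each prefix of the population, ranked by predicted uplift.

    y: binary outcome (1 = purchase), t: binary treatment flag, uplift_scores: model scores.
    """
    order = np.argsort(-uplift_scores)           # highest predicted uplift first
    y, t = y[order], t[order]
    cum_resp_treated = np.cumsum(y * t)          # responders among treated customers seen so far
    cum_resp_control = np.cumsum(y * (1 - t))    # responders among control customers seen so far
    n_treated = np.cumsum(t)
    n_control = np.maximum(np.cumsum(1 - t), 1)  # avoid division by zero at the start
    # Scale control responders to the size of the treated group seen so far.
    return cum_resp_treated - cum_resp_control * (n_treated / n_control)

# Toy illustration with random scores.
rng = np.random.default_rng(0)
n = 1000
t = rng.integers(0, 2, n)
y = rng.binomial(1, 0.05 + 0.03 * t)             # treatment lifts the purchase rate by ~3 p.p.
print(qini_curve(y, t, rng.random(n))[-1])       # total incremental purchases captured
```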
5

Henriksson, Erik and Kristopher Werlinder. "Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302535.

Abstract
The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in terms of predictive performance on housing prices, with the help of two data sets. The comparison considers training time, inference time and the three evaluation metrics R2, RMSE and MAPE. The data sets are described in detail together with background about the regressor models that are used. The method involves substantial data cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation in order to achieve good performance estimates. The finding of this research project is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve similar results as the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior when used on larger sets of data.
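As a rough illustration of the comparison described above, the sketch below cross-validates an XGBoost regressor against a Random Forest regressor on a public housing data set and reports fit time, R2, RMSE and MAPE; the data set and hyperparameters are stand-ins, not those used in the thesis.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate
from xgboost import XGBRegressor

X, y = fetch_california_housing(return_X_y=True)   # public stand-in for the housing data

models = {
    "xgboost": XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1),
    "random_forest": RandomForestRegressor(n_estimators=300, n_jobs=-1),
}
scoring = ("r2", "neg_root_mean_squared_error", "neg_mean_absolute_percentage_error")

for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=scoring)   # 5-fold cross-validation
    print(f"{name}: fit time {cv['fit_time'].sum():.1f}s, "
          f"R2 {cv['test_r2'].mean():.3f}, "
          f"RMSE {-cv['test_neg_root_mean_squared_error'].mean():.3f}, "
          f"MAPE {-cv['test_neg_mean_absolute_percentage_error'].mean():.3f}")
```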
6

Kinnander, Mathias. "Predicting profitability of new customers using gradient boosting tree models : Evaluating the predictive capabilities of the XGBoost, LightGBM and CatBoost algorithms". Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-19171.

Abstract
In the context of providing credit online to customers in retail shops, the provider must perform risk assessments quickly and often based on scarce historical data. This can be achieved by automating the process with machine learning algorithms. Gradient boosting tree algorithms have been shown to be capable in a wide range of application scenarios. However, they are yet to be implemented for predicting the profitability of new customers based solely on the customers' first purchases. This study aims to evaluate the predictive performance of the XGBoost, LightGBM, and CatBoost algorithms in this context. The Recall and Precision metrics were used as the basis for assessing the models' performance. The experiment implemented for this study shows that the models display similar capabilities while also being biased towards the majority class.
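A hedged sketch of the kind of three-way comparison the abstract describes is given below, cross-validating XGBoost, LightGBM and CatBoost on a synthetic imbalanced data set and reporting precision and recall; the data and the default hyperparameters are placeholders, not the study's setup.

```python
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic, imbalanced stand-in for "profitable vs non-profitable new customers".
X, y = make_classification(n_samples=3000, n_features=15, weights=[0.85, 0.15], random_state=0)

models = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LightGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}
for name, model in models.items():
    for metric in ("precision", "recall"):
        score = cross_val_score(model, X, y, scoring=metric, cv=5).mean()
        print(f"{name} {metric}: {score:.3f}")
```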
7

Svensson, William. "CAN STATISTICAL MODELS BEAT BENCHMARK PREDICTIONS BASED ON RANKINGS IN TENNIS?" Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447384.

Abstract
The aim of this thesis is to beat a benchmark prediction of 64.58 percent based on player rankings on the ATP tour in tennis, i.e., the player with the better rank in a tennis match is deemed the winner. Three statistical models are used: logistic regression, random forest and XGBoost. The data cover the period 2000-2010 and comprise over 60,000 observations with 49 variables each. After the data were prepared and new variables were created from the differences between the two players at hand, all three statistical models outperformed the benchmark prediction. All three models had an accuracy of around 66 percent, with logistic regression performing the best at 66.45 percent. The most important variables overall for the models are the total win rate on different surfaces, the total win rate, and rank.
8

Liu, Xiaoyang. "Machine Learning Models in Fullerene/Metallofullerene Chromatography Studies". Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/93737.

Abstract
Machine learning methods are now extensively applied in various scientific research areas to build models. Unlike regular models, machine learning based models use a data-driven approach. Machine learning algorithms can learn, from available data, knowledge that is hard to recognize otherwise. The data-driven approaches enhance the role of algorithms and computers and then accelerate the computation using alternative views. In this thesis, we explore the possibility of applying machine learning models to the prediction of chromatographic retention behaviors. Chromatographic separation is a key technique for the discovery and analysis of fullerenes. In previous studies, differential equation models have achieved great success in predictions of chromatographic retention. However, most of the differential equation models require experimental measurements or theoretical computations for many parameters, which are not easy to obtain. Fullerenes/metallofullerenes are rigid and spherical molecules with only carbon atoms, which makes the prediction of chromatographic retention behaviors as well as other properties much simpler than for more flexible molecules that have greater conformational variation. In this thesis, I propose that the polarizability of a fullerene molecule can be estimated directly from its structure. Structural motifs are used to simplify the model, and the models with motifs provide satisfying predictions. The data set contains 31947 isomers and their polarizability data and is split into a training set with 90% of the data points and a complementary testing set. In addition, a second testing set of large fullerene isomers is also prepared; it is used to test whether a model can be trained on small fullerenes and then give ideal predictions on large fullerenes.
Machine learning models can be applied in a wide range of areas, such as scientific research. In this thesis, machine learning models are applied to predict the chromatography behaviors of fullerenes based on their molecular structures. Chromatography is a common technique for mixture separation, and the separation arises from differences in the interactions between molecules and a stationary phase. In real experiments, a mixture usually contains a large family of different compounds, and it requires a lot of work and resources to figure out the target compound. Therefore, models are extremely important for studies of chromatography. Traditional models are built based on physics rules and involve several parameters. The physics parameters are measured by experiments or computed theoretically. However, both are time consuming and not easy to conduct. For fullerenes, my previous studies have shown that the chromatography model can be simplified so that only one parameter, polarizability, is required. A machine learning approach is introduced to enhance the model by predicting the molecular polarizabilities of fullerenes based on their structures. The structure of a fullerene is represented by several local structures. Several types of machine learning models are built and tested on our data set, and the results show that the neural network gives the best predictions.
9

Sharma, Vibhor. "Early Stratification of Gestational Diabetes Mellitus (GDM) by building and evaluating machine learning models". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281398.

Abstract
Gestational Diabetes Mellitus (GDM), a condition involving abnormal levels of glucose in the blood plasma, has seen a rapid surge amongst gestating mothers belonging to different regions and ethnicities around the world. The current method of screening and diagnosing GDM is restricted to the Oral Glucose Tolerance Test (OGTT). With the advent of machine learning algorithms, healthcare has seen a surge of machine learning methods for disease diagnosis which are increasingly being employed in a clinical setup. Yet in the area of GDM, there has not been widespread utilization of these algorithms to generate multi-parametric diagnostic models to aid clinicians in diagnosing the aforementioned condition. In the literature, there is an evident scarcity of applications of machine learning algorithms to GDM diagnosis; it has been limited to the proposed use of some very simple algorithms like logistic regression. Hence, we have attempted to address this research gap by employing a wide array of machine learning algorithms, known to be effective for binary classification, for early GDM classification amongst gestating mothers. This can aid clinicians in the early diagnosis of GDM and will offer chances to mitigate the adverse outcomes related to GDM among gestating mothers and their progeny. We set up an empirical study to look into the performance of different machine learning algorithms used specifically for the task of GDM classification. These algorithms were trained on a set of predictor variables chosen by experts. We then compared the results with the existing machine learning methods in the literature for GDM classification based on a set of performance metrics. Our models could not outperform the already proposed machine learning models for GDM classification. We attribute this to our chosen set of predictor variables and the under-reporting of various performance metrics, such as precision, in the existing literature, leading to a lack of informed comparison.
10

Gregório, Rafael Leite. "Modelo híbrido de avaliação de risco de crédito para corporações brasileiras com base em algoritmos de aprendizado de máquina". Universidade Católica de Brasília, 2018. https://bdtd.ucb.br:8443/jspui/handle/tede/2432.

Abstract
Credit risk assessment plays a relevant role for financial institutions because it is associated with possible losses and has a large impact on balance sheets. Although there is considerable research on applications of machine learning models in finance, a study that integrates the available knowledge about credit risk assessment is still lacking. This work aims to specify a machine learning model of the probability of default of publicly traded companies present in the Bovespa Index (corporations) and, based on the estimations of the model, to obtain a risk assessment metric based on letter ratings. We combined methodologies verified in the literature and estimated models that comprise fundamentalist (balance sheet) and corporate governance data, macroeconomic variables, and also variables resulting from the application of the proprietary KMV credit risk assessment model. We tested the XGBoost and LinearSVM algorithms, which have very different characteristics but are both potentially useful for the problem. Parameter grid searches were performed to identify the most representative variables and to specify the best-performing model. The selected model was XGBoost, and its performance was very similar to the results obtained for the North American stock market in analogous research. The estimated credit ratings appear to be more sensitive to the economic and financial situation of the companies than those issued by traditional rating agencies.
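The parameter grid search mentioned above can be illustrated with a short generic sketch; the synthetic data, parameter ranges and AUC scoring below are assumptions for demonstration, not the thesis's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Synthetic, imbalanced stand-in for a default/non-default data set.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)

param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
}
search = GridSearchCV(XGBClassifier(eval_metric="logloss"), param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```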
11

D'AMATO, VINCENZO STEFANO. "Deep Multi Temporal Scale Networks for Human Motion Analysis". Doctoral thesis, Università degli studi di Genova, 2023. https://hdl.handle.net/11567/1104759.

Abstract
The movement of human beings appears to respond to a complex motor system that contains signals at different hierarchical levels. For example, an action such as "grasping a glass on a table" represents a high-level action, but to perform this task the body needs several motor inputs that include the activation of different joints of the body (shoulder, arm, hand, fingers, etc.). Each of these joints/muscles has a different size, responsiveness, and precision, with a complex, non-linearly stratified temporal dimension where every muscle has its own temporal scale. Parts such as the fingers respond much faster to brain input than more voluminous body parts such as the shoulder. The cooperation involved when we perform an action produces smooth, effective, and expressive movement in a complex, multiple-temporal-scale cognitive task. Following this layered structure, the human body can be described as a kinematic tree consisting of connected joints. Although it is nowadays well known that human movement and its perception are characterised by multiple temporal scales, very few works in the literature focus on studying this particular property. In this thesis, we will focus on the analysis of human movement using data-driven techniques. In particular, we will focus on the non-verbal aspects of human movement, with an emphasis on full-body movements. Data-driven methods can interpret the information in the data by searching for rules, associations or patterns that can represent the relationships between input (e.g. the human action acquired with sensors) and output (e.g. the type of action performed). Furthermore, these models may represent a new research frontier as they can analyse large masses of data and focus on aspects that even an expert user might miss. The literature on data-driven models proposes two families of methods that can process time series and human movement. The first family, called shallow models, extracts features from the time series that can help the learning algorithm find associations in the data. These features are identified and designed by domain experts who can identify the best ones for the problem faced. The second family, on the other hand, avoids this extraction phase by the human expert, since the models themselves can identify the best set of features to optimise the learning of the model. In this thesis, we will provide a method that can apply the multiple-temporal-scale property of the human motion domain to deep learning models, the only data-driven models that can be extended to handle this property. We will ask ourselves two questions: what happens if we apply knowledge about how human movements are performed to deep learning models? Can this knowledge improve current automatic recognition standards? In order to prove the validity of our study, we collected data and tested our hypothesis in specially designed experiments. The results support both the proposal and the need for deep multi-scale models as a tool to better understand human movement and its multiple-time-scale nature.
12

Liu, Hsin-Yu and 劉欣諭. "Constructing the Conservative Equity Portfolio by the XGBoost Model". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6dx4h9.

Abstract
Master's thesis, National Sun Yat-sen University, Department of Finance, academic year 107 (2018/19).
This study uses data on Taiwan-listed companies from 1997 to 2018 and applies the conservative investment formula proposed by Blitz and Vliet (2018) to the Taiwan market, using three simple factors, low volatility, high dividends and positive momentum, to form a portfolio. We then use the machine learning algorithm (XGBoost) proposed by Chen and Guestrin (2016) with the same three factors, computed over different periods, to build a return model. In order to test the effectiveness of the model, we use the original conservative portfolio as the benchmark and adjust the stock weights according to the returns predicted by the model. After applying the model, we not only increase the CAGR by 3% but also reduce the volatility by 1%. Finally, we combine the conservative formula, the machine-learning-based return model and the factor weighting approach proposed by Ghayur, Heaney and Platt (2018) to construct a value-added index portfolio, obtaining an information ratio of 0.71.
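A sketch of the weight adjustment step, tilting a benchmark portfolio toward stocks with higher model-predicted returns, is given below; the tilt rule, tickers and numbers are illustrative assumptions rather than the scheme actually used in the thesis.

```python
import pandas as pd

def tilt_weights(base_weights: pd.Series, predicted_returns: pd.Series, tilt: float = 0.5) -> pd.Series:
    """Overweight stocks whose predicted return ranks high, underweight the rest."""
    ranks = predicted_returns.rank(pct=True)               # cross-sectional percentile rank in (0, 1]
    tilted = base_weights * (1.0 + tilt * (ranks - 0.5))   # tilt around the median-ranked stock
    return tilted / tilted.sum()                           # renormalise so the weights sum to one

# Toy example with three hypothetical tickers.
base = pd.Series({"AAA": 1 / 3, "BBB": 1 / 3, "CCC": 1 / 3})
preds = pd.Series({"AAA": 0.02, "BBB": -0.01, "CCC": 0.05})
print(tilt_weights(base, preds))
```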
13

Herrmann, Vojtěch. "Moderní predikční metody pro finanční časové řady". Master's thesis, 2021. http://www.nusl.cz/ntk/nusl-437908.

Abstract
This thesis compares two approaches to modelling and predicting time series: a traditional one (the ARIMAX model) and a modern one (gradient-boosted decision trees within the framework of the XGBoost library). In the first part of the thesis we introduce the theoretical framework of supervised learning, the ARIMAX model and gradient boosting in the context of decision trees. In the second part we fit ARIMAX and XGBoost models that both predict a specific time series, the daily volume of the S&P 500 index, which is a crucial task in many fields. We then compare the results of the two approaches, describe the advantages of the XGBoost model, which presumably lead to its better results in this specific simulation study, and show the importance of hyperparameter optimization. Afterwards, we compare the practicality of the methods, especially with regard to their computational demands. In the last part of the thesis, a hybrid model theory is derived and algorithms to obtain the optimal hybrid model are proposed. These algorithms are then used for the aforementioned prediction problem. The optimal hybrid model combines the ARIMAX and XGBoost models and performs better than each of the individual models on its own.
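One common way to combine the two model families, fitting XGBoost on the residuals of an ARIMAX fit and adding the two predictions, is sketched below; this is a generic illustration with assumed variable names and synthetic data, not the hybrid construction derived in the thesis.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from xgboost import XGBRegressor

def fit_hybrid(y_train: pd.Series, X_train: pd.DataFrame, order=(1, 1, 1)):
    """ARIMAX captures the linear/temporal structure; XGBoost models its residuals."""
    arimax = SARIMAX(y_train, exog=X_train, order=order).fit(disp=False)
    residuals = y_train - arimax.fittedvalues
    booster = XGBRegressor(n_estimators=200, max_depth=4).fit(X_train, residuals)
    return arimax, booster

def predict_hybrid(arimax, booster, X_future: pd.DataFrame) -> pd.Series:
    linear_part = arimax.forecast(steps=len(X_future), exog=X_future)
    return linear_part + booster.predict(X_future)

# Toy demonstration on synthetic data.
rng = np.random.default_rng(1)
idx = pd.date_range("2020-01-01", periods=300, freq="D")
X = pd.DataFrame({"feature": rng.normal(size=300)}, index=idx)
y = pd.Series(0.5 * X["feature"] + rng.normal(scale=0.1, size=300), index=idx).cumsum()
arimax, booster = fit_hybrid(y.iloc[:250], X.iloc[:250])
print(predict_hybrid(arimax, booster, X.iloc[250:]).head())
```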
14

Yeh, Jih-Yang and 葉日揚. "A Phishing Website Detection Service Mechanism Utilizing XGBoost Classification Model and Key-term Extraction Method". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/pan8hn.

Abstract
Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 107 (2018/19).
This research proposes a phishing website detection mechanism that combines an XGBoost-based phishing website classifier and a key-term extraction method. Some pre-processing techniques are also developed to enhance performance. XGBoost is well known for its high efficiency and accuracy, and the key-term based detection method helps to minimize the false positive rate of the phishing website classification model. The key-term extraction method is based on two observations: phishers usually try to make phishing websites look similar to their imitation targets, so there must be clues, or key terms, behind website-related sources that reveal the imitation target; on the other hand, legitimate websites are ranked high in search engines, so the ranking of search results for key terms serves as a good reference. The main function of this method is to capture the specific target of the phishing website if there is one, and to correct legitimate websites that are misclassified. In addition, the proposed mechanism introduces a sliding window technique to reduce training costs, so as to reach the same performance with less training data. The framework proposed in this research uses data crawled from PhishTank and Alexa, and experiments are conducted after labeling. Without the key-term detection method, the accuracy rate is about 98%. After enabling the key-term method, the number of misclassified legitimate websites is further reduced, raising the accuracy rate to 99%.
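As a very rough sketch of the classifier half of such a mechanism, the snippet below trains an XGBoost model on a few hand-rolled lexical URL features; the features, toy URLs and labels are illustrative assumptions and do not reproduce the thesis's feature set or its key-term extraction step.

```python
import re
from urllib.parse import urlparse

import numpy as np
from xgboost import XGBClassifier

def url_features(url: str) -> list:
    """A few illustrative lexical features sometimes used for phishing detection."""
    parsed = urlparse(url)
    host = parsed.netloc
    return [
        len(url),                                           # overall URL length
        host.count("."),                                    # number of subdomain separators
        url.count("-"),
        int(bool(re.search(r"\d+\.\d+\.\d+\.\d+", host))),  # raw IP address used as host
        int(parsed.scheme == "https"),
        int("@" in url),
    ]

urls = ["https://example.com/login", "http://192.0.2.1/verify-account@update",
        "https://docs.example.org/help", "http://secure-login.example-bank.top/signin"]
labels = [0, 1, 0, 1]                                       # 0 = legitimate, 1 = phishing (toy labels)
X = np.array([url_features(u) for u in urls])
clf = XGBClassifier(n_estimators=50, max_depth=3).fit(X, labels)
print(clf.predict(X))
```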
15

Salvaire, Pierre Antony Jean Marie. "Explaining the predictions of a boosted tree algorithm : application to credit scoring". Master's thesis, 2019. http://hdl.handle.net/10362/85991.

Abstract
Dissertation report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Business Intelligence and Knowledge Management
The main goal of this report is to contribute to the adoption of complex "black box" machine learning models in the field of credit scoring for retail credit. Although numerous investigations have shown the potential benefits of using complex models, we identified the lack of interpretability as one of the main factors preventing a full and trustworthy adoption of these new modeling techniques. Intrinsically linked with recent data concerns such as individual rights to explanation, fairness (introduced in the GDPR) and model reliability, we believe that this kind of research is crucial for easing their adoption among credit risk practitioners. We build a standard linear Scorecard model along with a more advanced algorithm called Extreme Gradient Boosting (XGBoost) on an open-source retail credit dataset. The modeling scenario is a binary classification task consisting of identifying clients that will experience a 90-days-past-due delinquency state or worse. The interpretation of the Scorecard model is performed using the raw output of the algorithm, while more complex data perturbation techniques, namely Partial Dependence Plots and Shapley Additive Explanations, are computed for the XGBoost algorithm. As a result, we observe that the XGBoost algorithm is statistically more performant at distinguishing "bad" from "good" clients. Additionally, we show that the global interpretation of the XGBoost model is not as accurate as that of the Scorecard algorithm. At an individual level, however (for each instance of the dataset), we show that the level of interpretability is very similar, as both are able to quantify the contribution of each variable to the predicted risk of a specific application.
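A minimal sketch of the kind of post-hoc explanation pipeline described above, an XGBoost classifier explained with SHAP's TreeExplainer, follows; the public data set and the plotted feature are stand-ins, not the retail-credit data or the Scorecard comparison from the report.

```python
import shap
import xgboost
from sklearn.model_selection import train_test_split

# Public tabular data set used as a stand-in for a retail-credit application data set.
X, y = shap.datasets.adult()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgboost.XGBClassifier(n_estimators=300, max_depth=4).fit(X_train, y_train)

explainer = shap.TreeExplainer(model)        # tree-specific, exact SHAP values
shap_values = explainer.shap_values(X_test)

shap.summary_plot(shap_values, X_test)               # global view: which features matter overall
shap.dependence_plot("Age", shap_values, X_test)     # per-feature view, similar in spirit to a PDP
```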
16

KELLER, AISHWARYA. "HYBRID RESAMPLING AND XGBOOST PREDICTION MODEL USING PATIENT'S INFORMATION AND DRAWING AS FEATURES FOR PARKINSON'S DISEASE DETECTION". Thesis, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19442.

Abstract
In the list of the most commonly occurring neurodegenerative disorders, Parkinson's disease ranks second, while Alzheimer's disease tops the list. Parkinson's disease has no definitive examination for an exact diagnosis. It has been observed that the handwriting of an individual suffering from Parkinson's disease deteriorates considerably. Therefore, many computer vision and micrography-based methods have been used by researchers to explore handwriting as a detection parameter. Yet these methods suffer from two major drawbacks: bias in the prediction model due to imbalance in the data, and a low rate of classification accuracy. The proposed technique is designed to alleviate prediction bias and low classification accuracy by the use of hybrid resampling (Synthetic Minority Oversampling Technique and Wilson's Edited Nearest Neighbours) techniques and Extreme Gradient Boosting (XGBoost). Additionally, there is proof of innate neurological dissimilarities between men and women and between the aged and the young. There is also a significant link between the dominant hand of the person and the side of the body where the initial manifestation begins. Yet gender, age, and handedness information have not been utilized for Parkinson's disease detection. In this research work, a prediction method is developed incorporating age, gender, and dominant hand as features to identify Parkinson's disease. The proposed hybrid resampling and XGBoost method's experimental results yield an accuracy of 98.24%, the highest so far, when age is taken as a parameter along with nine statistical parameters (root mean square, largest value of radius difference between ET and HT, smallest value of radius difference between ET and HT, standard deviation of ET and HT radius difference, mean relative tremor, maximum ET, minimum HT, standard deviation of exam template values, and the number of instances where the HT and ET radius difference changes from a negative value to a positive value or vice versa) achieved on the HandPD dataset. The conventional accuracy is 98.24% (meanders) and 95.37% (spirals) when age is used along with the nine statistical parameters extracted from the dataset. It becomes 97.02% (meanders) and 97.12% (spirals) when age, gender and handedness information are utilised. The proposed method's results were compared with existing methods, and it is evident that the method outperforms its predecessors.
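A hedged sketch of the resampling-plus-boosting pipeline described above is shown below, using imbalanced-learn's SMOTEENN (SMOTE followed by Edited Nearest Neighbours) before an XGBoost classifier; the synthetic data and hyperparameters are placeholders, not the HandPD features or the thesis's settings.

```python
from imblearn.combine import SMOTEENN
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic imbalanced stand-in for the drawing-derived feature set.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=0)

# Resample only the training split: SMOTE oversamples the minority class,
# then Wilson's Edited Nearest Neighbours removes noisy or borderline samples.
X_res, y_res = SMOTEENN(random_state=0).fit_resample(X_train, y_train)

clf = XGBClassifier(n_estimators=300, max_depth=5, learning_rate=0.1).fit(X_res, y_res)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```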
17

KUMAR, SUNIL. "COMBINATORIAL THERAPY FOR TUMOR TREATMENT". Thesis, 2023. http://dspace.dtu.ac.in:8080/jspui/handle/repository/20430.

Abstract
Cancer is a complex and multifaceted disease that continues to pose a significant challenge to global health and is the second leading cause of death worldwide. Early detection and noninvasive techniques for detecting cancer are necessary to improve treatment outcomes, save lives and improve quality of life. Biopsies of tumors are often expensive and invasive and raise the risk of serious complications like infection, excessive bleeding, and puncture damage to nearby tissues and organs. Early detection biomarkers are often variably expressed in different patients and may even be below the detection level at an early stage. Hence, PBMCs that show alterations in gene profile as a result of interaction with tumor antigens may serve as better early detection biomarkers. Also, such alterations in the immune gene profile in PBMCs are more detectable in a wide variety of cancer patients despite their variability in different cancer mutants. Tumor cell biomarkers lack specificity, and tumor heterogeneity complicates accurate diagnosis and treatment. Changing biomarker expression affects treatment responses, and technical challenges impact utility. Synthetic drugs targeting tumor cells often trigger tumour cells to acquire resistance against them. Tumor progression is an outcome of tumor growth regulation in conjunction with tumor evasion by immune modulation. Therefore, an understanding of immunological biomarkers is equally important. Hence, designing a prudent chemotherapeutic combination requires a detailed understanding of the gene regulation altering cancer prognosis and its impact on immune regulation. Immunotherapy also has its side effects and does not provide an adequate response in all patients, and its inherent variability in patient response often makes it prohibitive. Hence, concomitant targeting of tumour cells and modulation of immune cell function may be a particularly beneficial mechanism for cancer treatment. Machine learning tools are crucial for early cancer detection and immune modulation due to their ability to analyze complex data and identify patterns that may not be apparent through traditional methods. Potential diagnostic biomarkers were predicted for breast cancer using eXplainable Artificial Intelligence (XAI) on XGBoost machine learning (ML) models trained on a binary classification dataset containing the expression data of PBMCs from 252 breast cancer patients and 194 healthy women. After incorporating SHAP values into the XGBoost model, ten important genes related to breast cancer development were discovered to be effective potential biomarkers. It was discovered that SVIP, BEND3, MDGA2, LEF1-AS1, PRM1, TEX14, MZB1, TMIGD2, KIT, and FKBP7 are key genes that impact model prediction. These genes may serve as early, non-invasive diagnostic and prognostic biomarkers for breast cancer patients. The importance of concomitant intervention in cancer progression and immune regulation therefore necessitated the identification of biomarkers with a dual impact. Gene expression data of HNSC tumor samples and PBMCs from tumor patient datasets were analysed for the identification of differentially expressed genes. 110 DEGs were found to be common to both datasets. Further, it was identified that these 110 DEGs were involved in biological processes related to tumor regulation. Potential immunological biomarkers were identified for HNSC cancer.
Genes that play a role in both tumour growth and immune suppression were identified by enrichment analysis followed by gene expression analysis. 10 such genes were shortlisted: Foxp3, CD274, IDO1, IL-10, SOCS1, PRKDC, AXL, CDK6, TGFB1 and FADD. CD274 and IDO1 were found to have the highest degree of interaction based on their network of interactions. Synthetic drugs, including many FDA-approved drugs, might cause significant side effects, leading to adverse impacts on patients' quality of life. Additionally, some cancer cells may develop resistance to synthetic drugs over time, reducing treatment efficacy. Moreover, targeted therapies may only be effective in cancers with specific molecular characteristics, limiting their broad applicability. To address these limitations, ongoing research focuses on developing more targeted and personalized therapies, combining synthetic drugs with other treatment modalities, and exploring alternative natural compounds with multi-target effects. Multi-target natural compounds offer the advantage of targeting multiple pathways involved in cancer progression without significant side effects. These compounds, derived from plants and other natural sources, hold promise in cancer treatment due to their diverse mechanisms of action and potential for reduced toxicity. Natural compounds that help in tumour suppression as well as functional immune modulation were identified for their dual roles. The Np care and GEO databases were used for the retrieval of natural compounds. Gene expression data from treatment with 102 potential anti-cancer natural compounds were analysed, and the key genes differentially regulated by them were identified. These 102 natural compounds were analysed for their ability to alter the expression of the 110 commonly differentially expressed genes (identified in the first objective). Salidroside altered the largest number of them, 66 genes. Gallic acid and Shikonin were found to be the natural compounds that target CD274 and IDO1, respectively. Gallic acid is extracted from the leaves of bearberry, pomegranate root bark, gallnuts and witch hazel, both in the free state and as part of the tannin molecule, whereas Shikonin is found in extracts of the dried root of the plant Lithospermum erythrorhizon. Studies have demonstrated that both Shikonin and Gallic acid exhibit anti-cancer properties. Single-drug treatment can lead to the development of drug resistance, where cancer cells become less responsive to the treatment over time. Some cancers may be inherently resistant to certain drugs, restricting their effectiveness. Moreover, high doses of a single drug can cause severe side effects, impacting patients' quality of life. Additionally, single-drug therapy may not be effective due to the heterogeneity of cancer cells, allowing potential tumor recurrence. Combination therapy targets cancer cells through multiple pathways, reduces drug resistance, and enhances treatment outcomes. Synergistic interactions can improve efficacy while minimizing side effects, advancing personalized cancer care for better patient outcomes. A combination of Salidroside, Ginsenoside Rd, Oridonin, Britanin, and Scutellarein was chosen such that together they could alter the expression of 108 of the selected 110 genes. The combination was further analyzed for the regulated pathways and biological processes that were affected. Expression data analysis of HNSC cancer exhibited 1745 differentially expressed genes.
Gallic acid treatment results in the downregulation of 120 genes and the upregulation of 35 genes, while Shikonin results in the downregulation of 660 genes and the upregulation of 38 genes. Pathway analysis of the genes modulated by Gallic acid and Shikonin showed them to be crucially involved in pathways essential for cancer prognosis. Further, the impact of Gallic acid and Shikonin treatment on a cancer cell line was analysed individually as well as in combination with the help of in vitro experiments. Gallic acid showed IC50 values of 46.87, 59.37, and 93.75 at 12h, 24h, and 48h of treatment, respectively. Shikonin showed IC50 values of 13.86, 11.95, and 10.89 at 12h, 24h, and 48h of treatment, respectively. The lowest percentage of cell viability was observed for the combination of 80 µl of Gallic acid and 16 µl of Shikonin. Thus, this combination of Gallic acid and Shikonin could be effective for HNSC cancer treatment. Our studies showed a multifaceted, multi-dimensional tumor regression by altering autophagy and apoptosis and by inhibiting cell proliferation, angiogenesis, metastasis and inflammatory cytokine production. Thus, the study has helped develop a unique combination of natural compounds that will markedly reduce the propensity for the development of drug resistance in tumors and immune evasion by the tumors. This study is crucial to developing a combinatorial natural therapeutic cocktail with accentuated immunotherapeutic potential.
18

(5930375), Junhui Wang. "SYSTEMATICALLY LEARNING OF INTERNAL RIBOSOME ENTRY SITE AND PREDICTION BY MACHINE LEARNING". Thesis, 2019.

Abstract

Internal ribosome entry sites (IRES) are segments of the mRNA found in untranslated regions, which can recruit the ribosome and initiate translation independently of the more widely used 5' cap dependent translation initiation mechanism. IRES play an important role in conditions where 5' cap dependent translation initiation has been blocked or repressed. They have been found to play important roles in viral infection, cellular apoptosis, and response to other external stimuli. It has been suggested that about 10% of mRNAs, both viral and cellular, can utilize IRES. But due to the limitations of the IRES bicistronic assay, which is the gold standard for identifying IRES, relatively few IRES have been definitively described and functionally validated compared to the potential overall population. Viral and cellular IRES may be mechanistically different, but this is difficult to analyze because the mechanistic differences are still not very clearly defined. Identifying additional IRES is an important step towards better understanding IRES mechanisms. Development of a new bioinformatics tool that can accurately predict IRES from sequence would be a significant step forward in identifying IRES-based regulation, and in elucidating IRES mechanisms. This dissertation systematically studies the features which can distinguish IRES from non-IRES sequences. Sequence features such as k-mer words, and structural features such as the predicted MFE of folding, QMFE, and sequence/structure triplets, are evaluated as possible discriminative features. These potential features are incorporated into an IRES classifier based on XGBoost, a machine learning model, to classify novel sequences as belonging to the IRES or non-IRES group. The XGBoost model performs better than previous predictors, with higher accuracy and lower computational time. The number of features in the model has been greatly reduced compared to previous predictors by adding global k-mer and structural features. The trained XGBoost model has been implemented as the first high-throughput bioinformatics tool for IRES prediction, IRESpy. This website provides a public tool for all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
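As an illustration of the k-mer word features mentioned above, the sketch below counts normalised k-mer frequencies for RNA sequences and feeds them to an XGBoost classifier; the sequences and labels are made up, and the snippet is unrelated to the actual IRESpy implementation.

```python
from itertools import product

import numpy as np
from xgboost import XGBClassifier

BASES = "ACGU"

def kmer_features(seq: str, k: int = 3) -> np.ndarray:
    """Normalised k-mer word frequencies for an RNA sequence."""
    vocab = {"".join(p): i for i, p in enumerate(product(BASES, repeat=k))}
    counts = np.zeros(len(vocab))
    for i in range(len(seq) - k + 1):
        idx = vocab.get(seq[i:i + k])
        if idx is not None:                  # skip words containing unexpected characters
            counts[idx] += 1
    total = counts.sum()
    return counts / total if total else counts

# Toy illustration: made-up sequences with made-up labels (1 = IRES, 0 = non-IRES).
seqs = ["AUGGCGCGAUUCGAAUCGGCUA", "GCGCUAUAGCGGAUACGUAGCA",
        "AUCGAUCGAUCGGCAUCGAUGC", "GGCAUUACGGAUCCGAUUAGCA"]
labels = [1, 0, 1, 0]
X = np.vstack([kmer_features(s) for s in seqs])
model = XGBClassifier(n_estimators=50, max_depth=3).fit(X, labels)
print(model.predict(X))
```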
