Doctoral dissertations on the topic "ENSEMBLE LEARNING MODELS"
Consult the 44 best doctoral dissertations for your research on the topic "ENSEMBLE LEARNING MODELS".
He, Wenbin. "Exploration and Analysis of Ensemble Datasets with Statistical and Deep Learning Models". The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574695259847734.
Kim, Jinhan. "J-model : an open and social ensemble learning architecture for classification". Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/7672.
Gharroudi, Ouadie. "Ensemble multi-label learning in supervised and semi-supervised settings". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSE1333/document.
Multi-label learning is a specific supervised learning problem where each instance can be associated with multiple target labels simultaneously. Multi-label learning is ubiquitous in machine learning and arises naturally in many real-world applications such as document classification, automatic music tagging and image annotation. In this thesis, we formulate multi-label learning as an ensemble learning problem in order to provide satisfactory solutions for both the multi-label classification and the feature selection tasks, while being consistent with respect to any type of objective loss function. We first discuss why state-of-the-art single multi-label algorithms using an effective committee of multi-label models suffer from certain practical drawbacks. We then propose a novel strategy to build and aggregate k-labelsets-based committees in the context of ensemble multi-label classification. We then analyze in depth the effect of the aggregation step within ensemble multi-label approaches and investigate how this aggregation impacts prediction performance with respect to the objective multi-label loss metric. We then address the specific problem of identifying relevant subsets of features - among potentially irrelevant and redundant features - in the multi-label context based on the ensemble paradigm. Three wrapper multi-label feature selection methods based on the Random Forest paradigm are proposed. These methods differ in the way they consider label dependence within the feature selection process. Finally, we extend the multi-label classification and feature selection problems to the semi-supervised setting and consider the situation where only few labelled instances are available. We propose a new semi-supervised multi-label feature selection approach based on the ensemble paradigm.
The proposed model combines ideas from co-training and multi-label k-labelsets committee construction, in tandem with an inner out-of-bag label feature importance evaluation. Satisfactorily tested on several benchmark datasets, the approaches developed in this thesis show promise for a variety of applications in supervised and semi-supervised multi-label learning.
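The k-labelsets committee construction mentioned in the abstract above can be sketched roughly as follows. This is a hedged illustration in the spirit of RAkEL-style ensembles, not the author's implementation; the function names, the simple mean-vote aggregation, and the 0.5 threshold are assumptions.

```python
import random

def draw_k_labelsets(labels, k, m, seed=0):
    # Draw m random size-k subsets of the label set; each committee member
    # is later trained on one subset only.
    rng = random.Random(seed)
    return [sorted(rng.sample(labels, k)) for _ in range(m)]

def vote(labelsets, member_predictions, labels):
    # Aggregate binary votes per label across committee members;
    # a label is predicted when its mean vote exceeds 0.5.
    score = {lab: 0.0 for lab in labels}
    count = {lab: 0 for lab in labels}
    for subset, preds in zip(labelsets, member_predictions):
        for lab, p in zip(subset, preds):
            score[lab] += p
            count[lab] += 1
    return [lab for lab in labels if count[lab] and score[lab] / count[lab] > 0.5]
```

The aggregation step is exactly where the thesis's analysis applies: changing the threshold or the averaging rule changes which multi-label loss the committee optimizes.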
Henriksson, Aron. "Ensembles of Semantic Spaces : On Combining Models of Distributional Semantics with Applications in Healthcare". Doctoral thesis, Stockholms universitet, Institutionen för data- och systemvetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-122465.
At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Papers 4 and 5: unpublished conference papers.
High-Performance Data Mining for Drug Effect Detection
Chakraborty, Debaditya. "Detection of Faults in HVAC Systems using Tree-based Ensemble Models and Dynamic Thresholds". University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1543582336141076.
Li, Qiongzhu. "Study of Single and Ensemble Machine Learning Models on Credit Data to Detect Underlying Non-performing Loans". Thesis, Uppsala universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-297080.
Pełny tekst źródłaFranch, Gabriele. "Deep Learning for Spatiotemporal Nowcasting". Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/295096.
Pełny tekst źródłaFranch, Gabriele. "Deep Learning for Spatiotemporal Nowcasting". Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/295096.
Ekström, Linus, and Andreas Augustsson. "A comparative study of text classification models on invoices : The feasibility of different machine learning algorithms and their accuracy". Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-15647.
Lundberg, Jacob. "Resource Efficient Representation of Machine Learning Models : investigating optimization options for decision trees in embedded systems". Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-162013.
Olofsson, Nina. "A Machine Learning Ensemble Approach to Churn Prediction : Developing and Comparing Local Explanation Models on Top of a Black-Box Classifier". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210565.
Methods for predicting customer churn are common within Customer Relationship Management and have proven valuable for retaining customers. To predict churn with as high certainty as possible, recent research has focused on increasingly complex machine learning models, such as ensembles and hybrid models. A consequence of increasingly complex models, however, is that it becomes harder and harder to understand how a given model arrived at a particular decision. Previous studies in machine learning interpretability have taken a global perspective on explaining hard-to-interpret models. This study explores local explanation models for explaining individual decisions of an ensemble model known as 'Random Forest'. Churn prediction is studied on the users of Tink, a finance app. The purpose of this study is to take local explanation models one step further by comparing churn indicators across different user groups. In total, three pairs of groups were examined that differed in three different variables. Local explanation models were then used to compute how important all globally identified churn indicators were for each group. The results showed no significant differences between the groups with regard to the main churn indicators. Instead, the results showed differences in less important indicators related to the type of information users store in the app. Besides examining differences in churn indicators, this study resulted in a well-performing churn prediction model capable of explaining individual decisions. The Random Forest model proved to be significantly better than a number of simpler models, with an AUC value of 0.93.
Henriksson, Erik, and Kristopher Werlinder. "Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302535.
The aim of this study is to compare and investigate how an XGBoost regressor and a Random Forest regressor perform in predicting house prices, using two datasets. The comparison considers the models' training time, inference time and the three evaluation factors R2, RMSE and MAPE. The datasets are described in detail together with background on the regression models. The method involves cleaning the datasets, searching for optimal hyperparameters for the models, and 5-fold cross-validation to achieve good predictions. The result of the study is that the XGBoost regressor performs better on both small and large datasets, and is clearly superior on large datasets. While the Random Forest model can achieve results similar to the XGBoost model, its training takes about 250 times as long and its inference time is roughly 40 times longer. This makes XGBoost particularly superior when working with large datasets.
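The three evaluation factors named in the abstract above (R2, RMSE and MAPE) have standard definitions that can be written down directly; a minimal sketch with illustrative variable names:

```python
import math

def r2(y, yhat):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def rmse(y, yhat):
    # Root mean squared error.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mape(y, yhat):
    # Mean absolute percentage error, in percent; assumes no true value is zero.
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)
```

Note that MAPE weights errors relative to the true price, so it penalizes misses on cheap houses more heavily than the scale-dependent RMSE does, which is one reason studies report several metrics side by side.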
Ngo, Khai Thoi. "Stacking Ensemble for auto_ml". Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/83547.
Master of Science
Ferreira, Ednaldo José. "Método baseado em rotação e projeção otimizadas para a construção de ensembles de modelos". Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-27062012-161603/.
The development of new techniques capable of inducing predictive models with low generalization errors has been a constant in machine learning and other related areas. In this context, the composition of an ensemble of models should be highlighted due to its theoretical and empirical potential to minimize the generalization error. Several methods for building ensembles are found in the literature. Among them, the rotation-based (RB) method has become known for outperforming other traditional methods. The RB method applies principal component analysis (PCA) for feature extraction as a rotation strategy to provide diversity and accuracy among base models. However, this strategy does not ensure that the resulting direction is appropriate for the supervised learning technique (SLT). Moreover, the RB method is not suitable for rotation-invariant SLTs and has not been evaluated with stable ones, which makes RB inappropriate for, and/or restricted to, use with only some SLTs. This thesis proposes a new approach for feature extraction based on the concatenation of rotation and projection optimized for the SLT (called optimized roto-projection). The approach uses a metaheuristic to optimize the parameters of the roto-projection transformation, minimizing the error of the technique directing the optimization process. More emphatically, optimized roto-projection is proposed as a fundamental part of a new ensemble method, called optimized roto-projection ensemble (ORPE). The results show that optimized roto-projection can reduce the dimensionality and the complexities of the data and model. Moreover, optimized roto-projection can increase the performance of the SLT subsequently applied. The ORPE outperformed, with statistical significance, RB and other methods using stable and unstable SLTs for classification and regression, with databases from public and private domains.
The ORPE method was unrestricted and highly effective, holding the first position in every dominance ranking.
Top, Mame Kouna. "Analyse des modèles résines pour la correction des effets de proximité en lithographie optique". Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENT007/document.
The progress made in microelectronics responds to the need to reduce production costs and to the search for new markets. This progress has been possible thanks to advances in optical lithography, the printing process principally used in integrated circuit (IC) manufacturing. The miniaturization of integrated circuits has been possible only by pushing the limits of optical resolution. However, this miniaturization increases the sensitivity of the transfer, leading to more proximity effects at progressively more advanced technology nodes (45 and 32 nm in transistor gate size). The correction of these optical proximity effects is indispensable in photolithographic processes at advanced technology nodes. Techniques of optical proximity correction (OPC) increase the achievable resolution and the pattern transfer fidelity for advanced lithographic generations. Corrections are made on the mask based on OPC models, which connect the image on the resist to the changes made on the mask. The reliability of these OPC models is essential for improving pattern transfer fidelity. This thesis analyses and evaluates OPC resist models, which simulate the behavior of the resist after the photolithographic process. Data modeling and statistical analysis have been used to study these increasingly empirical resist models. Besides the reliability of the model calibration data, we examined how the model calibration platforms generally used in IC manufacturing are employed. This thesis presents the results of the analysis of OPC resist models and proposes a new methodology for OPC resist model creation, analysis and validation.
Whiting, Jeffrey S. "Cognitive and Behavioral Model Ensembles for Autonomous Virtual Characters". Diss., Brigham Young University, 2007. http://contentdm.lib.byu.edu/ETD/image/etd1873.pdf.
Iyer, Vasanth. "Ensemble Stream Model for Data-Cleaning in Sensor Networks". FIU Digital Commons, 2013. http://digitalcommons.fiu.edu/etd/973.
Ali, Rozniza. "Ensemble classification and signal image processing for genus Gyrodactylus (Monogenea)". Thesis, University of Stirling, 2014. http://hdl.handle.net/1893/21734.
Darwiche, Aiman A. "Machine Learning Methods for Septic Shock Prediction". Diss., NSUWorks, 2018. https://nsuworks.nova.edu/gscis_etd/1051.
Li, Jianeng. "Research on a Heart Disease Prediction Model Based on the Stacking Principle". Thesis, Högskolan Dalarna, Informatik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34591.
Pouilly-Cathelain, Maxime. "Synthèse de correcteurs s’adaptant à des critères multiples de haut niveau par la commande prédictive et les réseaux de neurones". Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG019.
This PhD thesis deals with the control of nonlinear systems subject to nondifferentiable or nonconvex constraints. The objective is to design a control law considering any type of constraint that can be evaluated online. To achieve this goal, model predictive control has been used together with barrier functions included in the cost function. A gradient-free optimization algorithm has been used to solve this optimization problem. Besides, a cost function formulation has been proposed to ensure stability and robustness against disturbances for linear systems. The proof of stability is based on invariant sets and Lyapunov theory. In the case of nonlinear systems, dynamic neural networks have been used as a predictor for model predictive control. Machine learning algorithms and the nonlinear observers required for the use of neural networks have been studied. Finally, our study has focused on improving neural network prediction in the presence of disturbances. The synthesis method presented in this work has been applied to obstacle avoidance by an autonomous vehicle.
Duncan, Andrew Paul. "The analysis and application of artificial neural networks for early warning systems in hydrology and the environment". Thesis, University of Exeter, 2014. http://hdl.handle.net/10871/17569.
Bellani, Carolina. "Predictive churn models in vehicle insurance". Master's thesis, 2019. http://hdl.handle.net/10362/90767.
The goal of this project is to develop a predictive model to reduce customer churn at a company. In order to reduce churn, the model will identify customers who may be thinking of ending their patronage. The model also seeks to identify the reasons behind a customer's decision to leave, to enable the company to take appropriate countermeasures. The company in question is an insurance company in Portugal, Tranquilidade, and this project focuses in particular on their vehicle insurance products. Customer churn will be calculated in relation to two insurance policies: the compulsory motor (third-party liability) policy and the optional Kasko (first-party liability) policy. The model will use information the company holds internally on its customers, as well as commercial, vehicle and policy details, and external information (from the census). The first step of the analysis was data pre-processing, with data cleaning, transformation and reduction (especially for redundancy); in particular, concept hierarchy generation was performed for nominal data. As the percentage of churn is small compared with the active policy products, the dataset is unbalanced; to resolve this, an under-sampling technique was used. To force the models to learn how to identify the churn cases, samples of the majority class were separated in such a way as to balance with the minority class; to prevent any loss of information, all the samples of the majority class were studied with the minority class. The predictive models used are generalized linear models, random forests and artificial neural networks; parameter tuning was also conducted. A further validation was performed on a recent new sample, without any data leakage. For the compulsory motor insurance, the recommended model is an artificial neural network.
The model has a first layer of 15 neurons and a second layer of 4 neurons, with an AUC of 68.72%, a sensitivity of 33.14% and a precision of 27%. For the Kasko insurance, the suggested model is a random forest with 325 decision trees, with an AUC of 72.58%, a sensitivity of 36.85% and a precision of 31.70%. The AUCs are in line with other predictive churn models' results; the precision and sensitivity measures are worse than in telecommunications churn models but comparable with insurance churn predictions. Not only do the models allow for the creation of a churn classification, but they are also able to give some insight into this phenomenon, and therefore provide useful information which the company can analyze in order to reduce its customer churn rate. However, there are some hidden factors that could not be accounted for with the information available, such as the competitors' market and client interaction; if these could be integrated, a better prediction could be achieved.
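Sensitivity and precision, as reported above, follow directly from confusion-matrix counts; a minimal sketch (the counts in the usage example are made up for illustration, not taken from the thesis):

```python
def sensitivity(tp, fn):
    # Recall on the churn class: share of actual churners the model catches.
    return tp / (tp + fn)

def precision(tp, fp):
    # Share of predicted churners who actually churned.
    return tp / (tp + fp)
```

For example, catching 33 of 100 true churners gives a sensitivity of 0.33, and 27 true churners among 100 churn predictions gives a precision of 0.27, matching the order of magnitude of the figures reported above.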
Amaro, Miguel Mendes. "Credit scoring: comparison of non‐parametric techniques against logistic regression". Master's thesis, 2020. http://hdl.handle.net/10362/99692.
Over the past decades, financial institutions have given increasing importance to credit risk management as a critical tool to control their profitability. More than ever, it has become crucial for these institutions to be able to discriminate well between good and bad clients, so as to accept only the credit applications that are not likely to default. To calculate the probability of default of a particular client, most financial institutions have credit scoring models based on parametric techniques. Logistic regression is the current industry-standard technique in credit scoring models, and it is one of the techniques under study in this dissertation. Although it is regarded as a robust and intuitive technique, it is still not free from criticism of the model assumptions it makes, which can compromise its predictions. This dissertation evaluates the gains in performance resulting from using more modern non-parametric techniques instead of logistic regression, performing a model comparison over four different real-life credit datasets. Specifically, the techniques compared against logistic regression in this study consist of two single classifiers (decision tree and SVM with RBF kernel) and two ensemble methods (random forest and stacking with cross-validation). The literature review demonstrates that heterogeneous ensemble approaches have a weaker presence in credit scoring studies and, because of that, stacking with cross-validation was considered in this study. The results demonstrate that logistic regression outperforms the decision tree classifier, has similar performance to the SVM, and slightly underperforms both ensemble approaches to similar extents.
Santos, Esdras Christo Moura dos. "Predictive modelling applied to propensity to buy personal accidents insurance products". Master's thesis, 2018. http://hdl.handle.net/10362/37698.
Predictive models have been widely used in organizational scenarios with the increasing popularity of machine learning. They play a fundamental role in supporting customer acquisition in marketing campaigns. This report describes the development of a propensity-to-buy model for personal accident insurance products. The entire process, from business understanding to the deployment of the final model, is analyzed with the objective of linking theory to practice.
Gau, Olivier. "Ensemble learning with GSGP". Master's thesis, 2020. http://hdl.handle.net/10362/93780.
The purpose of this thesis is to conduct comparative research between Genetic Programming (GP) and Geometric Semantic Genetic Programming (GSGP), with different initialization (RHH and EDDA) and selection (Tournament and Epsilon-Lexicase) strategies, in the context of a model ensemble, in order to solve regression optimization problems. A model ensemble is a combination of base learners used in different ways to solve a problem. The most common ensemble is the mean, where the base learners are combined in a linear fashion, all having the same weights. However, more sophisticated ensembles can be inferred, providing higher generalization ability. GSGP is a variant of GP using different genetic operators. No previous research has been conducted to see whether GSGP can perform better than GP in model-ensemble learning. The evolutionary process of GP and GSGP should allow us to learn about the strength of each of those base models to provide a more accurate and robust solution. The base models used for this analysis were Linear Regression, Random Forest, Support Vector Machine and Multi-Layer Perceptron. This analysis was conducted using 7 different optimization problems and 4 real-world datasets. The results obtained with GSGP are statistically significantly better than GP in most cases.
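The equal-weights mean ensemble described above is just a column-wise average of the base models' predictions; a weighted variant hints at the kind of more sophisticated linear blend an evolved combination could replace. A minimal sketch (function names are illustrative, not from the thesis):

```python
def mean_ensemble(predictions):
    # predictions: one list of predictions per base model; equal weights.
    return [sum(col) / len(col) for col in zip(*predictions)]

def weighted_ensemble(predictions, weights):
    # A linear blend with per-model weights (e.g. learned or evolved).
    return [sum(w * p for w, p in zip(weights, col)) for col in zip(*predictions)]
```

GP/GSGP-based ensembles generalize this further: the evolved program can combine base-model outputs nonlinearly rather than as a fixed linear blend.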
Nožička, Michal. "Ensemble learning metody pro vývoj skóringových modelů". Master's thesis, 2018. http://www.nusl.cz/ntk/nusl-382813.
Abreu, Mariana da Conceição Ferreira. "Modelos de Avaliação de Risco de Crédito: Aplicação de Machine Learning". Master's thesis, 2020. http://hdl.handle.net/10316/94723.
There are several methods that over the years have been used in credit risk assessment: traditional methodologies such as the Discriminant Analysis Model (ADi), the Logit Model and the Probit Model, and more sophisticated Machine Learning methodologies, such as Classification Trees (AC), Random Forests (RF), Neural Networks (RN) and Support Vector Machines (SVM). The literature review presents some studies that use traditional methodologies and Machine Learning methodologies. The latter are not only presented theoretically but also studied in practice to evaluate different credit risk applications, being applied to two real, publicly available databases: one referring to the fulfillment of credit card payments in Taiwan and the other referring to credit risk in Germany. Both databases include a binary response variable for credit risk. In each model, some meta-parameters were tested, with due care taken in their selection so as not to repeat them across the different combinations of the same model and, consequently, to avoid overfitting. This study analyses the performance of the individual Machine Learning models and also the performance of an Ensemble technique based on the results obtained by the different models, in order to determine which shows the best performance in credit risk assessment. Most of the results of this empirical study allow us to conclude that the performance of the Ensemble technique is superior to that of the individual models. The Random Forest model also achieved the best performance among all individual models.
Milioli, Heloisa Helena. "Breast cancer intrinsic subtypes: a critical conception in bioinformatics". Thesis, 2017. http://hdl.handle.net/1959.13/1350957.
Breast cancers have been uncovered by high-throughput technologies that allow investigation at the genomic, transcriptomic and proteomic levels. In the early 2000s, gene expression profiling led to the classification of five intrinsic subtypes: luminal A, luminal B, HER2-enriched, normal-like and basal-like. A decade later, the spectrum of copy number aberrations further expanded the heterogeneous architecture of this disease with the identification of 10 integrative clusters (IntClusts). These classifications aim to explain the diverse phenotypes and independent outcomes that impact clinical decision-making. However, intrinsic subtypes and IntClusts show limited overlap. In this context, novel methodologies in bioinformatics for analysing large-scale microarray data will contribute to further understanding of the molecular subtypes. In this study, we focus on developing new approaches to cover multi-perspective, high-dimensional and highly complex data analysis in breast cancer. Our goal is to review and reconcile the disease classification, underlining the differences across clinicopathological features and survival outcomes. For this purpose, we have explored the information processed by the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), one of the largest studies of its type and depth, with over 2000 samples. A series of distinct approaches combining computer science, statistics, mathematics and engineering have been applied in order to bring new insights to cancer biology. The translational strategy will facilitate a more efficient and effective incorporation of bioinformatics research into laboratory assays. Further applications of this knowledge are, therefore, critical in order to support novel implementations in the clinical setting, paving the way for future progress in medicine.
Huang, Hong-Zhou, and 黃弘州. "Nonintrusive Appliance Recognition Algorithm based on Ensemble Learning Model". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/3ftb3d.
National Chung Hsing University
Department of Computer Science and Engineering
103 (2014)
In this paper, a non-intrusive appliance load monitoring (NILM) scheme based on the AdaBoost ensemble algorithm for cheap, low-frequency meters is developed. In order to apply the NILM scheme, we need to extract features for appliances. However, it is a challenging task to determine the state of each appliance in a home just from the information of a single-point aggregate power meter. In the literature, this is usually done by applying high-frequency meters to extract high-frequency features, e.g., harmonics and electromagnetic interference, to improve recognition accuracy. However, high-frequency hardware is costly; for a typical family, expensive devices would make NILM impractical and infeasible. To develop a NILM scheme that can be applied with a cheap, low-frequency meter, low-frequency features should be used. In addition, these low-frequency features should satisfy the additivity property in order to be used in our learning model. The AdaBoost ensemble learning model is then used as the recognition algorithm in our work. Multiple features and multiple recognition algorithms are used to obtain initial recognition results. These results are used as the training data for the AdaBoost ensemble learning model. In this model, the recognition result is decided from multiple features and multiple differently weighted recognition algorithms. The AdaBoost ensemble learning algorithm can resolve the problem of near-equal vote counts that arises in a typical ensemble learning model. Results show that the proposed AdaBoost ensemble learning model enhances recognition accuracy.
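The AdaBoost scheme the abstract relies on reweights training samples toward the ones the previous round misclassified, then lets each weak learner vote with a weight derived from its error. A minimal sketch with one-feature decision stumps (an illustrative toy, not the thesis implementation, which combines multiple features and recognition algorithms):

```python
import math

def adaboost_train(xs, ys, rounds=5):
    # xs: single-feature samples; ys: labels in {-1, +1}.
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, threshold, polarity) triples
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for t in sorted(set(xs)):
            for pol in (1, -1):
                preds = [pol if x > t else -pol for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, pol, preds)
        err, t, pol, preds = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, t, pol))
        # Emphasize misclassified samples for the next round.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * (pol if x > t else -pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1
```

Because the final decision is a weighted sum of votes rather than a plain majority, near-ties among equally accurate base learners are broken by the alpha weights, which is the property the abstract appeals to.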
Chi, Tsai-Lin, and 季彩琳. "Using Ensemble Learning Model for Classifying Fiduciary Purchasing Behavior". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49166414836032969056.
Fu Jen Catholic University
Department of Business Administration, Master's Program in Management
102 (2013)
In the financial industry, the environment is much harsher than ever before. To stand out in the financial sector, bankers have tried to satisfy the needs of customers as best they can, whether in service quality or in service area. Considering the characteristics of consumer credit loans, which are less risky and thriving, bankers are keen to sell them. The objective of the proposed study is to explore the performance of a classification model for classifying fiduciary purchasing behavior using ensemble learning techniques. This study proposes a hybrid of logistic regression, discriminant analysis, extreme learning machine, support vector machine and artificial neural network methods to improve the performance of the classification model compared with each single learner above. To demonstrate the effectiveness of the ensemble learning approach, classification tasks are performed on one consumer dataset of credit loan telemarketing. Fifteen variables are adopted in this study. The result shows that the proposed approach is better than the five single classification models.
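Combining heterogeneous single learners as described above can be done, at its simplest, by majority vote over their class predictions; a hedged sketch (the actual combination rule used in the thesis may differ):

```python
from collections import Counter

def majority_vote(votes):
    # votes: one predicted class per base classifier for a single instance.
    return Counter(votes).most_common(1)[0][0]

def ensemble_predict(classifiers, x):
    # classifiers: callables mapping an instance to a class label.
    return majority_vote([clf(x) for clf in classifiers])
```

With five base learners, as here, a binary vote can never tie, which is one practical reason for choosing an odd-sized committee.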
SYU, HUAN-YU, and 許桓瑜. "Prediction Model of Narcolepsy Based on Ensemble Learning Approach". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/2d9ud3.
Full text source
National Taipei University of Nursing and Health Sciences
Graduate Institute of Information Management
106
The advent of precision medicine means the diagnosis of disease is becoming personalized and customized, and combining medicine with informatics is the trend of the times. Narcolepsy is a form of hypersomnia; patients often present with symptoms such as excessive daytime sleepiness, cataplexy, and hypnagogic hallucinations. Narcolepsy must be diagnosed through multiple sleep examinations, such as the multiple sleep latency test, yet most related studies use only partial or specific tests. In this study, about ten kinds of measurement and questionnaire data related to narcolepsy were collected to build an ensemble-learning classifier that distinguishes narcolepsy type I from type II. Each dataset was used to train and tune five kinds of classifiers: support vector machines, decision trees, neural networks, nearest neighbors, and naive Bayes. The classifier for each dataset was trained with the best model parameters, and the individual classifiers were integrated through ensemble learning into a hybrid model. The accuracy of the individual classifiers ranged from 57.38% to 71.64%, while the hybrid model reached 80.88%, showing that the ensemble-based model outperforms any individual classifier. During model construction, feature importance and reference rules for each dataset were also mined with decision trees; for example, parameters in the PSG and MSLT, or hallucination among the comorbidities, can further discriminate the narcolepsy category. These rules can serve as a reference for future clinical diagnosis of narcolepsy. In clinical practice, tests with high discriminative power in the model can also be prioritized: beyond the necessary MSLT and PSG, PET and other examinations can be scheduled first, which may shorten the clinical diagnostic process.
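The hybrid model described above combines exactly five base learners. A minimal sketch of that combination, using scikit-learn's soft-voting ensemble on synthetic data (the clinical datasets and tuned parameters are the thesis's own and are not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the measurement/questionnaire features
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Five base learners mirroring the thesis: SVM, decision tree,
# neural network, nearest neighbours, and naive Bayes.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average class probabilities instead of hard votes
)
ensemble.fit(X, y)
train_acc = ensemble.score(X, y)
```

Soft voting averages each learner's class probabilities, so a confident classifier can outweigh several uncertain ones, which is one plausible way an ensemble can beat every individual model.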
CHIU, YI-HAN, and 邱奕涵. "Using Ensemble Learning to Build the Sales Forecast Model of Baking Industry". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/57v4jt.
Full text source
National Kaohsiung First University of Science and Technology
Master's Program, Department of Marketing and Distribution Management
106
Recently, the output value of the bakery industry has been rising in Taiwan, and there is a growing focus on healthy diets. Given this, the research uses sales data from healthy bakeries to build a sales forecast model for the baking industry. The research also applies data visualization and feature selection to examine each bakery, and finally uses ensemble learning to build a stronger forecast model. The results demonstrate that XGBoost outperforms the other models. The forecasts can help managers control products and allocate human resources, and sales forecasting can assist the company in setting short-term and long-term objectives for its operating plans.
Huang, Yong-Jhih, and 黃雍智. "Applying Deep Learning and Ensemble Learning to Construct Spectrum and Cepstrum of Filtered Phonocardiogram Prediction Model". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/535usy.
Full text source
National Chung Hsing University
Department of Information Management
106
Coronary artery disease, also known as ischemic heart disease, is a common chronic condition in which an insufficient blood supply causes cardiac dysfunction; it kills countless people every year and has ranked first among the world's top ten causes of death in recent years. Cardiac auscultation remains an important examination for diagnosing heart disease, and many conditions can be diagnosed effectively this way, but auscultation relies on the subjective experience of physicians. To provide objective support for clinical heart-sound diagnosis, this study uses phonocardiograms to build an automatic classification model based on deep learning and ensemble learning with filtering. The approach proceeds as follows. First, Savitzky-Golay and Butterworth filters are applied to the phonocardiograms. Second, the filtered signals are converted into spectrograms and cepstra using methods such as the short-time Fourier transform and the discrete cosine transform. Third, convolutional neural networks are trained as phonocardiogram classifiers. Fourth, two ensemble strategies are used to build ensemble models. Lastly, the positive and negative samples are balanced to increase the model's sensitivity. The experimental results are very competitive: the phonocardiogram classification model reaches 86.04% MAcc (86.46% sensitivity, 85.63% specificity) on the hold-out test and 89.81% MAcc (91.73% sensitivity, 87.91% specificity) in 10-fold cross-validation.
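The spectrogram conversion in the second step can be sketched with a plain NumPy short-time Fourier transform. The window length, hop size, and sampling rate below are illustrative assumptions, not the thesis's actual settings, and a pure sine wave stands in for a heart-sound recording:

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1)).T   # (freq_bins, time_frames)

# A one-second synthetic "heart sound": a 100 Hz tone sampled at 2 kHz
fs = 2000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 100 * t)
spec = stft_spectrogram(sig)   # image-like array a CNN could consume
```

The resulting 2-D array is what makes convolutional networks applicable: time along one axis, frequency along the other, treated like an image.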
Zheng, Yu-Xuan, and 鄭宇軒. "Sleep Apnea Detection Algorithm using EEG and Oximetry based on Ensemble Learning Model". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/66291000965483738888.
Full text source
National Chung Hsing University
Department of Computer Science and Engineering
105
The gold standard for diagnosing sleep apnea is a formal sleep study with polysomnography (PSG). However, the high cost and complexity of PSG, together with shortages of devices and medical personnel, make diagnosis even more difficult. In this thesis, we propose a sleep apnea detection algorithm based on an ensemble machine learning model. By using only electroencephalography (EEG) and oximetry, we can significantly reduce the difficulty of diagnosis and the workload of medical staff. The experimental results show that the performance of our approach is comparable to previous work.
Cheng, Lu-Wen, and 程路文. "A prediction model of air pollution and Respiratory Diseases based on Ensemble learning". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/23mv9a.
Full text source
Yuan Ze University
Department of Computer Science and Engineering
106
The study aimed to determine whether there is an association between air pollutant levels and outpatient clinic (OC) visits for chronic obstructive pulmonary disease (COPD) in Taiwan. Air pollutant concentrations (PM2.5, PM10, SO2, NO2, CO, O3) were collected from air monitoring stations. We used a case-crossover design and conditional logistic regression models, reporting odds ratios (OR) and 95% confidence intervals (CI), to evaluate the associations between air pollutants and COPD-associated OC visits. The analyses show that PM2.5, PM10, CO, NO2, and SO2 had significant effects on COPD-associated OC visits; on colder days the effects were significantly greater, and O3 showed greater lag effects (at lags of 1, 2, 4, and 5 days). Controlling ambient air pollution would therefore benefit COPD patients. In this study, we also used the XGBoost algorithm to build a prediction model of air pollution and hospital readmission for chronic obstructive pulmonary disease. Compared with random forest, neural network, C5.0, AdaBoost, and SVM models, the ensemble-learning XGBoost model produced better classification results for this problem.
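The model comparison the abstract reports can be sketched with cross-validation over competing ensembles. Since the xgboost package may not be available everywhere, scikit-learn's gradient boosting stands in for XGBoost here, and the features are synthetic placeholders for the pollutant and readmission variables:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the pollutant / readmission feature table
X, y = make_classification(n_samples=500, n_features=8, random_state=1)

scores = {}
for name, model in [("gbdt", GradientBoostingClassifier(random_state=1)),
                    ("rf", RandomForestClassifier(random_state=1))]:
    # 5-fold cross-validated accuracy, as in a typical model comparison
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
```

The same loop extends to the other baselines (SVM, neural network, AdaBoost) by appending estimators to the list.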
Chang, Hsueh-Wei, and 張學瑋. "Nonintrusive Appliance Recognition Algorithm based on Ensemble Learning Model integrating with Dynamic Time Warping". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/35192290829923038200.
Full text source
National Chung Hsing University
Department of Computer Science and Engineering
104
Research shows that providing immediate, fine-grained power information to users can achieve a significant reduction in energy wastage. Non-intrusive appliance load monitoring (NILM) is a practical and feasible way for typical households to reach this goal. Previous studies had two main disadvantages. First, they usually relied on high-frequency sensors, which raise hardware costs. Second, most focused on high-consumption or on/off appliances, ignoring low-consumption, multi-state, and continuously variable appliances. In this paper, we propose a low-cost, real-time approach. We use two-step detection in the training phase and cluster detection in the testing phase to confirm events, and after feature extraction we apply the ISODATA clustering algorithm to find an appropriate number of states for each appliance in the training set. Finally, we build an ensemble learning model integrated with dynamic time warping (DTW) to identify appliances. Experimental results imply that the two-step and cluster detection methods avoid excessive unknown appliance events, improving the accuracy of event detection. In addition, integrating the DTW predictive model into the ensemble resolves tie votes, yielding better recognition accuracy than any single predictive model.
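The DTW component can be made concrete with the classic dynamic-programming distance. This is the textbook algorithm, shown on plain number sequences rather than the thesis's appliance power signatures:

```python
def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance
    between two 1-D sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Time-stretched copies of the same shape align at zero cost
same = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

In an ensemble, a tie among voters can be broken by assigning the event to the appliance template with the smallest DTW distance, which is one plausible reading of the integration described above.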
Silvestre, Martinho de Matos. "Three-stage ensemble model : reinforce predictive capacity without compromising interpretability". Master's thesis, 2019. http://hdl.handle.net/10362/71588.
Pełny tekst źródłaOver the last decade, several banks have developed models to quantify credit risk. In addition to the monitoring of the credit portfolio, these models also help deciding the acceptance of new contracts, assess customers profitability and define pricing strategy. The objective of this paper is to improve the approach in credit risk modeling, namely in scoring models to predict default events. To this end, we propose the development of a three-stage ensemble model that combines the results interpretability of the Scorecard with the predictive power of machine learning algorithms. The results show that ROC index improves 0.5%-0.7% and Accuracy 0%-1% considering the Scorecard as baseline.
Chen, Chien-Jen, and 陳建仁. "Combining Hidden Markov Model with Ensemble Learning to Predict Hidden States and Conduct Stochastic Simulation". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/mhh87z.
Full text source
National Chiao Tung University
Department of Industrial Engineering and Management
106
Taiwan's semiconductor, optoelectronics, and computer-and-peripherals industries play an important role in the world, and the rapid development of artificial intelligence (AI) and the Internet of Things (IoT) has further driven their growth. Although the overall industry is growing, there are significant gaps between firms within it, so this study focuses on companies whose revenues fluctuate. First, a Hidden Markov Model (HMM) is used to explore each company's hidden states; without loss of generality, three hidden states, healthy, risky, and sick, are used in this thesis. The hidden states are linked to measurable variables, namely NPBT (net profit before tax), EPS (earnings per share), and ROE (return on equity), and 19 representative independent variables are used to predict hidden states and conduct stochastic simulation. This study uses ensemble learning to identify the key performance indicators (KPIs) of the hidden states and then uses a Bayesian Belief Network (BBN) to conduct stochastic simulations. Based on the proposed framework, the impact of these KPIs on the hidden state and NPBT can be quantitatively measured. Finally, management implications are provided to improve companies' operational efficiency.
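Decoding an HMM's most likely hidden-state path is typically done with the Viterbi algorithm. The sketch below uses a two-state toy model with made-up probabilities purely for illustration; the thesis's three states (healthy, risky, sick) and estimated parameters would slot into the same interface:

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete observation sequence.
    start_p: (S,), trans_p: (S, S), emit_p: (S, n_symbols)."""
    n_states = len(start_p)
    T = len(obs)
    logv = np.full((T, n_states), -np.inf)   # best log-prob ending in state s
    back = np.zeros((T, n_states), dtype=int)
    logv[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            scores = logv[t - 1] + np.log(trans_p[:, s])
            back[t, s] = scores.argmax()
            logv[t, s] = scores.max() + np.log(emit_p[s, obs[t]])
    # Trace the best path backwards through the stored pointers
    path = [int(logv[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: two sticky hidden states, each favoring one observation symbol
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
emit = np.array([[0.9, 0.1], [0.1, 0.9]])
decoded = viterbi([0, 0, 1, 1], start, trans, emit)
```

Once states are decoded per period, they can serve as labels for the downstream ensemble-learning and BBN steps the abstract describes.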
Hong, Zih-Siang, and 洪梓翔. "Using Ensemble Learning and Deep Recurrent Neural Network to Construct an Internet Forum Conversation Prediction Model". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/s67dep.
Full text source
Chung Yuan Christian University
Graduate Institute of Information Management
106
Research on natural language dialogue involves language understanding, reasoning, and basic common sense, making it one of the most challenging problems in artificial intelligence; designing a general-purpose conversation model is even more complicated and difficult. Past studies of natural language processing and dialogue mainly relied on rule-based and machine-learning methods, which can solve part of the dialogue problem in specific domains but have their own learning bottlenecks. Not until recurrent neural networks (RNNs) and the sequence-to-sequence model were proposed did research in this field achieve a further breakthrough. However, although deep learning can automatically extract features from large dialogue corpora, it places high demands on the quantity and quality of datasets and is prone to overfitting. How to extract useful features from a limited training set, and achieve generalization across different situations, is therefore the central challenge for deep learning in natural language dialogue. This project, titled "Conversation Model using Deep Recurrent Neural Networks with Ensemble Learning", applies ensemble learning to conversation modeling in varied and complex contexts. The advantage of ensemble learning is that it enhances the generalization ability of the model, reinforcing its predictions and making it suitable for diverse contexts and scenarios. The method is a deep neural network conversation model that uses ensemble learning to train sub-prediction models of different types, with different parameters and different training datasets, and then obtains predictions through a specifically designed ensemble strategy, with the sub-models jointly predicting and judging to form a generalized conversation prediction model.
Wu, Hsuan, and 吳亘. "Constructing a Risk Assessment Model for Small and Medium Enterprises by Ensemble Learning with Macroeconomic Indices". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4573r2.
Full text source
National Chiao Tung University
Department of Industrial Engineering and Management
107
Because the global financial system is highly interconnected, an international financial crisis may significantly affect the domestic economy and increase the number of non-performing loans at financial institutions. As a result, many institutions have begun constructing objective and fair risk assessment models. However, most consider only internal information about the borrowing SMEs. Since the macroeconomic environment may also affect default risk, this thesis selects macroeconomic indices through Pearson correlation analysis and principal component analysis to create new variables. A two-stage ensemble learning method integrating three classifiers (logistic regression, support vector machine, and gradient boosting decision tree) is then applied to construct the model. A financial institution in Taiwan provided actual SME loan data for verification. According to the results, the proposed risk assessment model outperforms common single-stage classifier models, and adding the macroeconomic indices is shown to further enhance prediction performance.
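The feature-engineering step of compressing macroeconomic indices into principal components and appending them to firm-level features can be sketched as follows. The array shapes and component count are illustrative assumptions on random data, not the thesis's actual indices:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
macro = rng.normal(size=(60, 10))   # stand-in: 60 periods x 10 macro indices
firm = rng.normal(size=(60, 5))     # stand-in: internal borrower features

# Standardize, then compress the correlated macro indices into
# a few principal components before feeding them to the classifiers.
pcs = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(macro))
X = np.hstack([firm, pcs])          # augmented feature matrix
```

In practice, a Pearson-correlation screen (as the abstract mentions) would filter the index list before PCA, so that only indices related to the default outcome enter the decomposition.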
Siedel, Georg. "Evaluation von Machine-Learning-Modellen und Konzeptionierung eines Modell-Ensembles für die Vorhersage von Unfalldaten". 2020. https://tud.qucosa.de/id/qucosa%3A73972.
Pełny tekst źródłaFrazão, Xavier Marques. "Deep learning model combination and regularization using convolutional neural networks". Master's thesis, 2014. http://hdl.handle.net/10400.6/5605.
Pełny tekst źródłaAshofteh, Afshin. "Data Science for Finance: Targeted Learning from (Big) Data to Economic Stability and Financial Risk Management". Doctoral thesis, 2022. http://hdl.handle.net/10362/135620.
Pełny tekst źródłaThe modelling, measurement, and management of systemic financial stability remains a critical issue in most countries. Policymakers, regulators, and managers depend on complex models for financial stability and risk management. The models are compelled to be robust, realistic, and consistent with all relevant available data. This requires great data disclosure, which is deemed to have the highest quality standards. However, stressed situations, financial crises, and pandemics are the source of many new risks with new requirements such as new data sources and different models. This dissertation aims to show the data quality challenges of high-risk situations such as pandemics or economic crisis and it try to theorize the new machine learning models for predictive and longitudes time series models. In the first study (Chapter Two) we analyzed and compared the quality of official datasets available for COVID-19 as a best practice for a recent high-risk situation with dramatic effects on financial stability. We used comparative statistical analysis to evaluate the accuracy of data collection by a national (Chinese Center for Disease Control and Prevention) and two international (World Health Organization; European Centre for Disease Prevention and Control) organizations based on the value of systematic measurement errors. We combined excel files, text mining techniques, and manual data entries to extract the COVID-19 data from official reports and to generate an accurate profile for comparisons. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data for the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonized statistical methods. 
The study offers a combined COVID-19 dataset and dashboard with minimal systematic measurement errors and valuable insights into the potential problems of using databanks without carefully examining the metadata and documentation that describe the overall context of the data. In the second study (Chapter Three) we discussed credit risk, the most significant source of risk in banking, one of the most important sectors among financial institutions. We proposed a new machine learning approach for online credit scoring that is sufficiently conservative and robust for unstable, high-risk situations. This chapter addresses credit scoring in risk management and presents a novel method for predicting the default of high-risk branches or customers. It uses the Kruskal-Wallis non-parametric statistic to form a conservative credit-scoring model and studies its impact on modeling performance to the benefit of the credit provider. The findings show that the new credit-scoring methodology achieves a reasonable coefficient of determination and a very low false-negative rate; it is computationally inexpensive and highly accurate, with around an 18% improvement in recall/sensitivity. Given the recent move toward continued credit and behavioral scoring, our study suggests applying this credit score to non-traditional data sources, allowing online loan providers to track changes in client behavior over time and to select reliable unbanked customers based on their application data. This is the first study to develop an online non-parametric credit-scoring system that can automatically reselect effective features for continued credit evaluation and weight them by their level of contribution, with good diagnostic ability.
In the third study (Chapter Four) we focus on the financial stability challenges faced by insurance companies and pension schemes when managing systematic (undiversifiable) mortality and longevity risk. For this purpose, we first developed a new ensemble learning strategy for panel time-series forecasting and applied it to tracking excess respiratory-disease mortality during the COVID-19 pandemic. Layered learning is an ensemble-learning approach that addresses a predictive task with different predictive models when a direct mapping from inputs to outputs is not accurate. We adopt a layered-learning ensemble strategy that improves predictive performance by combining multiple learning processes into a single ensemble model. In the proposed strategy, an appropriate holdout is specified individually for each model, and the models in the ensemble are selected by a proposed selection approach and combined dynamically according to their predictive performance, yielding a high-performing ensemble that automatically copes with the different kinds of time series contributed by each panel member. In the experimental section, we studied more than twelve thousand observations in a portfolio of 61 time series (countries) of reported respiratory-disease deaths at monthly sampling frequency to quantify the improvement in predictive performance. We then compared each country's forecast of respiratory-disease deaths generated by our model with the corresponding COVID-19 deaths in 2020. The results of this large set of experiments show that the ensemble's accuracy improves noticeably when different holdouts are used for the different contributing time-series methods, selected by the proposed model-selection method.
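The per-series model selection on a holdout, which is the core of the strategy above, can be sketched with two toy forecasters. The forecasters and the holdout length are illustrative assumptions, not the dissertation's actual model pool:

```python
def select_forecaster(series, models, holdout=6):
    """Pick the model with the lowest holdout MAE for this series,
    in the spirit of per-series dynamic model selection."""
    train, valid = series[:-holdout], series[-holdout:]
    best_name, best_err = None, float("inf")
    for name, fit in models.items():
        forecast = fit(train, len(valid))
        err = sum(abs(f - v) for f, v in zip(forecast, valid)) / len(valid)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err

# Two toy forecasters: last-value carry-forward and a linear drift
def naive(train, h):
    return [train[-1]] * h

def drift(train, h):
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (i + 1) for i in range(h)]

models = {"naive": naive, "drift": drift}
trend = list(range(24))                 # a steadily rising monthly count
name, err = select_forecaster(trend, models)
```

Run once per panel member (country), this yields a different winning model per series, which is the behavior the layered-learning ensemble then exploits when combining forecasts.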
These improved time-series models provide proper forecasts of respiratory-disease deaths for each country, exhibiting a high correlation (0.94) with COVID-19 deaths in 2020. In the fourth study (Chapter Five) we used the new ensemble learning approach for time-series modeling discussed in the previous chapter, together with K-means clustering, to forecast life tables in COVID-19 times. Stochastic mortality modeling plays a critical role in public pension design, population and public health projections, and the design, pricing, and risk management of life insurance contracts and longevity-linked securities. No general method exists to forecast mortality rates in all situations, especially in unusual years such as the COVID-19 pandemic. In this chapter, we investigate the feasibility of using an ensemble of traditional and machine-learning time-series methods to strengthen forecasts of age-specific mortality rates for groups of countries that share common longevity trends. We use Generalized Age-Period-Cohort stochastic mortality models to capture age and period effects, apply K-means clustering to the time series to group countries following common longevity trends, and use ensemble learning to forecast life expectancy and annuity prices by age and sex. To calibrate the models, we use data for 14 European countries from 1960 to 2018. The results show that the ensemble method presents the most robust results overall, with the minimum RMSE in the presence of structural changes in the shape of the time series at the time of COVID-19. In this dissertation's conclusions (Chapter Six), we provide more detailed insights into the overall contributions of this dissertation to financial stability and risk management through data science, along with opportunities, limitations, and avenues for future research on the application of data science in finance and economics.