Dissertations / Theses: 'Tree Ensemble'

1

Elias, Joran. "Randomness In Tree Ensemble Methods." The University of Montana, 2009. http://etd.lib.umt.edu/theses/available/etd-10092009-110301/.

Full text

Abstract:

Tree ensembles have proven to be a popular and powerful tool for predictive modeling tasks. The theory behind several of these methods (e.g., boosting) has received considerable attention. However, other tree ensemble techniques (e.g., bagging, random forests) have received limited theoretical treatment. Specifically, it has remained somewhat unclear why simply randomizing the tree-growing algorithm should lead to such dramatic improvements in performance. It has been suggested that a specific type of tree ensemble acts by forming a locally adaptive distance metric [Lin and Jeon, 2006]. We generalize this claim to all tree ensemble methods and argue that this insight can help explain their exceptional performance. Finally, we illustrate the use of tree ensemble methods with an ecological niche modeling example involving the presence of malaria vectors in Africa.
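
One concrete instance of the locally adaptive metric view is the random-forest proximity: the share of trees in which two points fall into the same leaf. The following is a minimal sketch, assuming scikit-learn and synthetic data. It illustrates the general idea only and is not the construction used in the thesis or by Lin and Jeon.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def forest_proximity(forest, X):
    """Return an (n, n) matrix whose (i, j) entry is the fraction of trees
    in which samples i and j land in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_estimators): leaf index per tree
    # Compare every pair of samples tree by tree, then average over the trees.
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)

X, y = make_classification(n_samples=60, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
P = forest_proximity(forest, X)
D = 1.0 - P  # a data-dependent dissimilarity induced by the ensemble
print(P.shape, P[0, 0])
```

Because the leaves adapt to the response, two points can be near in feature space yet far under `D` (or vice versa), which is the sense in which the metric is "locally adaptive".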

APA, Harvard, Vancouver, ISO, and other styles

2

Zhang, Yi. "Strategies for Combining Tree-Based Ensemble Models." NSUWorks, 2017. http://nsuworks.nova.edu/gscis_etd/1021.

Full text

Abstract:

Ensemble models have proved effective in a variety of classification tasks. These models combine the predictions of several base models to achieve higher out-of-sample classification accuracy than the base models. Base models are typically trained using different subsets of training examples and input features. Ensemble classifiers are particularly effective when their constituent base models are diverse in terms of their prediction accuracy in different regions of the feature space. This dissertation investigated methods for combining ensemble models, treating them as base models. The goal was to develop a strategy for combining ensemble classifiers that results in higher classification accuracy than the constituent ensemble models. Three of the best-performing tree-based ensemble methods (random forest, extremely randomized trees, and eXtreme gradient boosting) were used to generate a set of base models. Outputs from the classifiers generated by these methods were then combined to create an ensemble classifier. This dissertation systematically investigated methods for (1) selecting a set of diverse base models, and (2) combining the selected base models. The methods were evaluated using public-domain data sets that have been extensively used for benchmarking classification models. The research established that using random forest as the final ensemble method, to integrate the selected base models and the factor scores of multiple correspondence analysis, was the best ensemble approach.
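
The combination strategy described here, with tree ensembles as base models and a random forest as the final combiner, resembles stacking. The following is a minimal sketch, assuming scikit-learn. `GradientBoostingClassifier` stands in for XGBoost so that only one library is needed, and the dissertation's base-model selection and multiple correspondence analysis steps are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    # Stand-in for eXtreme gradient boosting (an assumption, not the thesis setup).
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# The base models' out-of-fold class probabilities become the inputs
# of a random forest that makes the final decision.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_tr, y_tr)
print(f"held-out accuracy: {stack.score(X_te, y_te):.3f}")
```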

APA, Harvard, Vancouver, ISO, and other styles

3

De Giorgi, Marcello. "Tree ensemble methods for Predictive Maintenance: a case study." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22282/.

Full text

Abstract:

In the work described in this thesis, models were built for the predictive maintenance of machine tools in an industrial setting. In particular, the models were trained using tree ensemble methods with two goals: predicting a machine failure early enough to allow maintenance teams to be organized, and predicting when the tool used by the machine needs early replacement, in order to maintain high quality standards. After providing background on the industrial context under study, the thesis describes the processes followed to create and aggregate a dataset and to incorporate information about machine events. After analyzing the behavior of some variables during machining and distinguishing between valid and invalid machining cycles, the thesis introduces tree ensemble methods and the reasons for choosing this class of algorithms. Specifically, two candidate algorithms for the problem are presented: Random Forest and XGBoost. After describing how they work, the thesis presents the results obtained by the models, proposing an expected-cost function as an alternative to the accuracy score for assessing their effectiveness. Finally, the results of the models trained with the two proposed algorithms are compared.
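
The abstract does not give the form of the expected-cost function. The following sketch assumes the usual form: the average, over predictions, of a cost looked up in a table indexed by (actual, predicted) class. The cost values and labels are hypothetical.

```python
def expected_cost(y_true, y_pred, cost):
    """Average misclassification cost per prediction.

    cost[(actual, predicted)] gives the cost of predicting `predicted`
    when the truth is `actual`; pairs missing from the table cost 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = sum(cost.get((a, p), 0.0) for a, p in zip(y_true, y_pred))
    return total / len(y_true)

# Hypothetical costs: a missed failure (1 predicted as 0) is far more
# expensive than an unnecessary maintenance call (0 predicted as 1).
COST = {(1, 0): 10.0, (0, 1): 1.0}

y_true = [0, 0, 1, 1, 0, 1]
model_a = [0, 0, 1, 0, 0, 1]   # one missed failure
model_b = [1, 1, 1, 1, 0, 1]   # two false alarms
print(expected_cost(y_true, model_a, COST))
print(expected_cost(y_true, model_b, COST))
```

With these hypothetical costs, `model_b` has the lower accuracy (4/6 versus 5/6) but also the lower expected cost (2/6 versus 10/6). Accuracy alone does not show this trade-off.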

APA, Harvard, Vancouver, ISO, and other styles

4

Alcaçoas, Dellainey. "Anomaly detection in ring rolling process : Using Tree Ensemble Methods." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-18400.

Full text

Abstract:

Anomaly detection has been studied for many years and has been implemented successfully in many domains. There are various approaches one could adopt to achieve this goal; the core idea behind them is to build a model that is trained to detect patterns of anomalies. For this thesis, the objective was to detect anomalies and identify their causes, given data about the process in a manufacturing setup. The scenario chosen was a ring rolling process at the Ovako steel company in Hofors, Sweden. An approach combining a tree ensemble method with manual feature engineering of multivariate time series was adopted. The experiments showed that the approach detected anomalies with an accuracy between 79% and 82%. To identify the causes of anomalies, feature importance was computed using the SHapley Additive exPlanations (SHAP) method. This identified one very prominent feature that could be the potential cause of the anomalies. The report also suggests scope for improvement and future work.
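
Manual feature engineering of multivariate time series often means summarizing each signal over a process cycle so that a tree ensemble receives one fixed-length row per cycle. The following is a minimal sketch of that idea. The signal names and the particular statistics are hypothetical and are not the features used in the thesis.

```python
from statistics import mean, pstdev

def window_features(series_by_signal):
    """Flatten one process cycle (a dict of signal name -> list of samples)
    into summary features that a tree ensemble can consume."""
    features = {}
    for name, values in sorted(series_by_signal.items()):
        n = len(values)
        features[f"{name}_mean"] = mean(values)
        features[f"{name}_std"] = pstdev(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        # Average change per sample between the first and last readings.
        features[f"{name}_slope"] = (values[-1] - values[0]) / (n - 1) if n > 1 else 0.0
    return features

# Hypothetical cycle with two sensor signals.
cycle = {"force": [10.0, 12.0, 15.0, 13.0], "temperature": [900.0, 905.0, 910.0, 915.0]}
print(window_features(cycle))
```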

APA, Harvard, Vancouver, ISO, and other styles

5

Gupta, Suraj. "Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/96025.

Full text

Abstract:

Many antibiotic resistance genes (ARGs) conferring resistance to a broad range of antibiotics have often been detected in aquatic environments such as untreated and treated wastewater, river and surface water. ARG proliferation in the aquatic environment could depend on various factors such as geospatial variations, the type of aquatic body, and the type of wastewater (untreated or treated) discharged into these aquatic environments. Likewise, the strong interconnectivity of aquatic systems may accelerate the spread of ARGs through them. Hence, a comparative and holistic study of different aquatic environments is required to properly understand the problem of antibiotic resistance. Many studies approach this issue using molecular techniques such as metagenomic sequencing and metagenomic data analysis. Such analyses compare the broad spectrum of ARGs in water and wastewater samples, but these studies use comparisons that are limited to similarity/dissimilarity analyses. In such analyses, however, the discriminatory ARGs (the associated ARGs driving the similarity/dissimilarity measures) may not be identified. Consequently, the factors that drive the dissimilarities among the samples would not be identified, and the reason for antibiotic resistance proliferation may not be clearly understood. In this study, an effective methodology using the Extremely Randomized Trees (ET) algorithm was formulated and demonstrated to capture such ARG variations and identify discriminatory ARGs among environmentally derived metagenomes. In this study, data were grouped by geographic location (to understand the spread of ARGs globally), untreated vs. treated wastewater (to assess the effectiveness of WWTPs in removing ARGs), and aquatic habitat (to understand the impact and spread within aquatic habitats).
It was observed that certain ARGs were specific to wastewater samples from certain locations, suggesting that site-specific factors can have some effect in shaping ARG profiles. Comparing untreated and treated wastewater samples from different WWTPs revealed that biological treatments have a definite impact on shaping the ARG profile. While several ARGs were removed by the treatment, some ARGs showed an increase in relative abundance irrespective of location and treatment-plant-specific variables. On comparing different aquatic environments, the algorithm identified ARGs that were specific to certain environments, including ARGs specific to hospital discharges when compared with other aquatic environments. The proposed method was found to be efficient in identifying the discriminatory ARGs that could classify the samples according to their groups. Further, it was effective in capturing low-level variations that are generally overshadowed in the analysis by highly abundant genes. The results of this study suggest that the proposed method is effective for comprehensive analyses and can provide valuable information to better understand antibiotic resistance.
MS
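
Extremely Randomized Trees provide impurity-based feature importances, which offer one way to rank candidate discriminatory genes. The following is a minimal sketch, assuming scikit-learn. The abundance table, the group labels, and the planted signal in gene 2 are all synthetic, and the thesis's actual pipeline may differ.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n_samples, n_genes = 120, 8
# Hypothetical relative-abundance table: rows are samples, columns are genes.
X = rng.random((n_samples, n_genes))
groups = rng.integers(0, 2, size=n_samples)  # e.g. 0 = untreated, 1 = treated
# Make gene 2 discriminatory by shifting its abundance in group 1.
X[groups == 1, 2] += 1.0

model = ExtraTreesClassifier(n_estimators=300, random_state=0).fit(X, groups)
# Sort gene indices from most to least important.
ranking = np.argsort(model.feature_importances_)[::-1]
print("genes ranked by impurity-based importance:", ranking.tolist())
```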

APA, Harvard, Vancouver, ISO, and other styles

6

Assareh, Amin. "OPTIMIZING DECISION TREE ENSEMBLES FOR GENE-GENE INTERACTION DETECTION." Kent State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1353971575.

Full text

APA, Harvard, Vancouver, ISO, and other styles

7

Chakraborty, Debaditya. "Detection of Faults in HVAC Systems using Tree-based Ensemble Models and Dynamic Thresholds." University of Cincinnati / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1543582336141076.

Full text

APA, Harvard, Vancouver, ISO, and other styles

8

Vukobratović, Bogdan. "Hardware Acceleration of Nonincremental Algorithms for the Induction of Decision Trees and Decision Tree Ensembles." PhD thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2017. https://www.cris.uns.ac.rs/record.jsf?recordId=102520&source=NDLTD&language=en.

Full text

Abstract:

The thesis proposes novel full decision tree and decision tree ensemble induction algorithms, EFTI and EEFTI, and explores various possibilities for their implementation. The experiments show that, compared with top-down incremental DT inducers, the proposed EFTI algorithm infers much smaller DTs on average without significant loss in accuracy. Compared with other full tree induction algorithms, it produces more accurate DTs of similar size in shorter times. Hardware architectures for accelerating these algorithms (EFTIP and EEFTIP) are also proposed, and experiments show that they can offer substantial speedups.
У овоj дисертациjи, представљени су нови алгоритми EFTI и EEFTI заформирање стабала одлуке и њихових ансамбала неинкременталномметодом, као и разне могућности за њихову имплементациjу.Експерименти показуjу да jе предложени EFTI алгоритам у могућностида произведе драстично мања стабла без губитка тачности у односу напостојеће top-down инкременталне алгоритме, а стабла знатно већетачности у односу на постојеће неинкременталне алгоритме. Такође супредложене хардверске архитектуре за акцелерацију ових алгоритама(EFTIP и EEFTIP) и показано је да је уз помоћ ових архитектура могућеостварити знатна убрзања.
U ovoj disertaciji, predstavljeni su novi algoritmi EFTI i EEFTI zaformiranje stabala odluke i njihovih ansambala neinkrementalnommetodom, kao i razne mogućnosti za njihovu implementaciju.Eksperimenti pokazuju da je predloženi EFTI algoritam u mogućnostida proizvede drastično manja stabla bez gubitka tačnosti u odnosu napostojeće top-down inkrementalne algoritme, a stabla znatno većetačnosti u odnosu na postojeće neinkrementalne algoritme. Takođe supredložene hardverske arhitekture za akceleraciju ovih algoritama(EFTIP i EEFTIP) i pokazano je da je uz pomoć ovih arhitektura mogućeostvariti znatna ubrzanja.

APA, Harvard, Vancouver, ISO, and other styles

9

Whitley, Michael Aaron. "Using statistical learning to predict survival of passengers on the RMS Titanic." Kansas State University, 2015. http://hdl.handle.net/2097/20541.

Full text

Abstract:

Master of Science
Statistics
Christopher Vahl
When exploring data, predictive analytics techniques have proven to be effective. In this report, the efficiency of several predictive analytics methods is explored. At the time of this study, Kaggle.com, a data science competition website, was hosting the predictive modeling competition "Titanic: Machine Learning from Disaster". This competition posed the classification problem of building a model to predict the survival of passengers on the RMS Titanic. Our approach focused on applying a traditional classification and regression tree algorithm. The algorithm is greedy and can overfit the training data, which can yield suboptimal prediction accuracy. To correct these issues with the classification and regression tree algorithm, we implemented cost complexity pruning and ensemble methods such as bagging and random forests. However, no improvement was observed, which may be an artifact of the Titanic data and may not be representative of those methods’ performance. The decision trees and prediction accuracy of each method are presented and compared. Results indicate that sex/title, fare price, age, and passenger class are the most important variables in predicting passenger survival.
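
Cost complexity pruning, as applied in the report, is typically tuned by cross-validating over the candidate values on the pruning path. The following is a minimal sketch, assuming scikit-learn. Noisy synthetic data stands in for the Titanic set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, flip_y=0.1, random_state=0)

# Candidate alphas along the minimal cost-complexity pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alphas = path.ccp_alphas[:-1]  # the last alpha prunes the tree to a single node

def cv_accuracy(alpha):
    """Mean 5-fold accuracy of a tree pruned with the given alpha."""
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    return cross_val_score(tree, X, y, cv=5).mean()

scores = [cv_accuracy(a) for a in alphas]
best_alpha = alphas[max(range(len(alphas)), key=scores.__getitem__)]
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
full = DecisionTreeClassifier(random_state=0).fit(X, y)
print(f"best alpha={best_alpha:.4f}; leaves: full={full.get_n_leaves()}, pruned={pruned.get_n_leaves()}")
```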

APA, Harvard, Vancouver, ISO, and other styles

10

Velka, Elina. "Loss Given Default Estimation with Machine Learning Ensemble Methods." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279846.

Full text

Abstract:

This thesis evaluates the performance of three machine learning methods in predicting the Loss Given Default (LGD). LGD can be seen as the opposite of the recovery rate, i.e., the share of an outstanding loan that the loan issuer would not be able to recover if the customer defaulted. The methods investigated are decision trees, random forests and boosted methods. All of the methods performed well in predicting the cases where the loan is not recovered, LGD = 1 (100%), or is fully recovered, LGD = 0 (0%). When the models were evaluated on a dataset from which the observations with LGD = 1 had been removed, a significant decrease in performance was observed. The random forest model built on an unbalanced training dataset performed better on the test dataset that included LGD = 1 values, while the random forest model built on a balanced training dataset performed better on the test set from which the LGD = 1 observations had been removed. The boosted models evaluated in this study gave less accurate predictions than the other methods. Overall, the random forest models performed slightly better than the decision tree models, although their computational time (the cost) was considerably longer. Therefore, decision tree models are suggested for predicting the Loss Given Default.
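
The definition in the abstract, LGD as one minus the recovery rate, can be written as a small function. The following is a minimal sketch. Clipping the result to [0, 1] is an assumption, since conventions for over-recoveries and net-negative recoveries differ.

```python
def loss_given_default(exposure_at_default, amount_recovered):
    """LGD as the share of the outstanding exposure that is not recovered.

    Equivalent to 1 - recovery rate. The result is clipped to [0, 1]
    (an assumption; conventions differ between institutions).
    """
    if exposure_at_default <= 0:
        raise ValueError("exposure_at_default must be positive")
    recovery_rate = amount_recovered / exposure_at_default
    return min(max(1.0 - recovery_rate, 0.0), 1.0)

print(loss_given_default(10_000, 0))        # nothing recovered
print(loss_given_default(10_000, 10_000))   # fully recovered
print(loss_given_default(10_000, 2_500))    # partial recovery
```

The first two calls produce the boundary values LGD = 1 and LGD = 0, which the abstract identifies as the cases that all three methods predicted well.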
Denna uppsats undersöker och jämför tre maskininlärningsmetoder som estimerar förlust vid fallissemang (Loss Given Default, LGD). LGD kan ses som motsatsen till återhämtningsgrad, dvs. andelen av det utstående lånet som långivaren inte skulle återfå ifall kunden skulle fallera. Maskininlärningsmetoder som undersöks i detta arbete är decision trees, random forest och boosted metoder. Alla metoder fungerade väl vid estimering av lån som antingen inte återbetalas, dvs. LGD = 1 (100%), eller av lån som betalas i sin helhet, LGD = 0 (0%). En tydlig minskning i modellernas träffsäkerhet påvisades när modellerna kördes med ett dataset där observationer med LGD = 1 var borttagna. Random forest modeller byggda på ett obalanserat träningsdataset presterade bättre än de övriga modellerna på testset som inkluderade observationer där LGD = 1. Då observationer med LGD = 1 var borttagna visade det sig att random forest modeller byggda på ett balanserat träningsdataset presterade bättre än de övriga modellerna. Boosted modeller visade den svagaste träffsäkerheten av de tre metoderna som blev undersökta i denna studie. Totalt sett visade studien att random forest modeller byggda på ett obalanserat träningsdataset presterade en aning bättre än decision tree modeller, men beräkningstiden (kostnaden) var betydligt längre när random forest modeller kördes. Därför skulle decision tree modeller föredras vid estimering av förlust vid fallissemang.

APA, Harvard, Vancouver, ISO, and other styles

11

Alfuhaid, Abdulaziz Ataallah. "AN AGENT-BASED SYSTEMATIC ENSEMBLE APPROACH FOR AUTO AUCTION PREDICTION." University of Akron / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=akron1542560217326084.

Full text

APA, Harvard, Vancouver, ISO, and other styles

12

Alesand, Elias. "Identification of Flying Drones in Mobile Networks using Machine Learning." Thesis, Linköpings universitet, Kommunikationssystem, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157627.

Full text

Abstract:

Drone usage is increasing, both for recreation and in industry, and it brings a number of problems to tackle. Primarily, there are certain areas in which flying drones pose a security threat, e.g., around airports or other no-fly zones. Other problems arise when drones connected to mobile networks cause interference. This interference occurs because radio transmissions from drones can travel more freely than those from regular UEs (User Equipment) on the ground, since there are few obstructions in the air. Additionally, drones often send high volumes of data traffic in the form of video streams. The goal of this thesis is to identify so-called "rogue drones" connected to an LTE network. Rogue drones are flying drones that appear to be regular UEs in the network. Drone identification is a binary classification problem in which UEs in a network are classified as either a drone or a regular UE, and this thesis proposes machine learning methods that can be used to solve it. Classifications are based on radio measurements and statistics reported by UEs in the network. The data for this thesis is gathered through simulations of a heterogeneous LTE network in an urban scenario. The primary idea of this thesis is to use a cascading classifier, meaning that classifications are made in a series of stages with increasingly complex models, where only a subset of examples is passed forward to subsequent stages. The motivation for this structure is to minimize the computational requirements at the entity making the classifications while remaining complex enough to achieve high accuracy. The models explored in this thesis are two-stage cascading classifiers using decision trees and ensemble learning techniques. It was found that close to 60% of the UEs in the dataset could be classified without errors in the first of the two stages.
The rest is forwarded to a more complex model which requires more data from the UEs and can achieve up to 98% accuracy.
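
The two-stage cascade can be sketched as follows, assuming scikit-learn. The choice of "cheap" features, the depth of the first tree, and the confidence threshold are all hypothetical and are not values from the thesis. Using leaf purity as the confidence signal is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
X_tr, y_tr, X_te, y_te = X[:800], y[:800], X[800:], y[800:]

# Stage 1: a shallow tree using only the first 4 (hypothetical "cheap") features.
CHEAP = slice(0, 4)
stage1 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr[:, CHEAP], y_tr)
# Stage 2: a more expensive ensemble that uses every feature.
stage2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

def cascade_predict(X, threshold=0.95):
    """Accept stage-1 predictions whose leaf is confident enough; forward the rest."""
    proba = stage1.predict_proba(X[:, CHEAP])
    confident = proba.max(axis=1) >= threshold
    pred = stage1.classes_[proba.argmax(axis=1)]
    if (~confident).any():
        # Only the uncertain examples pay for the full feature set and the big model.
        pred[~confident] = stage2.predict(X[~confident])
    return pred, confident

pred, confident = cascade_predict(X_te)
print(f"decided in stage 1: {confident.mean():.0%}; overall accuracy: {(pred == y_te).mean():.3f}")
```

Raising `threshold` forwards more examples to the expensive second stage. Lowering it saves computation at the risk of more first-stage errors.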

APA, Harvard, Vancouver, ISO, and other styles

13

Mitchell, Andrew. "An approach to boosting from positive-only data." Computer Science & Engineering, Faculty of Engineering, UNSW. Awarded by: University of New South Wales, Computer Science and Engineering, 2004. http://handle.unsw.edu.au/1959.4/20678.

Full text

Abstract:

Ensemble techniques have recently been used to enhance the performance of machine learning methods. However, current ensemble techniques for classification require both positive and negative data to produce a result that is both meaningful and useful. Negative data is, however, sometimes difficult, expensive or impossible to obtain. In this thesis, a learning framework is described that has a very close relationship to boosting. Within this framework, a method is described that bears remarkable similarities to boosting stumps and does not rely on negative examples. This is surprising, since learning from positive-only data has traditionally been difficult. An empirical methodology is described and deployed for testing positive-only learning systems on commonly available multiclass datasets, in order to compare these learning systems with each other and with multiclass learning systems. Empirical results show that our positive-only boosting-like method, using stumps as a base learner, learns successfully from positive data only, and in the process does not pay too heavy a price in accuracy compared to learners that have access to both positive and negative data. We also describe methods of using positive-only learners on multiclass learning tasks and vice versa, and empirically demonstrate that our boosting-like method of learning from positive-only data is superior to a traditional multiclass learner converted to learn from positive-only data. Finally, we examine some alternative frameworks, such as when additional unlabelled training examples are given. Some theoretical justifications of the results and methods are also provided.

APA, Harvard, Vancouver, ISO, and other styles

14

Ardeshir, G. "Decision tree simplification for classifier ensembles." Thesis, University of Surrey, 2002. http://epubs.surrey.ac.uk/843022/.

Full text

Abstract:

Design of ensemble classifiers involves three factors: 1) a learning algorithm to produce a classifier (base classifier), 2) an ensemble method to generate diverse classifiers, and 3) a combining method to combine the decisions made by the base classifiers. With regard to the first factor, a good choice for constructing a classifier is a decision tree learning algorithm. However, a possible problem with this learning algorithm is its complexity, which has previously been addressed only in the context of pruning methods for individual trees. Furthermore, the ensemble method may require the learning algorithm to produce a complex classifier. Considering that the performance of simplification methods, as well as that of ensemble methods, varies from one domain to another, our main contribution is to study a simplification method (post-pruning) in the context of ensemble methods including Bagging, Boosting and Error-Correcting Output Codes (ECOC). Using a statistical test, the performance of ensembles made by Bagging, Boosting and ECOC, and of five pruning methods applied within those ensembles, is compared. In addition to the implementation, a supporting theory, the margin, is discussed, and the relationship of pruning to bias and variance is explained. For ECOC, the effect of parameters such as code length and training set size on the performance of pruning methods is also studied. Decomposition methods such as ECOC are considered a way to reduce the complexity of multi-class problems in many real problems such as face recognition. Focusing on decomposition methods, AdaBoost.OC, which combines Boosting and ECOC, is compared with the pseudo-loss-based version of Boosting, AdaBoost.M2. In addition, the influence of pruning on the performance of ensembles is studied. Motivated by the result that pruned and unpruned ensembles made by AdaBoost.OC have similar accuracy, pruned ensembles are compared with ensembles of single-node decision trees.
This leads to the hypothesis that ensembles of simple classifiers may perform better, as shown for AdaBoost.OC on the identification problem in face recognition. The implication is that, in some problems, achieving the best ensemble accuracy requires selecting the complexity of the base classifier.
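
scikit-learn provides neither AdaBoost.OC nor AdaBoost.M2. The interaction between ECOC and pruning can still be sketched with `OutputCodeClassifier` wrapping cost-complexity-pruned trees. The `ccp_alpha` values and `code_size` used here are illustrative assumptions, and post-pruning via `ccp_alpha` stands in for the five pruning methods studied in the thesis.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

results = {}
for alpha in (0.0, 0.01):  # 0.0 = unpruned; 0.01 is an illustrative pruning strength
    ecoc = OutputCodeClassifier(
        DecisionTreeClassifier(ccp_alpha=alpha, random_state=0),
        code_size=2.0,  # code length = 2 x number of classes (20 binary trees here)
        random_state=0,
    )
    results[alpha] = cross_val_score(ecoc, X, y, cv=3).mean()
    print(f"ccp_alpha={alpha}: mean CV accuracy {results[alpha]:.3f}")
```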

APA, Harvard, Vancouver, ISO, and other styles

15

Ahmad, Amir. "Data Transformation for Decision Tree Ensembles." Thesis, University of Manchester, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508528.

Full text

APA, Harvard, Vancouver, ISO, and other styles

16

Kobayashi, Izumi. "Randomized ensemble methods for classification trees." Diss., Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2002. http://library.nps.navy.mil/uhtbin/hyperion-image/02sep%5FKobayashi.pdf.

Full text

Abstract:

Thesis (Ph. D. in Operations Research)--Naval Postgraduate School, September 2002.
Dissertation supervisor: Samuel E. Buttrey. Includes bibliographical references (p. 117-119). Also available online.

APA, Harvard, Vancouver, ISO, and other styles

17

Sinsel, Erik W. "Ensemble learning for ranking interesting attributes." Morgantown, W. Va. : [West Virginia University Libraries], 2005. https://eidr.wvu.edu/etd/documentdata.eTD?documentid=4400.

Full text

Abstract:

Thesis (M.S.)--West Virginia University, 2005.
Title from document title page. Document formatted into pages; contains viii, 81 p. : ill. Includes abstract. Includes bibliographical references (p. 72-74).

APA, Harvard, Vancouver, ISO, and other styles

18

Pisetta, Vincent. "New Insights into Decision Trees Ensembles." Thesis, Lyon 2, 2012. http://www.theses.fr/2012LYO20018/document.

Full text

Abstract:

Les ensembles d’arbres constituent à l’heure actuelle l’une des méthodes d’apprentissage statistique les plus performantes. Toutefois, leurs propriétés théoriques, ainsi que leurs performances empiriques restent sujettes à de nombreuses questions. Nous proposons dans cette thèse d’apporter un nouvel éclairage à ces méthodes. Plus particulièrement, après avoir évoqué les aspects théoriques actuels (chapitre 1) de trois schémas ensemblistes principaux (Forêts aléatoires, Boosting et Discrimination Stochastique), nous proposerons une analyse tendant vers l’existence d’un point commun au bien fondé de ces trois principes (chapitre 2). Ce principe tient compte de l’importance des deux premiers moments de la marge dans l’obtention d’un ensemble ayant de bonnes performances. De là, nous en déduisons un nouvel algorithme baptisé OSS (Oriented Sub-Sampling) dont les étapes sont en plein accord et découlent logiquement du cadre que nous introduisons. Les performances d’OSS sont empiriquement supérieures à celles d’algorithmes en vogue comme les Forêts aléatoires et AdaBoost. Dans un troisième volet (chapitre 3), nous analysons la méthode des Forêts aléatoires en adoptant un point de vue « noyau ». Ce dernier permet d’améliorer la compréhension des forêts avec, en particulier la compréhension et l’observation du mécanisme de régularisation de ces techniques. Le fait d’adopter un point de vue noyau permet d’améliorer les Forêts aléatoires via des méthodes populaires de post-traitement comme les SVM ou l’apprentissage de noyaux multiples. Ceux-ci démontrent des performances nettement supérieures à l’algorithme de base, et permettent également de réaliser un élagage de l’ensemble en ne conservant qu’une petite partie des classifieurs le composant
Decision tree ensembles are among the most popular tools in machine learning. Nevertheless, their theoretical properties and their empirical performance remain the subject of active investigation. In this thesis, we aim to shed new light on these methods. More precisely, after describing the current theoretical aspects of three main ensemble schemes, Random Forests, Boosting and Stochastic Discrimination (chapter 1), we give an analysis supporting the existence of a common reason for the success of these three principles (chapter 2). This analysis treats the first two moments of the margin as an essential ingredient for obtaining strong learning abilities. Starting from this framework, we propose a new ensemble algorithm called OSS (Oriented Sub-Sampling), whose steps follow directly from the point of view we introduce. The empirical performance of OSS is superior to that of currently popular algorithms such as Random Forests and AdaBoost. In the third chapter, we analyze Random Forests from a “kernel” point of view. This view allows us to understand and observe the underlying regularization mechanism of these methods. Adopting the kernel point of view also enables us to improve the predictive performance of Random Forests using popular post-processing techniques such as SVMs and multiple kernel learning. Combined with Random Forests, these techniques show greatly improved performance and make it possible to prune the ensemble by keeping only a small fraction of the initial base learners.
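
The margin and its first two moments can be computed directly from base-learner votes. The following is a minimal sketch using the common voting-margin definition: the share of votes for the true class minus the largest share for any other class. The votes are made up, and OSS itself is not reproduced.

```python
from collections import Counter
from statistics import mean, pvariance

def vote_margin(votes, true_label):
    """Margin of one example: share of base-learner votes for the true class
    minus the largest share for any other class (ranges from -1 to 1)."""
    counts = Counter(votes)
    n = len(votes)
    correct = counts.get(true_label, 0) / n
    wrong = max((c / n for label, c in counts.items() if label != true_label), default=0.0)
    return correct - wrong

# Hypothetical votes of 5 trees on 4 examples, with the true labels.
votes = [["a", "a", "a", "b", "a"],
         ["b", "b", "a", "b", "b"],
         ["a", "b", "c", "a", "b"],
         ["c", "c", "c", "c", "c"]]
labels = ["a", "b", "b", "c"]

margins = [vote_margin(v, y) for v, y in zip(votes, labels)]
# First two moments of the margin distribution over the examples.
print("margins:", margins)
print("mean:", mean(margins), "variance:", pvariance(margins))
```

Under the view summarized in the abstract, an ensemble tends to do well when the mean margin is high and its variance is low.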

APA, Harvard, Vancouver, ISO, and other styles

19

Rosales, Elisa Renee. "Predicting Patient Satisfaction With Ensemble Methods." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-theses/595.

Full text

Abstract:

Health plans are constantly seeking ways to assess and improve the quality of patient experience in various ambulatory and institutional settings. Standardized surveys are a common tool for gathering data about patient experience, and a useful measurement taken from these surveys is known as the Net Promoter Score (NPS). This score represents the extent to which a patient would, or would not, recommend his or her physician on a scale from 0 to 10, where 0 corresponds to "Extremely unlikely" and 10 to "Extremely likely". A large national health plan used automated calls to distribute such a survey to its members and was interested in understanding what factors contributed to a patient's satisfaction. Additionally, it was interested in whether NPS could be predicted from responses to other questions on the survey, along with demographic data. When the distributions of various predictors were compared between the less satisfied and the highly satisfied members, there was significant overlap, indicating that not even the Bayes classifier could successfully differentiate between these members. Moreover, the highly imbalanced proportion of NPS responses initially resulted in poor prediction accuracy. Thus, given the non-linear structure of the data and the high number of categorical predictors, we used flexible methods such as decision trees, bagging, and random forests for modeling and prediction. We further altered the prediction step of the random forest algorithm to account for the imbalanced structure of the data.
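
The abstract does not say how the random forest prediction step was altered. One common option, assumed here, is to lower the share of tree votes required to predict the minority class. The vote shares and labels below are hypothetical.

```python
def predict_with_cutoff(minority_vote_share, cutoff):
    """Label an example as the minority class (1) when at least `cutoff`
    of the trees vote for it, instead of requiring a simple majority."""
    return [1 if share >= cutoff else 0 for share in minority_vote_share]

def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted as positive."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0

# Hypothetical fraction of trees voting for the minority class, per example.
shares = [0.1, 0.35, 0.2, 0.45, 0.05, 0.3, 0.6, 0.15]
y_true = [0, 1, 0, 1, 0, 0, 1, 0]

for cutoff in (0.5, 0.3):
    print(cutoff, recall(y_true, predict_with_cutoff(shares, cutoff)))
```

In this made-up example, lowering the cutoff from 0.5 to 0.3 raises minority-class recall from 1/3 to 1.0 at the cost of one false positive.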

APA, Harvard, Vancouver, ISO, and other styles

20

Ahmed, Istiak. "An ensemble learning approach based on decision trees and probabilistic argumentation." Thesis, Umeå universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-175967.

Full text

Abstract:

This research discusses a decision support system that includes different machine learning approaches (e.g., ensemble learning, decision trees) and a symbolic reasoning approach (argumentation). The purpose of this study is to define an ensemble learning algorithm based on formal argumentation and decision trees. The proposed system produces its outcomes using a decision tree algorithm as the base learning algorithm and an argumentation framework as the decision fusion technique of an ensemble architecture. The introduced algorithm is a hybrid ensemble learning approach based on a formal argumentation-based method. It is evaluated on sample data sets (e.g., an open-access data set and a data set extracted from ultrasound images) and provides satisfactory outcomes. The study addresses a problem at the intersection of ensemble learning and formal argumentation: a probabilistic argumentation framework is implemented as the decision fusion step of an ensemble learning approach. An open-access library has also been developed for users, and its generic version can be used for different purposes.

APA, Harvard, Vancouver, ISO, and other styles

21

Baffoe, Nana Ama Appiaa. "Diagnostic Tools for Forecast Ensembles." Case Western Reserve University School of Graduate Studies / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=case1522964882574611.

Full text

22

Rangemo, Jesper. "Basisten bestämmer : Ett experiment i att leda en ensemble utefter tre olika metoder." [The bassist decides: an experiment in leading an ensemble using three different methods.] Thesis, Luleå tekniska universitet, Institutionen för konst, kommunikation och lärande, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-67215.

Full text

Abstract:

This work deals with how a bassist can go about influencing his or her fellow musicians. Using three different methods, I have examined interplay and how different methods affect both the overall sound and the individual musicians' performance. The three methods mainly consisted of changing my role in the ensemble, and they determined which means I had available to exert influence. Each method was assigned a self-written song, which was performed together with a band. Between five and seven recordings were made of each song, and these recordings then formed the basis for my reflections and analyses. The idea behind the work was largely a desire to gain more self-distance and awareness in my playing. Much of what I did turned out not to produce the expected result. It is not always obvious that what one thinks one is conveying is what actually comes across. Nevertheless, all of one's decisions affect the result, whether or not it turns out as intended. The music is the sum of its parts, and one is always part of that sum, whatever role one takes.

23

Gangadhara, Kanthi, and Dubbaka Sai Anusha Reddy. "Comparing Compound and Ordinary Diversity measures Using Decision Trees." Thesis, Högskolan i Borås, Institutionen Handels- och IT-högskolan, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-20385.

Full text

Abstract:

An ensemble of classifiers succeeds in improving the accuracy of the whole when the component classifiers are both diverse and accurate. Diversity is required to ensure that the classifiers make uncorrelated errors. Theoretical and experimental approaches from previous research show a very low correlation between ensemble accuracy and diversity measures. The compound diversity functions proposed by Albert Hung-Ren Ko and Robert Sabourin (2009), which combine the diversities and performances of individual classifiers, exhibit strong correlations between diversity and accuracy. To check consistency with these existing arguments, compound diversity measures are evaluated and compared with traditional diversity measures on different problems. Evaluating the diversity of errors and comparing it with these measures is central to this study. The results show that compound diversity measures are better than ordinary diversity measures. The results also shed further light on evaluating the diversity of errors on the available data.
Program: Magisterutbildning i informatik (Master's programme in informatics)
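
The abstract does not give the formulas for Ko and Sabourin's compound functions, so they are not reproduced here. For contrast, the sketch computes two standard "ordinary" pairwise diversity measures from each classifier's correct/incorrect outputs: the disagreement measure (share of samples on which exactly one of two classifiers is correct) and the double-fault measure (share on which both are wrong), averaged over all classifier pairs. The function names and the toy predictions are illustrative.

```python
from itertools import combinations
from typing import Callable, Sequence


def oracle(preds: Sequence[int], y: Sequence[int]) -> list[bool]:
    """True where a classifier's prediction matches the true label."""
    return [p == t for p, t in zip(preds, y)]


def disagreement(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Share of samples on which exactly one of the two classifiers is correct."""
    return sum(x != z for x, z in zip(a, b)) / len(a)


def double_fault(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Share of samples on which both classifiers are wrong."""
    return sum((not x) and (not z) for x, z in zip(a, b)) / len(a)


def mean_pairwise(measure: Callable[[Sequence[bool], Sequence[bool]], float],
                  ensemble_preds: Sequence[Sequence[int]],
                  y: Sequence[int]) -> float:
    """Average a pairwise diversity measure over all classifier pairs."""
    correct = [oracle(p, y) for p in ensemble_preds]
    pairs = list(combinations(correct, 2))
    return sum(measure(a, b) for a, b in pairs) / len(pairs)
```

Higher disagreement and lower double-fault both indicate more diverse errors; for three toy classifiers on four samples, for example, the mean disagreement comes out at 2/3 and the mean double-fault at 1/12.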

24

Haris, Daniel. "Optimalizace strojového učení pro predikci KPI." [Optimizing machine learning for KPI prediction.] Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385922.

Full text

Abstract:

This thesis aims to optimize the machine learning algorithms used to predict KPI metrics for an organization. The organization uses machine learning to predict whether projects meet the planned deadlines of the last phase of the development process. The work focuses on analyzing the prediction models and sets the goal of selecting new candidate models for the prediction system. We have implemented a system that automatically selects the best feature variables for learning. The trained models were evaluated with several performance metrics, and the best candidates were chosen for prediction. The candidate models achieved higher accuracy, which means that the prediction system provides more reliable responses. We also suggest other improvements that could increase the accuracy of the forecast.
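
The abstract does not name the selection method. A minimal sketch of one widely used option, greedy forward selection, is shown here under that assumption: starting from no features, it repeatedly adds the feature that most improves a user-supplied score (in a real system, for example, cross-validated accuracy) and stops when no remaining candidate helps. The function name, the feature names and the toy score are hypothetical.

```python
from typing import Callable, Sequence


def forward_select(features: Sequence[str],
                   score: Callable[[list[str]], float],
                   max_features: int) -> list[str]:
    """Greedily add the feature that most improves `score`.

    Stops when no remaining feature improves the score or when
    `max_features` features have been selected. Ties between equally
    scoring candidates are broken by feature name.
    """
    selected: list[str] = []
    best = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        best = top_score
    return selected
```

For instance, with a toy score that rewards a hypothetical `phase_length` feature most, `team_size` less, and penalizes `noise`, the procedure returns `["phase_length", "team_size"]` and stops before adding `noise`.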

25

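Brugeat, Céline. "Quand l'Amérique collectionnait des cloîtres gothiques : les ensembles de Trie-sur-Baïse, Bonnefont-en-Comminges et Montréjeau." [When America collected Gothic cloisters: the ensembles of Trie-sur-Baïse, Bonnefont-en-Comminges and Montréjeau.] Thesis, Toulouse 2, 2016. http://www.theses.fr/2016TOU20036.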

Full text

Abstract:

Three cloisters attributed to the monasteries of "Trie-sur-Baïse", "Bonnefont-en-Comminges" (the Cloisters, New York) and "Montréjeau" (Paradise Island, Bahamas) were purchased by American collectors and rebuilt in North America during the 20th century. The modern reassembly of such monuments invites a closer look at the taste for medieval European architecture expressed by these American amateurs from the beginning of the 20th century. According to the initial attributions, the stones came from monasteries in the central Pyrenees whose remains were dispersed over the course of history: the Hundred Years' War and the Wars of Religion, the gradual abandonment of religious establishments by their communities during the 17th and 18th centuries and, finally, the alienation of their properties during the Revolution seriously damaged the integrity of the monastic buildings. However, from the post-revolutionary period until the early 20th century, it was the discreet transactions between private individuals and antique dealers that erased the very origin of the stones from collective memory, especially that of the marble cloisters coveted for their ornament. Identifying the cloisters' provenance was the main objective of this study. These carved marbles present a rich and varied iconographic programme: the "Bonnefont-en-Comminges" and "Montréjeau" ensembles both show stylized foliage ornament, while the "Trie-sur-Baïse" cloister depicts original figurative scenes. An in-depth analysis of these sculptures made it possible to associate the cloisters with their original architectural setting and production context.

26

Johansson, Samuel, and Karol Wojtulewicz. "Machine learning algorithms in a distributed context." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148920.

Full text

Abstract:

Interest in distributed approaches to machine learning has increased significantly in recent years because of the continuously growing size of the data used to train machine learning models. In this thesis we describe three popular machine learning algorithms, decision trees, Naive Bayes and support vector machines (SVM), and present existing ways of distributing them. We also perform experiments with decision trees distributed with bagging, boosting and hard data partitioning, and evaluate them in terms of performance measures such as accuracy, F1 score and execution time. Our experiments show that the execution time of bagging and boosting increases linearly with the number of workers, and that boosting performs significantly better than bagging and hard data partitioning in terms of F1 score. The hard data partitioning algorithm works well for large datasets, where the execution time decreases as the number of workers increases without any significant loss in accuracy or F1 score. On small datasets, however, it performs poorly: execution time increases and accuracy and F1 score drop as the number of workers increases.
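
As a rough illustration of hard data partitioning (not the authors' implementation), the sketch splits the training rows into disjoint shards, lets each simulated "worker" fit a one-feature decision stump on its shard alone, and combines the workers' models by majority vote. The stump stands in for a full decision tree to keep the example short, the rows are assumed to be already shuffled, and all names are illustrative. Because each worker sees only about 1/k of the data, the sketch also hints at why the approach suits large datasets better than small ones.

```python
from collections import Counter
from typing import Sequence

Stump = tuple[float, int]  # (threshold, label predicted when x > threshold)


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> Stump:
    """Fit a one-feature decision stump for binary labels 0/1.

    Tries every observed value as the threshold and both label orientations,
    keeping the combination with the fewest training errors.
    """
    best = None
    for t in sorted(set(xs)):
        for label in (0, 1):
            errors = sum((label if x > t else 1 - label) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, label)
    _, t, label = best
    return t, label


def predict_stump(stump: Stump, x: float) -> int:
    t, label = stump
    return label if x > t else 1 - label


def partition(n: int, workers: int) -> list[range]:
    """Split row indices 0..n-1 into at most `workers` contiguous, disjoint shards."""
    size = max(1, -(-n // workers))  # ceiling division
    return [range(i, min(i + size, n)) for i in range(0, n, size)]


def train_partitioned(xs: Sequence[float], ys: Sequence[int], workers: int) -> list[Stump]:
    """Each simulated worker fits a stump on its own shard; no rows are shared."""
    return [fit_stump([xs[i] for i in shard], [ys[i] for i in shard])
            for shard in partition(len(xs), workers)]


def predict_vote(stumps: Sequence[Stump], x: float) -> int:
    """Combine the workers' models by majority vote (ties: first label counted)."""
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]
```

With nine interleaved points and three workers, each shard yields a stump with a slightly different threshold (0.2, 0.3 and 0.25), and the vote settles inputs that fall between those thresholds.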

27

Lundberg, Jacob. "Resource Efficient Representation of Machine Learning Models : investigating optimization options for decision trees in embedded systems." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-162013.

Full text

Abstract:

Combining embedded systems and machine learning models is an exciting prospect. However, to target any embedded system, including those with the most stringent resource requirements, the models must be designed carefully so as not to overwhelm it. This thesis targets decision tree ensembles. A benchmark model is created with LightGBM, a popular framework for gradient boosted decision trees. This model is first transformed and regularized with RuleFit, which fits a LASSO-regularized linear model over rules derived from the trees. It is then further optimized with quantization and weight sharing, techniques used to compress neural networks. The entire process is combined into a novel framework called ESRule. The data come from the domain of frequency measurements in cellular networks, where there is a clear use case for embedded systems running the resulting resource-optimized models. Compared with LightGBM, ESRule uses 72× less internal memory on average while also increasing predictive performance. The models use 4 kilobytes on average. The serialized variant of ESRule uses 104× less hard disk space than LightGBM. ESRule is also clearly faster at predicting a single sample.
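
The abstract does not detail ESRule's quantization and weight-sharing scheme. A minimal sketch of weight sharing in the sense used for neural-network compression, applied here (as an assumption) to a vector of rule coefficients such as a RuleFit model would produce, clusters the coefficients with one-dimensional k-means so that only k shared values plus a small index per rule need to be stored. The function names and coefficient values are made up for illustration.

```python
from typing import Sequence


def share_weights(weights: Sequence[float], k: int, iters: int = 20):
    """Cluster weights into k shared values with 1-D k-means (Lloyd's algorithm).

    Returns (codebook, indices); weight i is approximated by codebook[indices[i]],
    so only k floats plus one small integer index per weight need storing.
    """
    lo, hi = min(weights), max(weights)
    # Linear initialisation: spread the k centroids evenly over the weight range.
    codebook = [lo + (hi - lo) * j / (k - 1) for j in range(k)] if k > 1 else [lo]

    def assign() -> list[int]:
        # Index of the nearest centroid for every weight.
        return [min(range(k), key=lambda j: abs(w - codebook[j])) for w in weights]

    for _ in range(iters):
        indices = assign()
        # Move each centroid to the mean of the weights assigned to it.
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:
                codebook[j] = sum(members) / len(members)
    # Final assignment so the indices match the updated codebook.
    return codebook, assign()


def reconstruct(codebook: Sequence[float], indices: Sequence[int]) -> list[float]:
    """Expand the compressed representation back into approximate weights."""
    return [codebook[i] for i in indices]
```

For seven made-up coefficients and k = 3, the codebook converges to roughly -0.5, 0.1 and 1.0, so each rule needs only a 2-bit index into a three-entry table; quantizing the codebook entries themselves (e.g. to 8-bit integers) would be a further, separate step.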

28

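Bruynooghe, Michel. "Nouveaux algorithmes en classification automatique applicables aux très grands ensembles de données rencontrés en traitement d'images et en reconnaissance des formes." [New clustering algorithms applicable to the very large data sets encountered in image processing and pattern recognition.] Paris 6, 1989. http://www.theses.fr/1989PA066076.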

Full text

Abstract:

The proposed strategy for clustering very large data sets is based on a preliminary step that defines one, and only one, partition of the set to be classified into a large number of small classes, followed by an agglomerative hierarchical clustering step performed on the resulting set of classes. An application to an unsupervised pattern recognition problem is presented.
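
The abstract does not say how the preliminary partition is built. The sketch assumes a single pass of the simple leader algorithm for that first step (each point joins the first class whose leader lies within a fixed radius, otherwise it founds a new class), then runs agglomerative hierarchical clustering with size-weighted centroid linkage on the resulting small classes only. Both choices, the radius, the function names and the 2-D points are illustrative assumptions; the point is that the costly hierarchical step operates on far fewer items than the raw data.

```python
import math
from typing import Sequence

Point = tuple[float, float]
Class = tuple[Point, int]  # (centroid, number of points)


def leader_partition(points: Sequence[Point], radius: float) -> list[Class]:
    """Stage 1 (assumed method): one pass of the leader algorithm."""
    leaders: list[Point] = []
    sums: list[list[float]] = []
    sizes: list[int] = []
    for p in points:
        for j, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                sums[j][0] += p[0]
                sums[j][1] += p[1]
                sizes[j] += 1
                break
        else:  # no leader close enough: p founds a new small class
            leaders.append(p)
            sums.append([p[0], p[1]])
            sizes.append(1)
    return [((s[0] / n, s[1] / n), n) for s, n in zip(sums, sizes)]


def agglomerate(classes: Sequence[Class], n_clusters: int) -> list[Class]:
    """Stage 2: agglomerative clustering of the small classes (centroid linkage).

    Repeatedly merges the two classes with the closest centroids, weighting the
    merged centroid by class size, until `n_clusters` classes remain.
    """
    current = list(classes)
    while len(current) > n_clusters:
        _, a, b = min((math.dist(current[a][0], current[b][0]), a, b)
                      for a in range(len(current))
                      for b in range(a + 1, len(current)))
        (ca, na), (cb, nb) = current[a], current[b]
        n = na + nb
        merged = ((ca[0] * na + cb[0] * nb) / n, (ca[1] * na + cb[1] * nb) / n)
        current = [c for i, c in enumerate(current) if i not in (a, b)] + [(merged, n)]
    return current
```

Eight points on a line collapse into four small classes in the first pass, and the hierarchical step then merges them into two clusters of four points each, centred near x = 0.55 and x = 10.55.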

29

Pobi, Shibendra. "A study of machine learning performance in the prediction of juvenile diabetes from clinical test results." [Tampa, Fla] : University of South Florida, 2006. http://purl.fcla.edu/usf/dc/et/SFE0001671.

Full text

30

Santos, Antônio de Pádua dos. "Imaginário radical e educação física: trajetória esportiva de corredores de longa distância." [Radical imaginary and physical education: the sporting trajectory of long-distance runners.] Universidade Federal do Rio Grande do Norte, 2008. http://repositorio.ufrn.br:8080/jspui/handle/123456789/14170.

Full text

Abstract:

Previous issue date: 2008-11-14
This study proposes a reading of performance sport, drawing theoretically on the radical imaginary and considering the socio-historical and subjective dimensions of long-distance running. The research sample consisted of eight athlete-subjects from the street runners' group Sport Vida. While carrying out a socio-historical analysis of this sporting practice, we consider its sociocultural aspects as a whole and pursue the aim of understanding the meanings the athlete-subjects attribute to it, beyond economic and consumption aspects. We observe that, even though athletics involves performance-related aspects, the athletes create other meanings that keep them engaged in the practice, such as friendships and being together with friends. They break with the deterministic logic of sport (surpassing the body's limits, winning at any price, outdoing one's peers), seeking moments of solidarity and a non-violent, affective sport. We nevertheless perceive contradictions in the discourse of some athletes, who state that what matters most is the love of sport and friendship, yet complain about the lack of sponsorship and support that would let them train with more peace of mind. This research also revealed that, in this practice, the athletes develop an obstinacy owing to the sacrifice it imposes on the body, but this is transformed into pleasure, excitement and the search for strong emotions. Ethical values are also built and valued in athletics, as seen when the athlete-subjects sharply criticize the use of chemical substances by those who practice it. Taking the radical imaginary as the main theoretical source for this research, it becomes evident that sport can be given new meaning, provided that this imaginary is fostered in the teaching of physical education, prompting in students a critical reflection on society and on sport, which then comes to be oriented toward solidarity, democracy and autonomy. Finally, the study revealed that performance sport is capable of creating social ties and structuring relationships around it.

31

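Reig, Bruno. "Évaluation des couplages électromagnétiques dans des sous-ensembles hyperfréquences très intégrés. Étude et développement des technologies de boîtier et de connectique. Comparaison entre la modélisation et l'expérimentation." [Evaluation of electromagnetic coupling in highly integrated microwave subassemblies: study and development of packaging and interconnect technologies, and comparison between modelling and experiment.] Paris 6, 1999. http://www.theses.fr/1999PA066690.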

Full text

Abstract:

The objective of this work was to develop a technology for assembling highly integrated microwave subassemblies at a competitive cost. In particular, the results should make it possible to carry out, at low cost, the wiring and packaging of an active transmit/receive module for an electronically scanned antenna, while reducing its footprint to the pitch of the array of radiating elements so that it can be integrated into an active skin. The technology developed consists of hybridizing the MMIC components of a microwave module on a silicon substrate and making the interconnections between components through one or more polymer film layers laminated onto the components; these layers carry transmission lines and metallized vias positioned over the connection pads. Metallized vias also bring the signals up to the surface of the module, and tin/lead micro-balls allow micro-BGA mounting of the module onto a host circuit. Modelling the electromagnetic coupling and the transitions of complex geometry inside a highly integrated microwave module requires 3D electromagnetic software, whose domain of validity we set out to establish in the first chapter. In the second chapter, a CAD layout showed that the choice of polymer-film wiring is compatible with producing a flat module. In the third chapter, we defined the microwave and technological design rules for the three-dimensional interconnections that provide the intra-module connectivity. After optimizing these interconnections, we developed a fabrication process. The interconnections produced in this way were validated up to 40 GHz. In the last chapter, we extended this interconnection process to the wiring of a GaAs component and validated it experimentally up to 20 GHz. Finally, we produced tin/lead micro-balls on the surface of a demonstrator and successfully mounted it on a printed circuit board.

32

Saeed, Nausheen. "Automated Gravel Road Condition Assessment : A Case Study of Assessing Loose Gravel using Audio Data." Licentiate thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-36402.

Full text

Abstract:

Gravel roads connect sparse populations and provide highways for agriculture and the transport of forest goods. Gravel roads are an economical choice where traffic volume is low. In Sweden, 21% of all public roads are state-owned gravel roads, covering over 20,200 km. In addition, there are some 74,000 km of gravel roads and 210,000 km of forest roads that are owned by the private sector. The Swedish Transport Administration (Trafikverket) rates the condition of gravel roads according to the severity of irregularities (e.g. corrugations and potholes), dust, loose gravel, and gravel cross-sections. This assessment is carried out during the summertime when roads are free of snow. One of the essential parameters for gravel road assessment is loose gravel. Loose gravel can cause a tire to slip, leading to a loss of driver control. Assessment of gravel roads is carried out subjectively by taking images of road sections and adding some textual notes. A cost-effective, intelligent, and objective method for road assessment is lacking. Expensive methods, such as laser profiler trucks, are available and can offer road profiling with high accuracy. These methods are not applied to gravel roads, however, because of the need to maintain cost-efficiency. In this thesis, we explored the idea that, in addition to machine vision, we could also use machine hearing to classify the condition of gravel roads in relation to loose gravel. Several suitable classical supervised learning algorithms and convolutional neural networks (CNNs) were tested. When people drive on gravel roads, they can make sense of the road condition by listening to the gravel hitting the bottom of the car. The more we hear gravel hitting the bottom of the car, the more we can sense that there is a lot of loose gravel and, therefore, that the road might be in bad condition. Based on this idea, we hypothesized that machines could also undertake such a classification when trained with labeled sound data.
Machines can identify gravel and non-gravel sounds. In this thesis, we used traditional machine learning algorithms, such as support vector machines (SVM), decision trees, and ensemble classification methods. We also explored CNNs for classifying spectrograms of audio recordings and images of gravel roads. The results of the classical supervised learning algorithms and the CNNs were compared. Among the classical algorithms, ensemble bagged tree (EBT) classifiers performed best at classifying gravel and non-gravel sounds. EBT classifiers were also useful in reducing the misclassification of non-gravel sounds. The CNN approach achieved an accuracy of 97.91%. Using CNNs makes the classification process more intuitive because the network architecture takes responsibility for selecting the relevant training features. Furthermore, the classification results can be visualized on road maps, which can help road monitoring agencies assess road conditions and schedule maintenance activities for a particular road.

Due to unforeseen circumstances, the seminar was postponed from May 7 to May 28, as stated on the new posting page.

33

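Mattes, Julian. "Invariants statistiques et structurels définis par l'arbre de confinement pour le recalage d'images et l'analyse du mouvement." [Statistical and structural invariants defined by the confinement tree for image registration and motion analysis.] Université Joseph Fourier (Grenoble), 2000. http://www.theses.fr/2000GRE1A002.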

Full text

Abstract:

In medical imaging, we face the problem of registering two images of the same objects obtained after movements or deformations, or after one image has lost its orientation relative to the other. In this thesis, we introduced a new methodology to tackle this problem: it consists of tracking the connected components of the level sets of the grey-level function (the confiners), taken at different levels, using the hierarchical structure (the confinement tree) that they form. The invariance properties of this structure, shown in this thesis, make it possible to define two types of landmarks, namely pairs of points or of lines that correspond to each other in the two images. The points are defined from the barycentres of the confiners and the lines from their contours. After a synthetic review of the imaging notions related to the confinement tree and its various applications, we propose an algorithm that computes this structure efficiently in time O(n + n_n log n_n), where n is the number of pixels and n_n the number of nodes of the tree. The application of the confinement tree to the problem described above is the main subject of this thesis. As a first step, we propose an algorithm for rigid and semi-rigid registration (including a scaling), based on the two sets of points extracted from the two images, which is independent of the initial position of the images. The association between points after rigid registration defines point landmarks, among which we can detect, with a structural measure, the outliers that do not correspond to a true deformation. The hierarchical structure allows us to design a coarse-to-fine procedure to track local deformations. We evaluate our algorithm on different types of 2D images (cells, brain). Finally, we propose extending the methodology to contours instead of images.
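
As a rough illustration of the confinement tree, the sketch builds it naively for a one-dimensional grey-level signal rather than a 2-D image, assuming that confiners are the connected components of the upper level sets {x : f(x) >= t}. Each confiner is linked to the confiner at the next lower grey level that contains it, and its barycentre is taken as the unweighted centre of its interval. This naive version costs O(n × number of levels) and repeats nodes whose interval does not change between levels; it is not the thesis's O(n + n_n log n_n) algorithm, and all names are illustrative.

```python
from typing import Optional, Sequence

Node = tuple[float, int, int]  # (grey level t, first index, last index)


def runs_at_level(f: Sequence[float], t: float) -> list[tuple[int, int]]:
    """Maximal index intervals [start, end] on which f >= t (1-D confiners)."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, v in enumerate(f):
        if v >= t and start is None:
            start = i
        elif v < t and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(f) - 1))
    return runs


def confinement_tree(f: Sequence[float]) -> dict[Node, Optional[Node]]:
    """Naively build the tree of confiners over all distinct grey levels.

    Maps each confiner to its parent: the confiner at the next lower grey
    level that contains it (None for the root at the lowest level).
    """
    parent: dict[Node, Optional[Node]] = {}
    prev: list[Node] = []
    for t in sorted(set(f)):
        current = [(t, s, e) for s, e in runs_at_level(f, t)]
        for node in current:
            _, s, e = node
            parent[node] = next((p for p in prev if p[1] <= s and e <= p[2]), None)
        prev = current
    return parent


def barycentre(node: Node) -> float:
    """Centre of a confiner's interval (positions weighted equally)."""
    _, s, e = node
    return (s + e) / 2
```

For f = [0, 2, 1, 3, 0], the root is the whole interval at level 0, and the two leaves, the confiners at index 1 (level 2) and index 3 (level 3), mark the two local maxima; their barycentres 1.0 and 3.0 are the kind of points that could serve as landmarks.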

34

Mattes, Julian. "Invariants statistiques et structurels définis par l'arbre de confinement pour le recalage d'images et l'analyse du mouvement." [Statistical and structural invariants defined by the confinement tree for image registration and motion analysis.] Université Joseph Fourier (Grenoble ; 1971-2015), 2000. http://www.theses.fr/2000GRE10247.

Full text

Abstract:

In medical imaging, we face the problem of registering two images of the same objects obtained after movements or deformations, or after one image has lost its orientation relative to the other. In this thesis, we introduced a new methodology to tackle this problem: it consists of tracking the connected components of the level sets of the grey-level function (the confiners), taken at different levels, using the hierarchical structure (the confinement tree) that they form. The invariance properties of this structure, shown in this thesis, make it possible to define two types of landmarks, namely pairs of points or of lines that correspond to each other in the two images. The points are defined from the barycentres of the confiners and the lines from their contours. After a synthetic review of the imaging notions related to the confinement tree and its various applications, we propose an algorithm that computes this structure efficiently in time O(n + n_n log n_n), where n is the number of pixels and n_n the number of nodes of the tree. The application of the confinement tree to the problem described above is the main subject of this thesis. As a first step, we propose an algorithm for rigid and semi-rigid registration (including a scaling), based on the two sets of points extracted from the two images, which is independent of the initial position of the images. The association between points after rigid registration defines point landmarks, among which we can detect, with a structural measure, the outliers that do not correspond to a true deformation. The hierarchical structure allows us to design a coarse-to-fine procedure to track local deformations. We evaluate our algorithm on different types of 2D images (cells, brain). Finally, we propose extending the methodology to contours instead of images.

35

Кичигіна, Анастасія Юріївна. "Прогнозування ІМТ за допомогою методів машинного навчання." [Predicting BMI using machine learning methods.] Bachelor's thesis, КПІ ім. Ігоря Сікорського (Igor Sikorsky Kyiv Polytechnic Institute), 2020. https://ela.kpi.ua/handle/123456789/37413.

Full text

Abstract:

The thesis contains 100 pages, 17 tables, 16 figures, 2 appendices and 24 references. The object of the study is the human body mass index (BMI). The subject of the research is machine learning methods: regression models, the random forest ensemble model, and a neural network. This work studies how a person's body mass index and the presence of excess body weight depend on eating and everyday habits. Machine learning and data analysis methods were used to carry out the study; ways to improve the performance of standard models were investigated, and the best model for prediction and classification on the given data was identified. The work focuses on reducing the dimensionality of the feature space, selecting the best observations with valid data to improve model performance, and combining different learning methods to obtain more effective ensemble models.

36

Vranjković, Vuk. "Реконфигурабилне архитектуре за хардверску акцелерацију предиктивних модела машинског учења." [Reconfigurable architectures for hardware acceleration of machine learning predictive models.] PhD thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2015. http://www.cris.uns.ac.rs/record.jsf?recordId=94819&source=NDLTD&language=en.

Full text

Abstract:

This thesis proposes universal coarse-grained reconfigurable computing architectures for the hardware implementation of decision trees (DTs), artificial neural networks (ANNs), support vector machines (SVMs), and homogeneous and heterogeneous ensemble classifiers (HHESs). Using these universal architectures, two versions of DTs, two versions of SVMs, two versions of ANNs, and seven versions of HHESs have been implemented in field programmable gate arrays (FPGAs). Experimental results, based on datasets from the standard UCI machine learning repository, show that the FPGA implementation provides a significant improvement (1–6 orders of magnitude) in the average instance classification time compared with software implementations.

37

Thames, John Lane. "Advancing cyber security with a semantic path merger packet classification algorithm." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45872.

Full text

Abstract:

This dissertation investigates and introduces novel algorithms, theories, and supporting frameworks that aim to significantly mitigate the growing problem of Internet security. A distributed firewall and active response architecture is introduced that enables any device within a cyber environment to participate in the active discovery of, and response to, cyber attacks. A theory of semantic association systems is developed for the general problem of knowledge discovery in data. This theory forms the basis of a novel semantic path merger packet classification algorithm. The theoretical aspects of the algorithm are investigated, and its hardware-based implementation is evaluated and compared with content addressable memory. Experimental results show that the hardware implementation of the semantic path merger algorithm significantly outperforms content addressable memory in terms of energy consumption and operational timing.

38

Ciss, Saïp. "Forêts uniformément aléatoires et détection des irrégularités aux cotisations sociales" [Random uniform forests and the detection of irregularities in social contributions]. Thesis, Paris 10, 2014. http://www.theses.fr/2014PA100063/document.

Abstract:

In this thesis, we present an application of statistical learning to the detection of irregularities in social contributions. In France, social contributions are all the contributions owed by employees and companies to the "Sécurité sociale", the French social welfare system (replacement income in case of unemployment, health insurance, pensions, ...); companies pay them to the URSSAF network, which is in charge of collecting them. Statistical learning aims to model problems in which there is a relationship, generally non-deterministic, between variables and the phenomenon one seeks to evaluate. An essential aspect of this modeling is predicting unknown occurrences of the phenomenon from the data already observed. For social contributions, the problem is framed by postulating a relationship between companies' contribution declarations and the audits carried out by the collection agencies. Auditors certify whether a certain number of declarations are accurate or inaccurate and, where appropriate, notify the companies concerned of an adjustment. The learning algorithm "learns", through a model, the relationship between declarations and audit outcomes, and then produces an evaluation of all declarations not yet audited. The first part of the evaluation labels each declaration as regular or irregular, with a certain probability; the second estimates the expected adjustment amount for each declaration. Our main goal was to build a model able to detect irregularities with a low false positive rate. Within the URSSAF (Union de Recouvrement des cotisations de Sécurité sociale et d'Allocations Familiales) of Île-de-France, and under a CIFRE contract (Conventions Industrielles de Formation par la Recherche), we developed a model for detecting irregularities in social contributions, which we present and detail throughout the thesis. The algorithm runs on the free software R. It is fully operational and was tested in real conditions during 2012. Probabilistic and statistical tools are needed to guarantee its properties and results, and we discuss the theoretical aspects that accompanied its design.
In the first part of the thesis, we give a general presentation of the problem of detecting irregularities in social contributions and of how irregularities can arise. In the second, we address detection specifically, through the data used to define and evaluate irregularities; in particular, the available data alone are sufficient to model detection. There we also present a new random forest algorithm, called "random uniform forests" (with its R package "randomUniformForest"), which serves as the detection engine; it is a variant of Breiman's Random Forests (tm), sharing the same principles but applying them differently. In the third part, we detail the theoretical properties of random uniform forests and provide several examples. In the fourth, we take an economic point of view on the case where irregularities are intentional, in the context of the fight against undeclared work; in particular, we study how the financial situation of companies relates to social contribution fraud. The last part is devoted to the experimental and real-world results of the model, which we discuss, including a full evaluation of the contribution declarations of all firms in Île-de-France for 2013, in which the model predicts whether each declaration presents irregularities. Each chapter can be read independently of the others, and some notions are repeated to make the content easier to explore.
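The abstract describes random uniform forests only as a variant of Breiman's random forests. As a hedged illustration of randomized split selection in general (closer in spirit to extremely randomized trees; this is an assumption, not a claim about the exact rule in the randomUniformForest package), the sketch below draws candidate thresholds uniformly between a feature's observed minimum and maximum and keeps the one with the lowest weighted Gini impurity. The function names and the `n_candidates` parameter are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def uniform_random_split(x, y, n_candidates=5, rng=None):
    """Hypothetical split rule: draw thresholds uniformly in [min(x), max(x)]
    and keep the one whose children have the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity); points with x <= threshold go left.
    """
    if rng is None:
        rng = random.Random(0)
    lo, hi = min(x), max(x)
    best = None
    for _ in range(n_candidates):
        t = rng.uniform(lo, hi)
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Because Gini impurity is concave, the weighted impurity of any split never exceeds the parent's, so a random threshold can only help or do nothing; growing many such trees on different samples is what turns these cheap, noisy splits into an ensemble.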

39

Haddad, Karim. "L’Unité Temporelle : une approche pour l’écriture de la durée et de sa quantification" [The Time Unit: an approach to writing duration and its quantization]. Thesis, Sorbonne université, 2020. http://www.theses.fr/2020SORUL141.

Abstract:

In this thesis, we study a new approach to the practice of writing musical time, based on a notational concept dedicated to the writing of duration, rhythm, and musical form. This new concept, which we call the Time Unit, raises several questions and issues structured around three axes: the notation of Time Units, their operability (their potential to yield new musical forms), and quantization. After examining various approaches to time, form, duration, and quantization, we develop a new grammar of musical time concerning its syntax and its representation, and we build tools for the rhythmic transformation and production of Time Units. Once this is achieved, we study Time Units as they unfold in time (their “temporalization”), raising the question of “compositional unity” and its implications for musical form. After a thorough study of symbolic rhythm quantization of Time Unit structures, we explore the path from the conception of a work, through its sketch, to its final state, achieved by a “correct” quantization that preserves the integrity of its musical discourse. This is illustrated with case studies drawn from our personal works.

40

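Ouali, Abdelkader. "Méthodes hybrides parallèles pour la résolution de problèmes d'optimisation combinatoire : application au clustering sous contraintes" [Parallel hybrid methods for solving combinatorial optimization problems: application to constrained clustering]. Thesis, Normandie, 2017. http://www.theses.fr/2017NORMC215/document.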

Abstract:

Combinatorial optimization problems have become the target of much scientific research because of their importance in solving academic problems and real problems encountered in engineering and industry. Solving these problems with exact methods is often intractable because of the exorbitant processing time these methods would require to reach the optimal solution(s). In this thesis, we were interested in both the algorithmic context of solving combinatorial problems and the modeling context of these problems. At the algorithmic level, we explored hybrid methods, which excel in their ability to make exact and approximate methods cooperate in order to produce high-quality solutions quickly. At the modeling level, we worked on the specification and exact resolution of complex pattern set mining problems, in particular by studying scaling issues on large databases. On the one hand, we proposed a first parallelization of the DGVNS algorithm, called CPDGVNS, which explores in parallel the different clusters provided by the tree decomposition, sharing the best overall solution found in a master-worker model. Two other strategies, called RADGVNS and RSDGVNS, were proposed to improve the frequency of exchange of intermediate solutions between the different processes. Experiments carried out on difficult combinatorial problems show the adequacy and effectiveness of our parallel methods. On the other hand, we proposed a hybrid approach combining techniques from both Integer Linear Programming (ILP) and pattern mining.
Our approach is complete and takes advantage of the general ILP framework (which provides a high level of flexibility and expressiveness) and of specialized heuristics for data exploration and extraction (to improve computing time). In addition to the general framework for pattern set mining, we studied two problems in particular: conceptual clustering and the tiling problem. The experiments carried out show the contribution of our proposal compared with constraint-based approaches and specialized heuristics.
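To make the master-worker idea concrete, here is a minimal sketch of cooperative parallel local search: in each round the master hands the shared incumbent to one worker per cluster, each worker modifies only its own cluster's variables, and the master keeps the best solution returned. This is not DGVNS or CPDGVNS; the toy objective, the fixed `clusters` (standing in for the clusters of a tree decomposition), and all function names are hypothetical assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def cost(x):
    # Hypothetical toy objective: sum of squared distances of each variable to 3.
    return sum((v - 3) ** 2 for v in x)


def local_search(start, cluster, seed, steps=200):
    """Worker: improve a copy of `start`, changing only variables in `cluster`."""
    rng = random.Random(seed)
    best = list(start)
    best_cost = cost(best)
    for _ in range(steps):
        cand = list(best)
        i = rng.choice(cluster)
        cand[i] += rng.choice((-1, 1))  # small move on one cluster variable
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


def master(n_vars=6, clusters=((0, 1, 2), (2, 3), (3, 4, 5)), rounds=5):
    """Master: each round, run one worker per cluster from the shared incumbent
    and keep the best solution any worker returns."""
    incumbent = [0] * n_vars
    incumbent_cost = cost(incumbent)
    with ThreadPoolExecutor(max_workers=len(clusters)) as pool:
        for r in range(rounds):
            futures = [
                pool.submit(local_search, incumbent, list(c), seed=r * 100 + k)
                for k, c in enumerate(clusters)
            ]
            for f in futures:
                sol, c = f.result()
                if c < incumbent_cost:
                    incumbent, incumbent_cost = sol, c
    return incumbent, incumbent_cost
```

Threads keep the sketch short and portable; in CPython they do not run CPU-bound work in parallel, so a real implementation would use processes or MPI. Exchanging the incumbent more often than once per round is the kind of knob the RADGVNS and RSDGVNS strategies are described as tuning.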

41

Hong, Je-Yi, and 洪哲儀. "Study of Stock Index Trend Using Tree-based Ensemble Classification." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/42855550952005440684.

Abstract:

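Master's thesis
Providence University
Department of Financial and Computational Mathematics
ROC academic year 104 (2015-2016)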
Stock price indices and economic factors interact as both causes and effects. From an investment point of view, predicting the trend of a stock price index can help reduce investment risk. Predicting stock market trends has been an active topic for many years; however, because of various subjective and objective factors, forecasting the trend of a stock market price index is a very challenging task. In this study, we treat stock market price index prediction as a classification problem. Many machine learning algorithms can be used for classification, including support vector machines and neural networks; however, few of these models make it easy to understand how they work in practice. We applied tree methods to take advantage of their interpretability while keeping acceptable predictive power. Compared with traditional tree methods, random forests make model interpretation more difficult. We therefore studied the structure of multiple trees built from real data to find meaningful predictor variables, and developed a procedure for finding interpretable models with a financial meaning. We created new variables based on the distribution of the cut-off values produced by multiple trees, adjusted by known financial facts. For predicting the 2013 Taiwan stock price index, we found that DPO is a high-impact factor. We also applied clustering methods to the multiple-tree model to identify a forest with a small number of trees whose prediction accuracy is competitive with that of a random forest.

42

Yu, Li, and 游力. "Dynamic Ensemble Decision Tree Learning Algorithm for Network Traffic Classification." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/cyqff7.

Abstract:

Master's thesis
National Chiao Tung University
Institute of Network Engineering
ROC academic year 104 (2015-2016)
Network traffic classification, which gives us the ability to monitor and detect the applications associated with network traffic, has been discussed for decades. It has become an essential step in network management and traffic engineering tasks such as QoS control, anomaly detection, and ISP network planning. Approaches have evolved from the earliest, port-based classification to the current state of the art, machine learning classification. Moreover, most information technology research and advisory organizations forecast that we are entering the era of big data, with high-volume, high-velocity, and high-variety data. Machine learning traffic classification achieves satisfactory accuracy with lower computing resources, which meets the high-volume and high-velocity requirements of big data. However, most machine-learning-based traffic classification research assumes that the network environment is stable, which is not true. This assumption leaves classifiers unable to deal with highly varied data, since they have no countermeasure against changes in the network environment. To address this issue, we propose a dynamic ensemble decision tree learning algorithm (EDT). EDT can dynamically update its prediction model without retraining the whole model from scratch. In the experiments, the test data were collected in our experimental LTE network. The evaluation shows that our algorithm responds to new applications 24 times faster on average than the original C5.0 decision tree learning algorithm, while losing no more than 1.02% accuracy. The contribution of this thesis is a new decision tree model that can dynamically adjust itself.

43

Tasi, Wei-Lan, and 蔡維倫. "Improve the Classification Performance for Decision Tree by Population-based Approaches with Ensemble." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/90102780678990829408.

Abstract:

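Master's thesis
Huafan University
Department of Information Management, Master's Program
ROC academic year 97 (2008-2009)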
Data mining techniques have been widely used in prediction and classification problems. The decision tree algorithm (DT), which produces a rule-based tree structure, is one of the most popular and can be applied in various areas. Nevertheless, different problems may require different parameters when DT is used to build a model, and the parameter settings influence the classification results. In addition, a dataset may contain many features, not all of which are beneficial to the model; skipping feature selection may increase cost and reduce the learning ability of DT. Therefore, scatter search (SS), a genetic algorithm (GA), and particle swarm optimization (PSO) are proposed to select a beneficial subset of features and to obtain better parameters, resulting in better classification. Each of these three metaheuristics has its own strengths and weaknesses; if they work together, better results can be expected. This is called an ensemble. This thesis proposes such an ensemble to further improve prediction and classification accuracy. To evaluate the proposed approaches, datasets from the UCI (University of California, Irvine) repository are used. The proposed metaheuristic-based DT methods can find the best parameters and feature subset for various problems and provide higher classification accuracy.
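As a rough picture of how a GA can search over feature subsets, the sketch below evolves bit masks (1 = keep the feature) with binary tournament selection, one-point crossover, bit-flip mutation, and elitism. It covers only the feature-selection part, not parameter tuning, SS, PSO, or the ensemble of the three. In the thesis setting the fitness would presumably be a decision tree's validation accuracy on the selected features; here `fitness` is any caller-supplied function, and every name and default value is a hypothetical assumption. `n_features` must be at least 2 for the crossover point.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=40,
                         mutation_rate=0.05, seed=0):
    """Evolve bit masks (1 = keep feature) to maximise fitness(mask)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: return the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best
```

Elitism guarantees the best fitness never decreases between generations. Because each fitness call would retrain a tree, caching fitness per mask is a common practical addition.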

44

Filipe, Daniel José Canelas. "Using tree-based ensemble methods to improve the B2B customer acquisition process in the fashion industry." Master's thesis, 2020. https://hdl.handle.net/10216/132634.

45

Filipe, Daniel José Canelas. "Using tree-based ensemble methods to improve the B2B customer acquisition process in the fashion industry." Master's dissertation, 2020. https://hdl.handle.net/10216/132634.

46

Silvestre, Martinho de Matos. "Three-stage ensemble model : reinforce predictive capacity without compromising interpretability." Master's thesis, 2019. http://hdl.handle.net/10362/71588.

Abstract:

Thesis proposal presented as a partial requirement for obtaining the Master’s degree in Statistics and Information Management, with specialization in Risk Analysis and Management
Over the last decade, several banks have developed models to quantify credit risk. In addition to monitoring the credit portfolio, these models also help in deciding whether to accept new contracts, assessing customer profitability, and defining pricing strategy. The objective of this paper is to improve the approach to credit risk modeling, namely scoring models that predict default events. To this end, we propose a three-stage ensemble model that combines the interpretability of the Scorecard's results with the predictive power of machine learning algorithms. The results show that, with the Scorecard as the baseline, the ROC index improves by 0.5%-0.7% and accuracy by 0%-1%.

47

Chen, Tzu-Tung, and 陳姿彤. "300-year dendroclimatic reconstructions based on conventional methods and Ensemble Empirical Mode Decomposition using Picea morrisonicola tree rings from central Taiwan." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/76399793270131758250.

Abstract:

Master's thesis
National Taiwan University
Graduate Institute of Geosciences
ROC academic year 99 (2010-2011)
Very little dendrochronological data from Taiwan have been reported internationally, despite the existence of many tree species suitable for dendrochronology. In this study, the potential for reconstructing the local paleoclimate was investigated using multi-century tree-ring chronologies developed from Picea morrisonicola (the endemic Taiwan spruce). Significant correlations were found with the mean April-June diurnal temperature range (DTR) and with the mean July-September maximum temperature (Tmax). Both of these climate parameters were reconstructed from the regression relationships. In a related study, a new frequency decomposition method called empirical mode decomposition (EMD), one part of the Hilbert-Huang Transform (HHT), was investigated as an alternative to standard methods of chronology generation in terms of climate signal. A noise-assisted version of EMD called ensemble empirical mode decomposition (EEMD) was used to decompose the tree-ring time series into a series of quasi-periodic modes from high to low frequency. Consecutive modes were combined from high to low frequency and compared with the climate data, and the combination with the most significant climate relationships was then used to reconstruct the climate parameters. As with the reconstructions based on traditional chronology generation, the statistics of the DTR and Tmax reconstructions also passed tests of model skill. The reconstruction statistics and the variance explained were similar for both methods of chronology generation, with the EEMD chronology giving better results for the DTR reconstruction and the traditional chronology giving better results for the Tmax reconstruction. Adjusted latewood ring widths show a significant (p<0.01) positive correlation with Alishan July-September Tmax. Linear regression of the Alishan Tmax on the tree-ring chronology produced a calibration model that accounted for 23% of the actual Tmax variance.
This model was used to reconstruct the July-September Tmax back to A.D. 1636. The reconstruction shows warm periods during 1718-1726, 1908-1916, and 2002-2008. Comparisons with NCEP-NCAR reanalysis data indicate that summer climate variability in Taiwan is regulated by processes associated with changes in the Western Pacific Subtropical High (WPSH). In years with less precipitation, the WPSH extends further westward than in other years, reducing the southwesterly monsoonal flow. This appears as an anomalously warm and dry summer accompanied by anticyclonic motion over the East China Sea. In addition, eight of the ten warmest summers (July-September Tmax) in central Taiwan occurred during El Niño years, indicating a link between Taiwan summer maximum temperatures and ENSO dynamics. The earlywood mean chronology was calibrated against April-June DTR, and a calibration model that accounted for 28% of the actual DTR variance was used to reconstruct DTR. Increasing Tmin, which can be attributed to locally increased cloud cover, contributed to the reduction of DTR. The reconstructed DTR shows a cycle with a period of 28 years, possibly reflecting variations in solar irradiance due to changes in cloudiness.
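The "23% of the actual Tmax variance" figure is presumably the coefficient of determination (R²) of the calibration regression. The minimal sketch below fits an ordinary least squares line y ≈ a + b·x (x = a tree-ring chronology value, y = the climate variable) and reports R² = 1 − SS_res / SS_tot; applying a + b·x to earlier chronology values is what extends the record backwards. The function name and the toy numbers in the usage check are hypothetical, not data from the thesis.

```python
def calibrate(x, y):
    """Ordinary least squares fit y ~ a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot  # share of variance explained
```

For example, `calibrate([1, 2, 3, 4], [1, 3, 2, 4])` gives a slope of 0.8 and an R² of 0.64, meaning the line accounts for 64% of the variance of y; a calibration model with R² = 0.23 leaves most of the year-to-year variance unexplained, which is why the skill tests mentioned in the abstract matter.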

48

Kandel, Ibrahem Hamdy Abdelhamid. "A comparative study of tree-based models for churn prediction : a case study in the telecommunication sector." Master's thesis, 2019. http://hdl.handle.net/10362/60302.

Abstract:

Dissertation presented as a partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM
In recent years, customer churn, the phenomenon of customers leaving a company for a competitor, has gained increasing importance. Churn plays an important role especially in saturated industries such as telecommunications, since existing customers are very valuable and the cost of acquiring new customers is now very high. Companies want to know which of their customers are going to churn to another provider, and when, so that measures can be taken to retain customers at risk of churning. Such measures could take the form of incentives for likely churners, but misclassification is costly, especially when incentives are given to customers who would not have churned. The common challenges in predicting customer churn are how to pre-process the data and which algorithm to choose, especially when the dataset is heterogeneous, which is very common for telecommunication companies. This thesis aims to predict customer churn in the telecommunication sector using different decision tree algorithms and their ensemble models.

49

Elmasry, Mohamed Hani Abdelhamid Mohamed Tawfik. "Machine learning approach for credit score analysis : a case study of predicting mortgage loan defaults." Master's thesis, 2019. http://hdl.handle.net/10362/62427.

Abstract:

Dissertation submitted in partial fulfilment of the requirements for the Master's degree in Statistics and Information Management, specialized in Risk Analysis and Management
To manage credit score analysis effectively, financial institutions have developed techniques and models designed mainly to improve the assessment of creditworthiness during the credit evaluation process. The foremost objective is to classify clients (borrowers) into either the non-defaulter group, which is more likely to meet its financial obligations, or the defaulter group, which has a higher probability of failing to pay its debts. In this work, we use machine learning models to predict mortgage defaults. This study employs several single classification machine learning methods, including Logistic Regression, Classification and Regression Trees, Random Forest, K-Nearest Neighbors, and Support Vector Machine. To further improve predictive power, a meta-algorithm ensemble approach, stacking, is introduced to combine the outputs (probabilities) of the aforementioned methods. The sample for this study is based solely on the publicly available dataset provided by Freddie Mac. With this approach, we achieve an improvement in the model's predictive performance. We then compare the performance of each model and of the meta-learner by plotting ROC curves and computing the AUC. This study extends several previous studies that used different techniques to further improve model predictivity. Finally, our results are compared with work by other authors.
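Since the base models and the stacked meta-learner are compared by AUC, it may help to recall what that number measures: the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter, with ties counted as one half. The short sketch below computes it directly from that definition (the Mann-Whitney formulation); the function name and inputs are illustrative assumptions, and the pairwise loop is O(P·N), so it is meant for small examples rather than a full mortgage dataset.

```python
def auc(labels, scores):
    """ROC AUC: probability that a random positive (label 1) scores higher than
    a random negative (label 0); tied scores count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75: three of the four positive/negative pairs are ranked correctly. Because AUC depends only on the ranking of scores, it is a natural yardstick for comparing probability outputs from very different models, such as the base learners and their stacked combination.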

50

Dias, Didier Narciso. "Soil Classification Resorting to Machine Learning Techniques." Master's thesis, 2019. http://hdl.handle.net/10362/125335.

Abstract:

Soil classification is the act of summarizing the most relevant information about a soil profile into a single class, from which a large number of properties can be inferred without extensive knowledge of the subject. These classes make communication about soils, and about how they can best be used in areas such as agriculture and forestry, simpler and easier to understand. Unfortunately, soil classification is expensive and time-consuming, and it requires specialists to perform varied experiments to attribute a class to a soil profile precisely. This master's thesis focuses on machine learning algorithms for soil classification based mainly on intrinsic soil attributes, in the Mexico region. The dataset used contains 6,760 soil profiles, the 19,464 horizons that constitute them, and physical and chemical properties of those horizons, such as pH, organic content, and clay percentage. Four data modelling methods were tested (standard depths, n first layers, thickness, and area-weighted thickness), as well as different values of k for a k-Nearest Neighbours imputation. State-of-the-art machine learning algorithms were also compared, namely Random Forests, Gradient Tree Boosting, Deep Neural Networks, and Recurrent Neural Networks. All the modelling methods provided very similar results when properly parametrised, reaching a Kappa value of 0.504 and an accuracy of 0.554, with the standard depths method providing the most consistent results. The value of k used for imputation had very little impact on the results. Gradient Tree Boosting was the algorithm with the best overall results, closely followed by Random Forests. The neural-network-based methods never achieved a Kappa score above 0.4 and therefore provided substantially worse results.
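The abstract reports both accuracy (0.554) and Cohen's Kappa (0.504); Kappa is lower because it discounts the agreement expected by chance given how often each class appears in the true and predicted labels. The minimal sketch below computes κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement (the accuracy) and p_e the chance agreement. The function name is illustrative, and it assumes p_e < 1 (that is, the labels are not all a single identical class).

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement: fraction of matching labels (equal to accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the true and predicted class frequencies.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, with true labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, accuracy is 0.75 but chance agreement is 0.5, so Kappa is 0.5. This is why a soil-class accuracy of 0.554 corresponds to a noticeably smaller Kappa.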
