Dissertations / Theses on the topic 'Bagging Forest'
Consult the top 16 dissertations / theses for your research on the topic 'Bagging Forest.'
Rosales Martínez, Octavio. "Caracterización de especies en plasma frío mediante análisis de espectroscopia de emisión óptica por técnicas de Machine Learning" [Characterization of species in cold plasma by analysis of optical emission spectroscopy using machine learning techniques]. Master's thesis, Universidad Autónoma del Estado de México, 2020. http://hdl.handle.net/20.500.11799/109734.
Булах, В. А., Л. О. Кіріченко, and Т. А. Радівілова. "Classification of Multifractal Time Series by Decision Tree Methods." Thesis, КНУ, 2018. http://openarchive.nure.ua/handle/document/5840.
Assareh, Amin. "Optimizing Decision Tree Ensembles for Gene-Gene Interaction Detection." Kent State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1353971575.
Yang, Kaolee. "A Statistical Analysis of Medical Data for Breast Cancer and Chronic Kidney Disease." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1587052897029939.
Zoghi, Zeinab. "Ensemble Classifier Design and Performance Evaluation for Intrusion Detection Using UNSW-NB15 Dataset." University of Toledo / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1596756673292254.
Ulriksson, Marcus, and Shahin Armaki. "Analys av prestations- och prediktionsvariabler inom fotboll" [Analysis of performance and prediction variables in football]. Thesis, Uppsala universitet, Statistiska institutionen, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-324983.
Rosales, Elisa Renee. "Predicting Patient Satisfaction With Ensemble Methods." Digital WPI, 2015. https://digitalcommons.wpi.edu/etd-theses/595.
Alsouda, Yasser. "An IoT Solution for Urban Noise Identification in Smart Cities: Noise Measurement and Classification." Thesis, Linnéuniversitetet, Institutionen för fysik och elektroteknik (IFE), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-80858.
Thorén, Daniel. "Radar based tank level measurement using machine learning: Agricultural machines." Thesis, Linköpings universitet, Programvara och system, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176259.
Feng, Wei. "Investigation of training data issues in ensemble classification based on margin concept: application to land cover mapping." Thesis, Bordeaux 3, 2017. http://www.theses.fr/2017BOR30016/document.
Classification has been widely studied in machine learning. Ensemble methods, which build a classification model by integrating multiple component learners, generally achieve higher performance than a single classifier. The classification accuracy of an ensemble is directly influenced by the quality of the training data used. However, real-world data often suffer from class noise and class imbalance. The ensemble margin is a key concept in ensemble learning; it has been applied both to the theoretical analysis and to the design of machine learning algorithms. Several studies have shown that the generalization performance of an ensemble classifier is related to the distribution of its margins on the training examples. This work exploits the margin concept to improve the quality of the training set, and thereby the classification accuracy of noise-sensitive classifiers, and to design effective ensemble classifiers that can handle imbalanced datasets. A novel ensemble margin definition is proposed: an unsupervised version of a popular ensemble margin that does not involve the class labels. Mislabeled training data are a challenge for building a robust classifier, whether or not it is an ensemble. To handle mislabeling, we propose an ensemble margin-based method for identifying and eliminating class noise, built on an existing margin-based class noise ordering. This method achieves a high detection rate for mislabeled instances while keeping the false detection rate as low as possible. It relies on the margin values of misclassified data and considers four different ensemble margins, including the newly proposed one. The method is then extended to class noise correction, which is a more challenging problem. For building a reliable classifier, instances with low margins are more important than safe samples, which have high margins.
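The margin notions discussed above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's code: it assumes the common vote-based supervised margin (votes for the true class minus the largest vote count for any other class, divided by the ensemble size T) and, as a label-free counterpart, the gap between the two most-voted classes. The filter `flag_suspected_noise` and its `threshold` parameter are hypothetical.

```python
from collections import Counter


def supervised_margin(votes, true_label):
    """Votes for the true class minus the largest vote count for any other
    class, divided by the ensemble size T; ranges from -1 to 1."""
    counts = Counter(votes)
    others = [c for label, c in counts.items() if label != true_label]
    return (counts[true_label] - max(others, default=0)) / len(votes)


def unsupervised_margin(votes):
    """Label-free margin (assumed form): votes for the most-voted class minus
    votes for the second most-voted class, divided by T; ranges from 0 to 1."""
    ranked = [c for _, c in Counter(votes).most_common()]
    second = ranked[1] if len(ranked) > 1 else 0
    return (ranked[0] - second) / len(votes)


def flag_suspected_noise(all_votes, labels, threshold):
    """Hypothetical filter: indices whose supervised margin is at or below
    -threshold, i.e. instances the ensemble misclassifies with a large vote
    gap, which are the most likely to be mislabeled."""
    return [
        i
        for i, (votes, label) in enumerate(zip(all_votes, labels))
        if supervised_margin(votes, label) <= -threshold
    ]


# Each inner list holds the labels predicted by T = 5 base classifiers.
votes = [["a", "a", "a", "b", "c"], ["b", "b", "b", "b", "a"], ["a", "b", "a", "b", "a"]]
labels = ["a", "a", "b"]
print([supervised_margin(v, y) for v, y in zip(votes, labels)])  # [0.4, -0.6, -0.2]
print(flag_suspected_noise(votes, labels, threshold=0.5))  # [1]
```

Negative supervised margins correspond to misclassified instances, so ordering by margin and cutting at a threshold is one simple way to rank candidates for removal or relabeling.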
A novel bagging algorithm, based on a data importance evaluation function that again relies on the ensemble margin, is proposed to deal with the class imbalance problem. In this algorithm, the emphasis is placed on the lowest-margin samples. The method is again evaluated with four different ensemble margins, particularly on multi-class imbalanced data. In remote sensing, where training data are typically ground-based, mislabeled training data are inevitable, and imbalanced training data are another frequent problem. Both proposed ensemble methods, using the best margin definition for each of these two major training data issues, are applied to land cover mapping.
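Placing the emphasis on the lowest-margin samples when drawing each bag can be sketched as a weighted bootstrap. This is an illustration under assumptions, not the importance function from the thesis: the hypothetical `importance_weights` simply uses 1 - margin plus a small `eps` (so that no instance has zero probability) and assumes margins in [0, 1], such as the unsupervised margin.

```python
import random


def importance_weights(margins, eps=1e-3):
    """Hypothetical importance function: the lower the margin, the larger the
    sampling weight. Assumes every margin lies in [0, 1]."""
    return [1.0 - m + eps for m in margins]


def margin_weighted_bootstrap(margins, size, rng):
    """Draw `size` training indices with replacement, with probability
    proportional to each instance's importance weight."""
    weights = importance_weights(margins)
    return rng.choices(range(len(margins)), weights=weights, k=size)


def build_bags(margins, n_bags, seed=0):
    """One index list per base classifier; each bag has the training-set size."""
    rng = random.Random(seed)
    return [margin_weighted_bootstrap(margins, len(margins), rng) for _ in range(n_bags)]


# Instance 2 (margin 0.1) tends to be drawn far more often than instance 0 (margin 0.9).
margins = [0.9, 0.5, 0.1]
bags = build_bags(margins, n_bags=10)
```

Each bag would then train one base classifier; how the margins are obtained in the first place (for example from a preliminary ensemble) is left open in this sketch.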
Булах, В. А., Л. О. Кириченко, and Т. А. Радивилова. "Сравнительный анализ классификации мультифрактальных временных рядов" [Comparative analysis of the classification of multifractal time series]. Thesis, 2018. http://openarchive.nure.ua/handle/document/5777.
Ganbayar, Otgonkhishig. "Predicting Credit Risk of Online Peer to Peer Lending by Applying Bagging and Random Forest Ensemble." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/2dr4te.
Full text國立臺灣科技大學
資訊工程系
106
In this research thesis, we analyze the credit risk of online peer-to-peer (P2P) lending, in which individuals and businesses lend to and borrow from each other over the internet without a financial institution such as a bank. Although the P2P system offers borrowers and investors some advantages over bank deposits, it faces the risk that loans are not repaid. This research uses the publicly available 2015–2017 historical loan dataset of the Lending Club platform. The raw data are cleaned with filtering methods and, because the initial dataset is imbalanced, resampled for training. We propose Bagging and Random Forest ensemble algorithms to classify loan status as good or bad, and an entropy-based feature selection method as a preprocessing technique to explore, analyze and determine the factors that play a crucial role in predicting credit risk. The algorithms are optimized to distinguish potentially good loans while identifying defaults, or bad loans. Other machine learning algorithms are also applied to compare the effectiveness of the proposed method. The experimental results show that the proposed method effectively raises the prediction accuracy for default risk.
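Entropy-based feature selection is commonly implemented by ranking features by information gain, the reduction in class-label entropy obtained by splitting on a feature. The sketch below assumes that form and handles categorical features only (continuous Lending Club variables would first need discretizing); the feature names and toy loans are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy within each feature value.
    Assumes a categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (feature name, information gain) pairs, highest gain first."""
    scores = [(name, information_gain([r[j] for r in rows], labels)) for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy loans: (grade, term in months) -> loan status.
rows = [("A", "36"), ("A", "60"), ("B", "36"), ("B", "60")]
status = ["good", "good", "bad", "bad"]
print(rank_features(rows, status, ["grade", "term"]))
# [('grade', 1.0), ('term', 0.0)]
```

Features at the top of the ranking would be kept as inputs to the Bagging and Random Forest models; where to cut the ranking is a tuning choice not fixed here.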
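Lins, Stefan Martin. "Analyse und Vergleich des Modal Splits in den Jahren 2013 und 2018 auf Basis der SrV-Daten mithilfe von Random Forest" [Analysis and comparison of the modal split in 2013 and 2018 based on SrV data using Random Forest]. 2021. https://tud.qucosa.de/id/qucosa%3A74086.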
The high share of traffic in total emissions, its contribution to climate change and the extensive land consumption of private transport reinforce political demands for a transport transition. The aim of this thesis is to develop an optimal classification model using machine learning methods, which are presented in methodological detail. The model enables the evaluation and forecasting of the choice of transport mode, and thus of the modal split, on the basis of various influencing factors, particularly over the period from 2013 to 2018. Previous studies have focused on non-European areas and one-off surveys. The analysis uses the mobility survey 'SrV-Mobilität in Städten', carried out by the Technische Universität Dresden in 25 large German cities in 2013 and 2018. After data processing, the individual feature variables are assessed for their suitability for modeling using descriptive methods and correlation measures, in order to obtain the most meaningful model results possible. Based on CART decision trees, models are built with the Bagging, Random Forest and Boosting algorithms for both years. To assess the effectiveness of these models, artificial neural networks and multinomial logistic regression are also examined for both years. Random Forest achieved the best quality measures in the study, with an overall accuracy of 82.9 % (AUC 0.9458) for 2013 and 79.8 % (AUC 0.9377) for 2018; based on this model, the influencing factors are described and evaluated using variable importance plots and partial dependence plots. In particular, the length and duration of the trip and the availability of a public transport season ticket have the greatest influence on the choice of transport mode.
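The difference between the Bagging and Random Forest models mentioned above can be sketched in pure Python with depth-1 CART-style trees (stumps). This is a simplification, not the thesis's setup, which used full CART trees: each base learner is trained on a bootstrap sample, predictions are combined by majority vote, and a `max_features` value smaller than the number of features restricts each stump to a random feature subset, which is the Random Forest idea reduced to a single split. All function names and the toy trips are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (the CART split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-1 tree: choose the (feature, threshold) split with the lowest
    size-weighted Gini impurity among the given feature indices."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority label everywhere
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_ensemble(X, y, n_estimators, max_features, rng):
    """Bagging if max_features equals the number of features; Random-Forest-style
    feature subsampling if it is smaller."""
    n, p = len(y), len(X[0])
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(p), max_features)
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return models


def predict_ensemble(models, row):
    """Combine the stumps by majority vote."""
    return majority([predict_stump(m, row) for m in models])


# Hypothetical toy trips: [distance in km, season ticket (0/1)] -> mode.
trips = [[0.5, 0], [1.0, 0], [1.5, 1], [6.0, 1], [8.0, 0], [12.0, 1]]
modes = ["walk", "walk", "walk", "car", "car", "car"]
rng = random.Random(42)
bagging = fit_ensemble(trips, modes, n_estimators=25, max_features=2, rng=rng)
forest = fit_ensemble(trips, modes, n_estimators=25, max_features=1, rng=rng)
print(predict_ensemble(bagging, [0.8, 0]), predict_ensemble(forest, [0.8, 0]))
```

With deeper trees, Random Forest draws a fresh feature subset at every split rather than once per tree; the stump version only hints at that decorrelation effect.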
Over time, motorized trips in particular are being replaced by cycling and public transport, while walking shows only minor changes. Most of the estimated classification models achieve excellent predictions of the choice of transport mode, although the bicycle is the hardest mode to predict.
Table of contents:
List of figures
List of tables
List of abbreviations
List of symbols
1 Introduction
2 Literature review
3 Methodology
3.1 Decision trees
3.1.1 Notation of the tree structure
3.1.2 Regression trees
3.1.3 Classification trees
3.1.4 Tree pruning and stopping criteria
3.1.5 Assessment of the method
3.2 Bagging
3.2.1 Idea
3.2.2 Bootstrap
3.2.3 Subsampling
3.2.4 Principle of the bagging algorithm
3.2.5 Assessment of the method and tuning
3.3 Random Forest
3.3.1 Idea
3.3.2 Principle of the Random Forest algorithm
3.3.3 Assessment of the method and tuning
3.3.4 Assessment of the influencing factors
3.4 Boosting
3.4.1 Idea
3.4.2 Principle of the AdaBoost method
3.4.3 Evaluation
3.5 Artificial neural network
3.5.1 Idea
3.5.2 Principle of the artificial neural network
3.5.3 Evaluation and tuning parameters
3.6 Multinomial logistic regression
3.7 Quality measures
3.7.1 Hit rate
3.7.2 ROC curve and AUC
4 Data
4.1 Dataset
4.2 Data preparation
4.2.1 Resolving the multilevel structure
4.2.2 Household-level data
4.2.3 Person-level data
4.2.4 Trip-level data
4.2.5 Outliers and missing values
5 Descriptive analysis
5.1 Analysis of the categorical dependent variable
5.2 Analysis of the cardinal variables
5.2.1 Measures of dispersion and location
5.2.2 Correlation between the cardinal variables
5.3 Analysis of the ordinal and nominal variables
5.3.1 Relative frequencies
5.3.2 Assessment of the ordinal and nominal variables using Pearson's corrected contingency coefficient
5.4 Analysis of statistical differences between the two samples examined
6 Model results
6.1 Tree-based classification methods
6.1.1 CART decision trees
6.1.2 Bagging
6.1.3 Random Forest
6.1.4 Boosting
6.2 Artificial neural network
6.3 Multinomial logistic regression
7 Conclusion
8 Critical appraisal and outlook
Bibliography
Appendix
Acknowledgments
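The two quality measures reported for this thesis, overall accuracy and AUC, can be computed as follows. This sketch covers only the binary case, using the rank (Mann-Whitney) interpretation of AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. How the thesis aggregated AUC over several transport modes is not stated in the abstract, so no multi-class averaging is assumed; the scores below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of cases whose predicted class equals the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. labels: 1 = positive, 0 = negative. O(P*N) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for "trip is made by bicycle" (1) vs any other mode (0).
print(binary_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
print(accuracy(["car", "bike", "walk"], ["car", "bike", "car"]))  # 0.6666666666666666
```

The pairwise loop is quadratic; for survey-sized data a sort-based rank computation gives the same value faster, but the loop makes the probabilistic meaning of AUC explicit.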
Λυπιτάκη, Αναστασία Δήμητρα Δανάη. "Μηχανική μάθηση σε ανομοιογενή δεδομένα" [Machine learning with imbalanced data]. Thesis, 2014. http://hdl.handle.net/10889/8630.
Ideally, machine learning (ML) algorithms generalize to every class with the same accuracy: in a two-class problem with positive (true) and negative (false) cases, the algorithm predicts positive and negative examples equally well. In many applications, however, ML algorithms have to learn from datasets that contain many more examples of one class than of another. Inductive algorithms are generally designed to minimize the number of errors. As a consequence, classes with few examples can be largely ignored, because misclassifying the over-represented class costs more, in terms of total error, than misclassifying the under-represented class. Imbalanced datasets occur in many real-world applications, such as medical diagnosis, robotics, industrial development processes, error detection in communication networks and automated testing of electronic equipment. This dissertation, entitled 'Machine Learning with Imbalanced Data', addresses the efficient use of ML algorithms on imbalanced datasets. The thesis gives a general description of basic ML algorithms and of related methods for handling imbalanced datasets, and presents a number of algorithmic techniques for this purpose, such as AdaCost, cost-sensitive boosting and MetaCost. It also presents evaluation metrics for ML methods on imbalanced datasets, including ROC (receiver operating characteristic) curves, PR (precision-recall) curves and cost curves. A new hybrid ML algorithm combining the OverBagging and Rotation Forest algorithms is introduced and compared with related algorithms in the WEKA environment. Experimental results show the superior performance of the proposed algorithm.
Finally, the conclusions of this research work are presented and several future research directions are given.
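OverBagging, one component of the proposed hybrid, is commonly described as bagging in which each bootstrap sample is class-balanced by oversampling: every class is drawn with replacement until it matches the size of the majority class. The sketch below assumes that formulation and shows only how the training indices of each bag are built; the Rotation Forest part and the WEKA setup of the thesis are not reproduced, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def overbagging_indices(labels, rng):
    """Indices for one class-balanced bag (assumed OverBagging formulation):
    each class is sampled with replacement up to the size of the largest class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    target = max(len(idx) for idx in by_class.values())
    bag = []
    for idx in by_class.values():
        bag.extend(rng.choices(idx, k=target))
    rng.shuffle(bag)
    return bag


def make_bags(labels, n_bags, seed=0):
    """One bag per base classifier of the ensemble."""
    rng = random.Random(seed)
    return [overbagging_indices(labels, rng) for _ in range(n_bags)]


# Hypothetical imbalanced data: 8 negative cases, 2 positive cases.
labels = ["neg"] * 8 + ["pos"] * 2
bag = make_bags(labels, n_bags=5)[0]
print(len(bag), sorted(Counter(labels[i] for i in bag).items()))
# 16 [('neg', 8), ('pos', 8)]
```

Because the two minority instances are repeated many times in every bag, each base learner sees a balanced class distribution, while variation between bags still comes from resampling.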
Rodríguez, Hernán Cortés. "Ensemble classifiers in remote sensing: a comparative analysis." Master's thesis, 2014. http://hdl.handle.net/10362/11671.
Land cover and land use (LCLU) maps are important tools for understanding the relationships between human activities and the natural environment. Accurately delineating all features on the Earth's surface is essential for managing them properly. These maps are derived mainly from remote sensing imagery (RSI), specifically satellite images. Hence, there is demand for new techniques and methods that can process these data accurately. In this work, we briefly review some current approaches in the scientific community to obtaining more accurate LCLU maps, focusing on classifier ensembles and the different ensemble strategies found in the literature. Based on our data and previous work, we propose different ensemble strategies to increase the accuracy of earlier LCLU maps produced from the same data with single classifiers. In the end, only one of the proposed ensembles achieved significantly higher accuracy in the LCLU classification than the best single classifier on the same data. It was also shown that diversity did not play an important role in the success of this ensemble.
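Two ingredients of the comparison above, combining the outputs of several classifiers and measuring their diversity, can be sketched as follows. The sketch assumes plain majority voting (the specific ensemble strategies of the thesis are not described in the abstract) and uses a simple label-level pairwise disagreement as the diversity measure; both choices, and the land-cover labels in the example, are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """predictions[m][i] is the label classifier m assigns to sample i.
    Returns the most frequent label per sample; ties go to the label that
    appears first in classifier order (assumed combination rule)."""
    n_samples = len(predictions[0])
    return [
        Counter(p[i] for p in predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]


def disagreement(a, b):
    """Fraction of samples on which two classifiers output different labels."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all pairs of classifiers (higher = more diverse)."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical land-cover labels from three classifiers for four pixels.
preds = [
    ["water", "forest", "urban", "urban"],
    ["water", "urban", "urban", "forest"],
    ["forest", "forest", "urban", "urban"],
]
print(majority_vote(preds))  # ['water', 'forest', 'urban', 'urban']
print(mean_pairwise_disagreement(preds))  # 0.5
```

Relating a diversity score like this to the accuracy gain of the vote over the best member is one way to test whether diversity explains an ensemble's success.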
Kandel, Ibrahem Hamdy Abdelhamid. "A comparative study of tree-based models for churn prediction: a case study in the telecommunication sector." Master's thesis, 2019. http://hdl.handle.net/10362/60302.
In recent years, customer churn, the phenomenon of customers leaving a company for another provider, has gained increasing importance. Churn matters especially in saturated industries such as telecommunications, because existing customers are very valuable and acquiring new customers is very expensive. Companies want to know which of their customers will churn, and when, so that they can take measures to retain those at risk. Such measures may take the form of incentives for likely churners; the downside is that misclassification is costly, especially when incentives are given to customers who would not have churned. The common challenges in churn prediction are how to preprocess the data and which algorithm to choose, especially when the dataset is heterogeneous, as telecommunication datasets commonly are. This thesis aims to predict customer churn in the telecommunication sector using different decision tree algorithms and their ensemble models.
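The cost argument above can be made concrete: if a wasted incentive (a non-churner flagged as a churner) and a lost customer (a missed churner) have different costs, the probability threshold at which churn scores from a tree ensemble become decisions can be chosen to minimize total cost rather than to maximize accuracy. The sketch below is a hypothetical illustration; the costs, the scores and the simplification that correctly targeted churners cost nothing are assumptions, not figures from the thesis.

```python
def total_cost(y_true, churn_probs, threshold, cost_fp, cost_fn):
    """Cost of acting on the predictions: cost_fp per non-churner who gets an
    incentive (false positive), cost_fn per churner who is missed (false negative).
    Simplification: correctly targeted churners are assumed to cost nothing."""
    cost = 0.0
    for churned, p in zip(y_true, churn_probs):
        flagged = p >= threshold
        if flagged and not churned:
            cost += cost_fp
        elif not flagged and churned:
            cost += cost_fn
    return cost


def best_threshold(y_true, churn_probs, cost_fp, cost_fn):
    """Try every observed score (plus +inf, i.e. flag nobody) as a threshold
    and return the one with the lowest total cost."""
    candidates = sorted(set(churn_probs)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(y_true, churn_probs, t, cost_fp, cost_fn))


# Hypothetical churn labels (1 = churned) and ensemble scores; a missed churner
# is assumed to cost five times as much as a wasted incentive.
y = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.1]
t = best_threshold(y, scores, cost_fp=1.0, cost_fn=5.0)
print(t, total_cost(y, scores, t, cost_fp=1.0, cost_fn=5.0))  # 0.6 1.0
```

In practice the threshold would be chosen on held-out data, since tuning it on the training scores of a tree ensemble tends to be optimistic.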