Journal articles on the topic 'Bagging Forest'


1

Jatmiko, Yogo Aryo, Septiadi Padmadisastra, and Anna Chadidjah. "ANALISIS PERBANDINGAN KINERJA CART KONVENSIONAL, BAGGING DAN RANDOM FOREST PADA KLASIFIKASI OBJEK: HASIL DARI DUA SIMULASI." MEDIA STATISTIKA 12, no. 1 (July 24, 2019): 1. http://dx.doi.org/10.14710/medstat.12.1.1-12.

Abstract:
The conventional CART method is a nonparametric classification method built on categorical response data. Bagging is a popular ensemble method, whereas Random Forest (RF) is a relatively new decision-tree ensemble method developed from Bagging. Unlike Bagging, Random Forest adds a further layer of randomness to Bagging's random resampling process: not only is the sample data randomly resampled to form each classification tree, but the independent variables are also randomly selected, and the best splitter at each node is chosen from among the newly selected variables, which is expected to produce more accurate predictions. On this basis, the authors study the three methods by comparing classification accuracy on binary and non-binary simulated data, in order to understand how sample size, correlation between independent variables, and the presence or absence of particular distribution patterns affect the accuracy of each classification method. The results on the simulated data show that the Random Forest ensemble method can improve classification accuracy.
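The difference between the two resampling schemes described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation code; the helper names `bootstrap_indices` and `candidate_features`, the parameter `mtry`, and the toy sizes are assumptions made only for illustration.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (the step Bagging and RF share)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(n_features, mtry, rng):
    """Pick mtry distinct feature indices to consider at one split (the extra RF layer)."""
    return sorted(rng.sample(range(n_features), mtry))


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
n_rows, n_features = 10, 6  # hypothetical toy sizes

# Bagging: each tree sees a bootstrap sample but may split on any feature.
bagging_rows = bootstrap_indices(n_rows, rng)
bagging_split_features = list(range(n_features))

# Random Forest: same bootstrap step, plus a fresh random feature subset per split.
rf_rows = bootstrap_indices(n_rows, rng)
rf_split_features = candidate_features(n_features, mtry=2, rng=rng)

print("bagging rows:", bagging_rows, "features:", bagging_split_features)
print("RF rows:     ", rf_rows, "features:", rf_split_features)
```

In a full implementation, `candidate_features` would be called again at every node of every tree, which is what decorrelates the trees relative to plain Bagging.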
2

Tuysuzoglu, Goksu, and Derya Birant. "Enhanced Bagging (eBagging): A Novel Approach for Ensemble Learning." International Arab Journal of Information Technology 17, no. 4 (July 1, 2020): 515–28. http://dx.doi.org/10.34028/iajit/17/4/10.

Abstract:
Bagging is one of the well-known ensemble learning methods, which combines several classifiers trained on different subsamples of the dataset. However, a drawback of bagging is its random selection, where the classification performance depends on chance to choose a suitable subset of training objects. This paper proposes a novel modified version of bagging, named enhanced Bagging (eBagging), which uses a new mechanism (error-based bootstrapping) when constructing training sets in order to cope with this problem. In the experimental setting, the proposed eBagging technique was tested on 33 well-known benchmark datasets and compared with bagging, random forest and boosting techniques using well-known classification algorithms: Support Vector Machines (SVM), decision trees (C4.5), k-Nearest Neighbour (kNN) and Naive Bayes (NB). The results show that eBagging outperforms its counterparts by classifying the data points more accurately while reducing the training error.
3

Anouze, Abdel Latef M., and Imad Bou-Hamad. "Data envelopment analysis and data mining to efficiency estimation and evaluation." International Journal of Islamic and Middle Eastern Finance and Management 12, no. 2 (April 30, 2019): 169–90. http://dx.doi.org/10.1108/imefm-11-2017-0302.

Abstract:
Purpose: This paper aims to assess the application of seven statistical and data mining techniques to second-stage data envelopment analysis (DEA) for bank performance. Design/methodology/approach: Different statistical and data mining techniques are applied to second-stage DEA for bank performance as part of an attempt to produce a powerful model for bank performance with effective predictive ability. The data mining tools considered are classification and regression trees (CART), conditional inference trees (CIT), random forest based on CART and CIT, bagging, artificial neural networks and their statistical counterpart, logistic regression. Findings: The results showed that random forests and bagging outperform other methods in terms of predictive power. Originality/value: This is the first study to assess the impact of environmental factors on banking performance in Middle East and North Africa countries.
4

Kotsiantis, Sotiris. "Combining bagging, boosting, rotation forest and random subspace methods." Artificial Intelligence Review 35, no. 3 (December 21, 2010): 223–40. http://dx.doi.org/10.1007/s10462-010-9192-8.

5

Krautenbacher, Norbert, Fabian J. Theis, and Christiane Fuchs. "Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies." Computational and Mathematical Methods in Medicine 2017 (2017): 1–18. http://dx.doi.org/10.1155/2017/7847531.

Abstract:
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia.
6

Irawan, Devi, Eza Budi Perkasa, Yurindra Yurindra, Delpiah Wahyuningsih, and Ellya Helmud. "Perbandingan Klassifikasi SMS Berbasis Support Vector Machine, Naive Bayes Classifier, Random Forest dan Bagging Classifier." Jurnal Sisfokom (Sistem Informasi dan Komputer) 10, no. 3 (December 6, 2021): 432–37. http://dx.doi.org/10.32736/sisfokom.v10i3.1302.

Abstract:
Short message service (SMS) is one of the important communication media supporting the speed of mobile phone use. A hybrid SMS classification system is used to detect which SMS messages are spam and which are legitimate. This research requires collecting an SMS dataset, feature selection, preprocessing, vector creation, filtering, and system updating. The two types of SMS classification currently found on mobile phones are blacklisting (rejected) and whitelisting (accepted). This research uses several algorithms: support vector machine, Naïve Bayes classifier, Random Forest and Bagging Classifier. The aim of this research is to address the now widespread problem of SMS messages identified as spam, and thereby to contribute a comparison of methods capable of filtering and separating spam SMS from non-spam SMS. The research finds that the Bagging classifier algorithm obtained the highest performance score among the algorithms compared; it can be used as a means of filtering SMS messages entering the user's inbox and can provide accurate filtering results for incoming SMS.
7

Fitriyani, Fitriyani. "Implementasi Forward Selection dan Bagging untuk Prediksi Kebakaran Hutan Menggunakan Algoritma Naïve Bayes." Jurnal Nasional Teknologi dan Sistem Informasi 8, no. 1 (May 2, 2022): 1–8. http://dx.doi.org/10.25077/teknosi.v8i1.2022.1-8.

Abstract:
Forest fires not only cause economic and ecological damage but also threaten human life through air pollution from the smoke they produce. The high incidence of forest fires makes prediction important. Algerian Forest Fire is the forest fire dataset used in this study, and it is processed with the proposed model. The dataset contains irrelevant features that would affect the performance of the proposed model, so the relevant features are selected using Forward Selection. The Bagging method is used to handle the class imbalance present in the dataset, and Naïve Bayes is the machine learning algorithm implemented in this study. The best accuracy is 98.40% for the model combining Naïve Bayes, Bagging and Greedy Forward Selection, and 92.63% for the model combining Naïve Bayes and Bagging.
8

Abellán, Joaquín, Javier G. Castellano, and Carlos J. Mantas. "A New Robust Classifier on Noise Domains: Bagging of Credal C4.5 Trees." Complexity 2017 (2017): 1–17. http://dx.doi.org/10.1155/2017/9023970.

Abstract:
The knowledge extraction from data with noise or outliers is a complex problem in the data mining area. Normally, it is not easy to eliminate those problematic instances. To obtain information from this type of data, robust classifiers are the best option to use. One of them is the application of a bagging scheme on weak single classifiers. The Credal C4.5 (CC4.5) model is a new classification tree procedure based on the classical C4.5 algorithm and imprecise probabilities. It represents a type of the so-called credal trees. It has been proven that CC4.5 is more robust to noise than the C4.5 method and even than other previous credal tree models. In this paper, the performance of the CC4.5 model in bagging schemes on noisy domains is shown. An experimental study on data sets with added noise is carried out in order to compare results where bagging schemes are applied on credal trees and the C4.5 procedure. As a benchmark point, the known Random Forest (RF) classification method is also used. It will be shown that the bagging ensemble using pruned credal trees outperforms the successful bagging C4.5 and RF when data sets with medium-to-high noise level are classified.
9

Choi, Sunghyeon, and Jin Hur. "An Ensemble Learner-Based Bagging Model Using Past Output Data for Photovoltaic Forecasting." Energies 13, no. 6 (March 19, 2020): 1438. http://dx.doi.org/10.3390/en13061438.

Abstract:
As the world is aware, the trend of generating energy sources has been changing from conventional fossil fuels to sustainable energy. In order to reduce greenhouse gas emissions, the ratio of renewable energy sources should be increased, and solar and wind power, typically, are driving this energy change. However, renewable energy sources highly depend on weather conditions and have intermittent generation characteristics, thus embedding uncertainty and variability. As a result, it can cause variability and uncertainty in the power system, and accurate prediction of renewable energy output is essential to address this. To solve this issue, much research has studied prediction models, and machine learning is one of the typical methods. In this paper, we used a bagging model to predict solar energy output. Bagging generally uses a decision tree as a base learner. However, to improve forecasting accuracy, we proposed a bagging model using an ensemble model as a base learner and adding past output data as new features. We set base learners as ensemble models, such as random forest, XGBoost, and LightGBMs. Also, we used past output data as new features. Results showed that the ensemble learner-based bagging model using past data features performed more accurately than the bagging model using a single model learner with default features.
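The idea of "adding past output data as new features" can be sketched as a lag-feature transform applied before any learner is trained. This is only an illustration under assumptions: the function name `add_lag_features`, the chosen lags, and the toy numbers are hypothetical and do not come from the paper.

```python
def add_lag_features(features, outputs, lags):
    """Append past output values to each feature row.

    features: list of feature rows, one per time step
    outputs:  observed output per time step (same length as features)
    lags:     positive integers, e.g. [1, 2] for the previous two time steps
    Rows that lack a full history are dropped.
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(outputs)):
        past = [outputs[t - lag] for lag in lags]
        rows.append(list(features[t]) + past)
        targets.append(outputs[t])
    return rows, targets


# Hypothetical toy data: one weather feature and the measured output per step.
weather = [[0.1], [0.4], [0.9], [0.7], [0.2]]
output = [1.0, 4.0, 9.0, 7.0, 2.0]

X, y = add_lag_features(weather, output, lags=[1, 2])
for row, target in zip(X, y):
    print(row, "->", target)
```

The augmented rows `X` could then be passed to whichever base learner the bagging model wraps.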
10

Yoga Religia, Agung Nugroho, and Wahyu Hadikristanto. "Klasifikasi Analisis Perbandingan Algoritma Optimasi pada Random Forest untuk Klasifikasi Data Bank Marketing." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 5, no. 1 (February 28, 2021): 187–92. http://dx.doi.org/10.29207/resti.v5i1.2813.

Abstract:
Banking requires marketers to reduce lending risk by keeping customers from falling into non-performing loans. One way to reduce this risk is to use data mining techniques. Data mining provides a powerful technique for finding meaningful and useful information in large amounts of data by way of classification. Random Forest (RF) is a classification algorithm that can be used to handle imbalance problems. However, several references state that an optimization algorithm is needed to improve the classification results of the RF algorithm. Optimization of the RF algorithm can be done using Bagging and Genetic Algorithm (GA). This study aims to classify Bank Marketing data in the form of loan application acceptance data, taken from the www.data.world site. Classification is carried out using the RF algorithm to obtain a predictive model for loan application acceptance with optimal accuracy. This study also compares the use of optimization in the RF algorithm with Bagging and Genetic Algorithms. Based on the tests that have been done, the best classification performance on the Bank Marketing data is obtained with the RF algorithm alone, with an accuracy of 88.30%, AUC (+) of 0.500 and AUC (-) of 0.000. Optimization with Bagging and Genetic Algorithm did not improve the performance of the RF algorithm for classification of Bank Marketing data.
11

Pérez Rave, Jorge Iván, Favián González Echavarría, and Juan Carlos Correa Morales. "Modeling of apartment prices in a Colombian context from a machine learning approach with stable-important attributes." DYNA 87, no. 212 (January 1, 2020): 63–72. http://dx.doi.org/10.15446/dyna.v87n212.80202.

Abstract:
The objective of this work is to develop a machine learning model for online pricing of apartments in a Colombian context. This article addresses three aspects: i) it compares the predictive capacity of linear regression, regression trees, random forest and bagging; ii) it studies the effect of a group of text attributes on the predictive capability of the models; and iii) it identifies the more stable-important attributes and interprets them from an inferential perspective to better understand the object of study. The sample consists of 15,177 observations of real estate. The ensemble methods (random forest and bagging) show predictive superiority with respect to the others. The attributes derived from the text had a significant relationship with the property price (on a log scale). However, their contribution to the predictive capacity was almost nil, since four different attributes achieved highly accurate predictions and remained stable when the sample changed.
12

Taqwa Prasetyaningrun, Putri, Irfan Pratama, and Albert Yakobus Chandra. "Implementation Of Machine Learning To Determine The Best Employees Using Random Forest Method." IJCONSIST JOURNALS 2, no. 02 (June 1, 2021): 53–59. http://dx.doi.org/10.33005/ijconsist.v2i02.43.

Abstract:
In the workplace, the presence of top-performing employees is a benchmark of a company's progress. The best employees are usually determined by looking at employee performance, e.g. diligence, discipline and other achievements. The goal is to optimize decision making in identifying the best employees. The models obtained for employee prediction are tested on a real dataset provided by IBM analytics, which includes 29 features and about 22005 samples. In this paper we try to build a system that predicts employee attribution based on a collection of employee data from the Kaggle website. We used four machine learning algorithms, namely KNN (K-Nearest Neighbor), Naïve Bayes, Decision Tree and Random Forest, plus two ensemble techniques, namely stacking and bagging. Results are expressed in terms of classic metrics, and the algorithm that produces the best result for the available dataset is the Random Forest classifier. It achieves the best withdrawals score (0.88), matched by the stacking and bagging methods with the same value.
13

SEGUÍ, SANTI, LAURA IGUAL, and JORDI VITRIÀ. "BAGGED ONE-CLASS CLASSIFIERS IN THE PRESENCE OF OUTLIERS." International Journal of Pattern Recognition and Artificial Intelligence 27, no. 05 (August 2013): 1350014. http://dx.doi.org/10.1142/s0218001413500146.

Abstract:
The problem of training classifiers only with target data arises in many applications where nontarget data are too costly, difficult to obtain, or not available at all. Several one-class classification methods have been presented to solve this problem, but most of the methods are highly sensitive to the presence of outliers in the target class. Ensemble methods have therefore been proposed as a powerful way to improve the classification performance of binary/multi-class learning algorithms by introducing diversity into classifiers. However, their application to one-class classification has been rather limited. In this paper, we present a new ensemble method based on a nonparametric weighted bagging strategy for one-class classification, to improve accuracy in the presence of outliers. While the standard bagging strategy assumes a uniform data distribution, the method we propose here estimates a probability density based on a forest structure of the data. This assumption allows the estimation of data distribution from the computation of simple univariate and bivariate kernel densities. Experiments using original and noisy versions of 20 different datasets show that bagging ensemble methods applied to different one-class classifiers outperform base one-class classification methods. Moreover, we show that, in noisy versions of the datasets, the nonparametric weighted bagging strategy we propose outperforms the classical bagging strategy in a statistically significant way.
14

S., Vikas, and Thimmaraju S.N. "Enhancement of Data Classification Accuracy using Bagging Technique in Random Forest." International Journal of Computer Sciences and Engineering 7, no. 8 (August 31, 2019): 185–88. http://dx.doi.org/10.26438/ijcse/v7i8.185188.

15

Hannan, Abdul, and Jagadeesh Anmala. "Classification and Prediction of Fecal Coliform in Stream Waters Using Decision Trees (DTs) for Upper Green River Watershed, Kentucky, USA." Water 13, no. 19 (October 8, 2021): 2790. http://dx.doi.org/10.3390/w13192790.

Abstract:
The classification of stream waters using parameters such as fecal coliforms into the classes of body contact and recreation, fishing and boating, domestic utilization, and danger itself is a significant practical problem of water quality prediction worldwide. Various statistical and causal approaches are used routinely to solve the problem from a causal modeling perspective. However, a transparent process in the form of Decision Trees is used to shed more light on the structure of input variables such as climate and land use in predicting the stream water quality in the current paper. The Decision Tree algorithms such as classification and regression tree (CART), iterative dichotomiser (ID3), random forest (RF), and ensemble methods such as bagging and boosting are applied to predict and classify the unknown stream water quality behavior from the input variables. The variants of bagging and boosting have also been looked at for more effective modeling results. Although the Random Forest, Gradient Boosting, and Extremely Randomized Tree models have been found to yield consistent classification results, DTs with Adaptive Boosting and Bagging gave the best testing accuracies out of all the attempted modeling approaches for the classification of Fecal Coliforms in the Upper Green River watershed, Kentucky, USA. Separately, a discussion of the Decision Support System (DSS) that uses Decision Tree Classifier (DTC) is provided.
16

Adnan, A., A. M. Yolanda, and F. Natasya. "A Comparison of Bagging and Boosting on Classification Data: Case Study on Rainfall Data in Sultan Syarif Kasim II Meteorological Station in Pekanbaru." Journal of Physics: Conference Series 2049, no. 1 (October 1, 2021): 012053. http://dx.doi.org/10.1088/1742-6596/2049/1/012053.

Abstract:
A common way to classify data is to use a machine learning algorithm alongside ensemble methods like bagging and boosting. In earlier studies, these two approaches have been shown to be very accurate. The aim of this research is to assess the performance of bagging and boosting in classifying rainfall data obtained at the Sultan Syarif Kasim II Meteorological Station in Pekanbaru from 1 January 2018 until 31 July 2021. Rainfall data are classified into two categories: rainy and non-rainy. The parameters are average temperature, average humidity, sunshine duration, wind direction at maximum speed, and average wind speed. For comparison, this study developed Stochastic Gradient Boosting with Gradient Boosting Modelling and C5.0 from boosting, as well as Bagged Classification and Regression Tree (CART) and Random Forest from bagging. In order to generate reliable conclusions, each algorithm is run 30 times with repeated cross validation. The result demonstrates that Stochastic Gradient Boosting with Gradient Boosting Modelling is the best algorithm based on average accuracy.
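Repeated cross validation, as mentioned in this abstract, reshuffles the data and re-partitions it into folds several times so that the average accuracy depends less on one particular split. The sketch below shows only the index bookkeeping; the function name `repeated_kfold` and the fold and repeat counts are assumptions, since the abstract does not state how the 30 runs were divided.

```python
import random


def repeated_kfold(n_samples, n_folds, n_repeats, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield train_idx, test_idx


# Hypothetical sizes: 12 samples, 4 folds, 3 repeats -> 12 train/test splits.
splits = list(repeated_kfold(n_samples=12, n_folds=4, n_repeats=3))
print(len(splits), "splits; first test fold:", sorted(splits[0][1]))
```

A model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the scores averaged over all splits.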
17

Arabameri, Alireza, Wei Chen, Thomas Blaschke, John P. Tiefenbacher, Biswajeet Pradhan, and Dieu Tien Bui. "Gully Head-Cut Distribution Modeling Using Machine Learning Methods—A Case Study of N.W. Iran." Water 12, no. 1 (December 19, 2019): 16. http://dx.doi.org/10.3390/w12010016.

Abstract:
To more effectively prevent and manage the scourge of gully erosion in arid and semi-arid regions, we present a novel-ensemble intelligence approach—bagging-based alternating decision-tree classifier (bagging-ADTree)—and use it to model a landscape’s susceptibility to gully erosion based on 18 gully-erosion conditioning factors. The model’s goodness-of-fit and prediction performance are compared to three other machine learning algorithms (single alternating decision tree, rotational-forest-based alternating decision tree (RF-ADTree), and benchmark logistic regression). To achieve this, a gully-erosion inventory was created for the study area, the Chah Mousi watershed, Iran by combining archival records containing reports of gully erosion, remotely sensed data from Google Earth, and geolocated sites of gully head-cuts gathered in a field survey. A total of 119 gully head-cuts were identified and mapped. To train the models’ analysis and prediction capabilities, 83 head-cuts (70% of the total) and the corresponding measures of the conditioning factors were input into each model. The results from the models were validated using the data pertaining to the remaining 36 gully locations (30%). Next, the frequency ratio is used to identify which conditioning-factor classes have the strongest correlation with gully erosion. Using random-forest modeling, the relative importance of each of the conditioning factors was determined. Based on the random-forest results, the top eight factors in this study area are distance-to-road, drainage density, distance-to-stream, LU/LC, annual precipitation, topographic wetness index, NDVI, and elevation. Finally, based on goodness-of-fit and AUROC of the success rate curve (SRC) and prediction rate curve (PRC), the results indicate that the bagging-ADTree ensemble model had the best performance, with SRC (0.964) and PRC (0.978). 
RF-ADTree (SRC = 0.952 and PRC = 0.971), ADTree (SRC = 0.926 and PRC = 0.965), and LR (SRC = 0.867 and PRC = 0.870) were the subsequent best performers. The results also indicate that bagging and RF, as meta-classifiers, improved the performance of the ADTree model as a base classifier. The bagging-ADTree model’s results indicate that 24.28% of the study area is classified as having high and very high susceptibility to gully erosion. The new ensemble model accurately identified the areas that are susceptible to gully erosion based on the past patterns of formation, but it also provides highly accurate predictions of future gully development. The novel ensemble method introduced in this research is recommended for use to evaluate the patterns of gullying in arid and semi-arid environments and can effectively identify the most salient conditioning factors that promote the development and expansion of gullies in erosion-susceptible environments.
18

Ragab, Mahmoud, Ahmed M. K. Abdel Aal, Ali O. Jifri, and Nahla F. Omran. "Enhancement of Predicting Students Performance Model Using Ensemble Approaches and Educational Data Mining Techniques." Wireless Communications and Mobile Computing 2021 (December 7, 2021): 1–9. http://dx.doi.org/10.1155/2021/6241676.

Abstract:
Student performance prediction is extremely important in today’s educational system. Predicting student achievement in advance can help students and teachers keep track of the student’s progress. Today, several institutes have implemented a manual ongoing evaluation method, and students benefit from such methods since they help them improve their performance. In this study, we use educational data mining (EDM) and recommend an ensemble classifier for the student performance prediction model, with data mining techniques serving as the classification techniques. The model uses distinct datasets that represent the student’s interaction with the educational model. The performance of the student prediction model is evaluated with several classifiers: logistic regression, naïve Bayes tree, artificial neural network, support vector system, decision tree, random forest, and k-nearest neighbor. Additionally, we used ensemble processes to improve the performance of these classifiers. We utilized Boosting, Random Forest, Bagging, and Voting algorithms, which are the ensemble techniques commonly used in studies. By using ensemble methods, we obtain a good result that demonstrates the dependability of the proposed model. For better productivity, the various classifiers are gathered and then added to the ensemble method using the Vote procedure. The implementation results demonstrate that the bagging method achieved a clear improvement with the DT model, where the DT algorithm accuracy with bagging increased from 90.4% to 91.4%. Recall results improved from 0.904 to 0.914. Precision results also increased from 0.905 to 0.915.
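The reported recall (0.904 to 0.914) matches the reported accuracy (90.4% to 91.4%) exactly. One possible explanation, stated here as an assumption and not confirmed by the abstract, is that recall was averaged over classes with weights proportional to class support: support-weighted recall is algebraically equal to accuracy, because summing (n_c / N) * (TP_c / n_c) over classes gives (sum of TP_c) / N. The sketch below checks this on hypothetical labels.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    return sum((support[c] / n) * (hits[c] / support[c]) for c in support)


# Hypothetical labels, not data from the study.
y_true = ["pass", "pass", "pass", "fail", "fail", "pass", "fail", "pass"]
y_pred = ["pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]

print("accuracy:       ", accuracy(y_true, y_pred))
print("weighted recall:", weighted_recall(y_true, y_pred))
```

Precision does not share this identity, which is consistent with the slightly different precision figures in the abstract.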
19

Kadavi, Prima, Chang-Wook Lee, and Saro Lee. "Application of Ensemble-Based Machine Learning Models to Landslide Susceptibility Mapping." Remote Sensing 10, no. 8 (August 9, 2018): 1252. http://dx.doi.org/10.3390/rs10081252.

Abstract:
The main purpose of this study was to produce landslide susceptibility maps using various ensemble-based machine learning models (i.e., the AdaBoost, LogitBoost, Multiclass Classifier, and Bagging models) for the Sacheon-myeon area of South Korea. A landslide inventory map including a total of 762 landslides was compiled based on reports and aerial photograph interpretations. The landslides were randomly separated into two datasets: 70% of landslides were selected for the model establishment and 30% were used for validation purposes. Additionally, 20 landslide condition factors divided into five categories (topographic factors, hydrological factors, soil map, geological map, and forest map) were considered in the landslide susceptibility mapping. The relationships among landslide occurrence and landslide conditioning factors were analyzed and the landslide susceptibility maps were calculated and drawn using the AdaBoost, LogitBoost, Multiclass Classifier, and Bagging models. Finally, the maps were validated using the area under the curve (AUC) method. The Multiclass Classifier method had higher prediction accuracy (85.9%) than the Bagging (AUC = 85.4%), LogitBoost (AUC = 84.8%), and AdaBoost (84.0%) methods.
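Validation by area under the ROC curve can be sketched without any library: the AUC equals the probability that a randomly chosen positive location receives a higher susceptibility score than a randomly chosen negative one, with ties counted as one half. The function name `auc` and the toy labels and scores below are assumptions for illustration, not values from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: 1 = landslide observed, 0 = no landslide.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

print("AUC:", auc(labels, scores))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a validation set of a few hundred points; rank-based formulas give the same value faster on larger sets.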
20

Siswoyo, Bambang, Zuraida Abal Abas, Ahmad Naim Che Pee, Rita Komalasari, and Nano Suryana. "Ensemble machine learning algorithm optimization of bankruptcy prediction of bank." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 2 (June 1, 2022): 679. http://dx.doi.org/10.11591/ijai.v11.i2.pp679-686.

Abstract:
An ensemble consists of a set of individually trained models whose predictions are combined when classifying new cases; building a good classification model requires diversity among the single models. Logistic regression, support vector machine, random forest, and neural network algorithms serve as single models that provide alternative sources of diversity. Previous research has shown that ensembles are more accurate than single models. The single model and the modified ensemble bagging model are among the techniques studied in this paper. We experimented with the banking industry’s financial ratios. Our observations are as follows. First, an ensemble is always more accurate than a single model. Second, we observe that modified ensemble bagging models show improved classification model performance on balanced datasets, as they can adjust behavior and make them more suitable for relatively small datasets. The accuracy rate is 97% in the bagging ensemble learning model, an increase in the accuracy level of up to 16% compared to other models that use unbalanced datasets.
21

Srivastava, Ankit Kumar. "Short Term Load Forecasting using Regression Trees: Random Forest, Bagging and M5P." International Journal of Advanced Trends in Computer Science and Engineering 9, no. 2 (April 25, 2020): 1898–902. http://dx.doi.org/10.30534/ijatcse/2020/152922020.

22

Jiang, Xiangkui, Chang-an Wu, and Huaping Guo. "Forest Pruning Based on Branch Importance." Computational Intelligence and Neuroscience 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/3162571.

Abstract:
A forest is an ensemble with decision trees as members. This paper proposes a novel strategy for pruning a forest to enhance ensemble generalization ability and reduce ensemble size. Unlike conventional ensemble pruning approaches, the proposed method evaluates the importance of tree branches with respect to the whole ensemble using a newly proposed metric called importance gain. The importance of a branch is designed by considering ensemble accuracy and the diversity of ensemble members, and thus the metric reasonably evaluates how much improvement of the ensemble accuracy can be achieved when a branch is pruned. Our experiments show that the proposed method can significantly reduce ensemble size and improve ensemble accuracy, whether ensembles are constructed by a certain algorithm such as bagging or obtained by an ensemble selection algorithm, and whether each decision tree is pruned or unpruned.
23

Zhao, Xia, and Wei Chen. "GIS-Based Evaluation of Landslide Susceptibility Models Using Certainty Factors and Functional Trees-Based Ensemble Techniques." Applied Sciences 10, no. 1 (December 18, 2019): 16. http://dx.doi.org/10.3390/app10010016.

Abstract:
The main purpose of this paper is to use functional tree-based ensemble techniques, namely bagging, rotation forest, and dagging (functional trees (FT), bagging-functional trees (BFT), rotation forest-functional trees (RFFT), dagging-functional trees (DFT)), for landslide susceptibility modeling in Zichang County, China. First, 263 landslides were identified, a landslide inventory map was established, and the landslide locations were randomly divided into 70% (training data) and 30% (validation data). Then, 14 landslide conditioning factors were selected, and the correlation between conditioning factors and landslides was analyzed using the certainty factor method. Next, the four models were applied for landslide susceptibility modeling and zoning. Finally, the receiver operating characteristic (ROC) curve and statistical parameters were used to evaluate and compare the overall performance of the four models. The results showed that the area under the curve (AUC) for all four models was larger than 0.74, and that the BFT model performed better than the other three. This study also illustrated that an integrated model is not necessarily more effective than a single model. The ensemble data mining technology used in this study can serve as an effective tool for future land planning and monitoring.
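The AUC used above to compare models can be illustrated with a minimal sketch. It assumes binary labels (1 for landslide, 0 for non-landslide) and computes the AUC as the fraction of positive/negative pairs that the model ranks correctly, counting ties as half. The function name `roc_auc` and the toy scores are hypothetical and are not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5).

    scores: predicted susceptibility values, higher means more likely positive.
    labels: 1 for a positive case, 0 for a negative case.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Count how often a positive case is scored above a negative case.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy validation scores (illustrative only).
example_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values above 0.74 are read as useful discrimination.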
24

Osareh, Alireza, and Bita Shadgar. "An Efficient Ensemble Learning Method for Gene Microarray Classification." BioMed Research International 2013 (2013): 1–10. http://dx.doi.org/10.1155/2013/478410.

Abstract:
Gene microarray analysis and classification have proven to be an effective approach to the diagnosis of diseases and cancers. However, basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis, while classifier ensembles have received increasing attention in various applications. Here, we address the gene classification problem using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques and thereby preserves both desirable features of an ensemble architecture, namely accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble and ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results show that combining the fast correlation-based feature selection method with an ICA-based RotBoost ensemble is highly effective for gene classification. The proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also those generated by two widely used conventional ensemble learning methods, Bagging and AdaBoost.
25

Syahrani, Iswaya Maalik. "ANALISIS PEMBANDINGAN TEKNIK ENSEMBLE SECARA BOOSTING(XGBOOST) DAN BAGGING (RANDOMFOREST) PADA KLASIFIKASI KATEGORI SAMBATAN SEKUENS DNA." Jurnal Penelitian Pos dan Informatika 9, no. 1 (October 1, 2019): 27. http://dx.doi.org/10.17933/jppi.2019.090103.

Abstract:
Bioinformatics research is currently supported by the rapid growth of computing technology and algorithms. Ensembles of decision trees are a common method for classifying large and complex datasets such as DNA sequences. Implementing two ensemble classification methods, XGBoost and Random Forest, may improve accuracy when classifying the splice junction type of DNA sequences. With an accuracy of 96.24% for XGBoost and 95.11% for Random Forest, we conclude that both methods, given the right parameter settings, are highly effective tools for classifying small example datasets. Analyzing both methods and their characteristics gives an overview of how they work to meet the needs of DNA splicing classification.
26

Catal, Cagatay, Serkan Tugul, and Basar Akpinar. "Automatic Software Categorization Using Ensemble Methods and Bytecode Analysis." International Journal of Software Engineering and Knowledge Engineering 27, no. 07 (September 2017): 1129–44. http://dx.doi.org/10.1142/s0218194017500425.

Abstract:
Software repositories consist of thousands of applications and the manual categorization of these applications into domain categories is very expensive and time-consuming. In this study, we investigate the use of an ensemble of classifiers approach to solve the automatic software categorization problem when the source code is not available. Therefore, we used three data sets (package level/class level/method level) that belong to 745 closed-source Java applications from the Sharejar repository. We applied the Vote algorithm, AdaBoost, and Bagging ensemble methods and the base classifiers were Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests. The best performance was achieved when the Vote algorithm was used. The base classifiers of the Vote algorithm were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest algorithms. We showed that the Vote approach with method attributes provides the best performance for automatic software categorization; these results demonstrate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.
27

Zamir, Ammara, Hikmat Ullah Khan, Tassawar Iqbal, Nazish Yousaf, Farah Aslam, Almas Anjum, and Maryam Hamdani. "Phishing web site detection using diverse machine learning algorithms." Electronic Library 38, no. 1 (January 2, 2020): 65–80. http://dx.doi.org/10.1108/el-05-2019-0118.

Abstract:
Purpose: This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud used to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing websites that closely resemble the original websites. As people surf the targeted website, the phishers hijack their personal information. Design/methodology/approach: Features of a phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms (random forest [RF], neural network [NN], bagging, support vector machine, Naïve Bayes and k-nearest neighbour [kNN]) is applied to the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are built by combining the highest-scoring classifiers to improve classification accuracy. Findings: The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important features from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in detecting phishing websites, with 97.4% accuracy. Originality/value: This research is novel in that no previous research has focussed on using a feed-forward NN and ensemble learners for detecting phishing websites.
28

Et.al, M. Veera Kumari. "Collaborative Classification Approach for Airline Tweets Using Sentiment Analysis." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 3 (April 10, 2021): 3597–603. http://dx.doi.org/10.17762/turcomat.v12i3.1639.

Abstract:
Many airlines around the world offer a range of services to their customers, and those services may or may not satisfy them. Because customers cannot always give their comments immediately, airlines use Twitter to collect feedback on their services. Twitter is increasingly used to improve the quality of services [4]. This paper develops different classification techniques to improve the accuracy of sentiment analysis. Tweets about the services are classified into three polarities: positive, negative and neutral. The classification methods are Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Tree (DTC), Extreme Gradient Boost (XGB), combinations of two, three and four classification techniques with a majority Voting Classifier, and AdaBoost; in the validation phase, accuracy is measured using 20-fold and 30-fold cross-validation. The paper also proposes a new ensemble Bagging approach for the different classifiers [10]. The sentiment analysis metrics precision, recall, F1-score, micro average, macro average and accuracy are reported for all of the above classification techniques. In addition, the average predictions of the classifiers and the accuracy of those average predictions are calculated. The results show that bagging classifiers achieve better accuracy than non-bagging classifiers.
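The 20-fold and 30-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration of how sample indices are divided into k folds, assuming contiguous, unshuffled folds; the function name `k_fold_indices` is hypothetical, and the paper does not state how its folds were built.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering every sample once.

    Assumption for illustration: folds are contiguous and not shuffled.
    The first n_samples % k folds receive one extra sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# 20-fold split of 100 hypothetical tweets: each fold holds out 5 tweets.
folds_20 = k_fold_indices(100, 20)
```

Each model is trained on the `train` indices and scored on the `test` indices of a fold, and the k scores are then averaged.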
29

Alfaro-Navarro, José-Luis, Emilio L. Cano, Esteban Alfaro-Cortés, Noelia García, Matías Gámez, and Beatriz Larraz. "A Fully Automated Adjustment of Ensemble Methods in Machine Learning for Modeling Complex Real Estate Systems." Complexity 2020 (April 14, 2020): 1–12. http://dx.doi.org/10.1155/2020/5287263.

Abstract:
The close relationship between collateral value and bank stability has led to a considerable need for rapid and economical appraisal of real estate. The greater availability of information related to housing stock has prompted the use of so-called big data and machine learning in the estimation of property prices. Although this methodology has already been applied to the real estate market to identify which variables influence dwelling prices, its use for estimating the price of properties is less frequent. The application of this methodology has become more sophisticated over time, moving from simple methods to so-called ensemble methods and, while the estimation capacity has improved, it has only been applied to specific geographical areas. The main contribution of this article lies in developing an application for the entire Spanish market that fully automatically provides the best model for each municipality. Real estate property prices in 433 municipalities are estimated from a sample of 790,631 dwellings, using different ensemble methods based on decision trees such as bagging, boosting, and random forest. The results for estimating the price of dwellings show good performance of the techniques developed, in terms of the error measures, with the best results being achieved using bagging and random forest.
30

Pal, Subodh Chandra, Alireza Arabameri, Thomas Blaschke, Indrajit Chowdhuri, Asish Saha, Rabin Chakrabortty, Saro Lee, and Shahab S. Band. "Ensemble of Machine-Learning Methods for Predicting Gully Erosion Susceptibility." Remote Sensing 12, no. 22 (November 10, 2020): 3675. http://dx.doi.org/10.3390/rs12223675.

Abstract:
Gully formation through water-induced soil erosion, and the devastating land degradation related to it, is often a quasi-normal threat to human life, as it is responsible for huge losses of surface soil. Gully erosion susceptibility (GES) mapping is therefore necessary to reduce the adverse effects of land degradation and diminish this type of harmful consequence. The principal goal of the present study is to develop GES maps for the Garhbeta I Community Development (C.D.) Block, West Bengal, India, using the machine learning algorithms (MLAs) of boosted regression tree (BRT), bagging, and the ensemble of BRT-bagging, with K-fold cross-validation (CV) resampling techniques. The combination of these MLAs with resampling approaches is state-of-the-art soft computing that is not often used in GES evaluation. We used a total of 20 gully erosion conditioning factors (GECFs) and a total of 199 gully head cut points for modelling GES. The importance of the variables responsible for gully erosion was determined with the random forest (RF) algorithm among the GECFs used in this study. Model performance was validated through receiver operating characteristics-area under curve (ROC-AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) statistical analysis. The results show that the ensemble of BRT-bagging is the best fitted for GES, with an AUC of 0.972 in the K-3 fold and sensitivity, specificity, PPV and NPV of 0.94, 0.93, 0.96 and 0.93, respectively, on the training dataset, followed by the bagging and BRT models. From this predictive performance it is concluded that the ensemble of BRT-bagging can be applied as a new approach for further studies in spatial prediction of GES.
The outcome of this work can be helpful to policy makers in implementing remedial measures to minimize damages caused by gully erosion.
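The validation statistics named in this abstract follow directly from a binary confusion matrix. The sketch below shows the standard definitions; the function name `confusion_metrics` and the counts in the example are hypothetical and are not results from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification statistics from confusion-matrix counts.

    tp/fn: gully points predicted as gully / non-gully.
    tn/fp: non-gully points predicted as non-gully / gully.
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true gully points found
        "specificity": tn / (tn + fp),  # share of non-gully points found
        "ppv": tp / (tp + fp),          # reliability of a "gully" prediction
        "npv": tn / (tn + fn),          # reliability of a "non-gully" prediction
    }


# Illustrative counts only.
metrics = confusion_metrics(tp=45, fp=10, tn=40, fn=5)
```

Sensitivity and specificity condition on the true class, while PPV and NPV condition on the predicted class, so the two pairs can differ noticeably when classes are imbalanced.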
31

Lestari, Tiara Suci, and Dwi Agustin Nuriani Sirodj. "Klasifikasi Penipuan Transaksi Kartu Kredit Menggunakan Metode Random Forest." Jurnal Riset Statistika 1, no. 2 (February 13, 2022): 160–67. http://dx.doi.org/10.29313/jrs.v1i2.525.

Abstract:
Credit cards are now an easy and practical way for customers to make transactions. However, the increasing use of credit cards also leads to financial fraud, namely fraudulent credit card transactions that can harm customers and the bank or company. One technique that can address this problem is data mining, specifically classification used to predict fraudulent credit card transactions. The method used here is random forest, an ensemble method that applies bootstrap aggregating (bagging) and random feature selection, combining several decision trees into a forest and obtaining the final classification prediction through a voting process. The data used are credit card transaction fraud data for 2019-2020. The purpose of this study is to apply the random forest method to the classification of credit card transaction fraud, evaluated with classification measures such as the confusion matrix, accuracy, sensitivity, precision, F-measure and AUC value. The results show that the random forest method classifies fraudulent credit card transactions very well.
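The three ingredients named in this abstract (bootstrap sampling, random feature selection, and voting) can be sketched in a few lines. This is a simplified illustration only: it assumes one-split decision stumps instead of full decision trees, and the names `fit_stump`, `fit_forest`, `predict` and the toy data are hypothetical, not the authors' implementation.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree: pick the (feature, threshold) with fewest errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # bootstrap: sample with replacement
        feats = rng.sample(range(len(rows[0])), n_features)   # random feature selection
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Final class is the majority vote over all trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Toy transactions: (amount in thousands, transactions per hour), illustrative only.
rows = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (0.8, 3.0), (0.9, 3.1), (1.0, 2.9)]
labels = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]
forest = fit_forest(rows, labels)
```

Because each tree sees a different resample and a different feature subset, individual trees disagree on borderline cases, and the vote averages out those disagreements.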
32

Abedini, Mohammadali, Farzaneh Ahmadzadeh, and Rassoul Noorossana. "Customer credit scoring using a hybrid data mining approach." Kybernetes 45, no. 10 (November 7, 2016): 1576–88. http://dx.doi.org/10.1108/k-09-2015-0228.

Abstract:
Purpose: A crucial decision in financial services is how to classify credit or loan applicants into good and bad applicants. The purpose of this paper is to propose a four-stage hybrid data mining approach to support the decision-making process. Design/methodology/approach: The approach is inspired by the bagging ensemble learning method and proposes a new voting method, namely two-level majority voting, in the last stage. First, some training subsets are generated. Then different base classifiers are tuned, and afterward some ensemble methods are applied to strengthen the tuned classifiers. Finally, two-level majority voting schemes help the approach achieve higher accuracy. Findings: A comparison of results shows that the proposed model outperforms powerful single classifiers such as multilayer perceptron (MLP), support vector machine and logistic regression (LR). In addition, it is more accurate than ensemble learning methods such as bagging-LR or rotation forest (RF)-MLP. The model outperforms single classifiers in terms of type I and II errors; it is close to some ensemble approaches such as bagging-LR and RF-MLP but fails to outperform them in terms of type I and II errors. Moreover, majority voting in the final stage provides more reliable results. Practical implications: The study concludes that the approach would be beneficial for banks, credit card companies and other credit provider organisations. Originality/value: A novel four-stage hybrid approach inspired by the bagging ensemble method is proposed. Moreover, the two-level majority voting in two different schemes in the last stage provides higher accuracy. An integrated evaluation criterion for classification errors provides enhanced insight for error comparisons.
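The abstract does not spell out how its two-level majority voting is organised. The sketch below shows one plausible reading, offered purely as an assumption: predictions are first voted on within each group of classifiers, and the group winners are then voted on again. The names `majority` and `two_level_vote` are hypothetical.

```python
from collections import Counter


def majority(votes):
    """Most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def two_level_vote(groups):
    """Level 1: majority vote inside each group. Level 2: majority over group winners."""
    return majority([majority(group) for group in groups])


# Three hypothetical groups of classifier outputs for one applicant.
groups = [
    ["bad", "bad", "bad"],
    ["good", "good", "bad"],
    ["good", "good", "bad"],
]
two_level = two_level_vote(groups)                        # group winners: bad, good, good
flat = majority([vote for group in groups for vote in group])  # single vote over all 9 outputs
```

In this toy case the two-level vote returns "good" while a flat vote over all nine outputs returns "bad", which shows that the grouping itself can change the final decision.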
33

Ardi, Luthfi, Noor Akhmad Setiawan, and Sunu Wibirama. "Eye Blink Classification for Assisting Disability to Communicate Using Bagging and Boosting." IJITEE (International Journal of Information Technology and Electrical Engineering) 5, no. 4 (December 24, 2021): 117. http://dx.doi.org/10.22146/ijitee.63515.

Abstract:
Disability is a physical or mental impairment. People with disabilities face more barriers in certain activities than those without, and several conditions make it difficult for them to communicate with other people. Researchers have helped people with disabilities by developing brain-computer interface (BCI) technology, which uses blink artifacts in electroencephalograph (EEG) signals as a communication tool. Research on eye blinks has focused only on the threshold and peak amplitude, while distinguishing the number of blinks using peak amplitude has not yet been a focus. This study used primary data recorded with a Muse headband from 15 subjects. The dataset was classified using bagging (random forest) and boosting (XGBoost) methods in Python; 80% of the data was allocated for learning and 20% for testing. Testing was repeated ten times and the results were averaged. For classifying the number of eye blinks, the accuracy was 77.55% with random forest and 90.39% with XGBoost. The result suggests that the experimental model is successful and can be used as a reference for building applications that help people communicate by differentiating the number of eye blinks. However, only three blinks were used in this study, so further research could increase this number.
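The evaluation protocol above (an 80/20 split, repeated ten times, with the accuracies averaged) can be sketched as follows. This is one interpretation of the abstract, which does not say whether the split was reshuffled on each repetition; a fresh random split per repetition is assumed here. The names `repeated_holdout` and `fit_majority` are hypothetical, and the majority-class model is only a stand-in for random forest or XGBoost.

```python
import random
from collections import Counter


def fit_majority(train_rows, train_labels):
    """Hypothetical stand-in model: always predicts the most common training label."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda row: label


def repeated_holdout(rows, labels, fit, n_repeats=10, test_fraction=0.2, seed=0):
    """Mean test accuracy over n_repeats random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(len(rows) * test_fraction))
    accuracies = []
    for _ in range(n_repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)                      # assumed: new random split each repetition
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(model(rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return sum(accuracies) / n_repeats


# Toy data: 50 recordings with a single feature (illustrative only).
toy_rows = [[i] for i in range(50)]
toy_labels = ["one_blink"] * 30 + ["two_blinks"] * 20
mean_accuracy = repeated_holdout(toy_rows, toy_labels, fit_majority)
```

Averaging over several splits reduces the dependence of the reported accuracy on one particular choice of test set.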
34

Syahrani, Iswaya Maalik. "Comparation Analysis of Ensemble Technique With Boosting(Xgboost) and Bagging (Randomforest) For Classify Splice Junction DNA Sequence Category." Jurnal Penelitian Pos dan Informatika 9, no. 1 (October 1, 2019): 27–36. http://dx.doi.org/10.17933/jppi.v9i1.249.

Abstract:
Bioinformatics research is currently supported by the rapid growth of computing technology and algorithms. Ensembles of decision trees are a common method for classifying large and complex datasets such as DNA sequences. Implementing two ensemble classification methods, XGBoost and Random Forest, may improve accuracy when classifying the splice junction type of DNA sequences. With an accuracy of 96.24% for XGBoost and 95.11% for Random Forest, we conclude that both methods, given the right parameter settings, are highly effective tools for classifying small example datasets. Analyzing both methods and their characteristics gives an overview of how they work to meet the needs of DNA splicing classification.
35

Wang, Mengmeng, Quanbo Ge, Haoyu Jiang, and Gang Yao. "Wear Fault Diagnosis of Aeroengines Based on Broad Learning System and Ensemble Learning." Energies 12, no. 24 (December 12, 2019): 4750. http://dx.doi.org/10.3390/en12244750.

Abstract:
An aircraft engine (aeroengine) operates in an extremely harsh environment, causing the working state of the engine to change constantly. As a result, the engine is prone to various kinds of wear faults. This paper proposes a new intelligent method for diagnosing aeroengine wear faults based on oil analysis. The broad learning system (BLS) and ensemble learning are integrated into a bagging-BLS model, in which 100 sub-BLS models are established and further optimized by ensemble learning. Experiments on oil data are conducted to verify the proposed method, with the random forest and single BLS algorithms used for comparison. The results show that the output accuracy of the proposed method is stable (at 0.988), indicating that the bagging-BLS model can improve the accuracy and reliability of engine wear fault diagnosis and reflecting the trend toward intelligent technology in fault diagnosis.
36

Yaman, Emine, and Abdulhamit Subasi. "Comparison of Bagging and Boosting Ensemble Machine Learning Methods for Automated EMG Signal Classification." BioMed Research International 2019 (October 31, 2019): 1–13. http://dx.doi.org/10.1155/2019/9152506.

Abstract:
Neuromuscular disorders are diagnosed using electromyographic (EMG) signals, and machine learning algorithms are employed as decision support systems for this diagnosis. This paper compares bagging and boosting ensemble learning methods for classifying EMG signals automatically. Although the efficacy of ensemble classifiers on real-life problems has been presented in numerous studies, almost no studies focus on the feasibility of bagging and boosting ensemble classifiers for diagnosing neuromuscular disorders. The purpose of this paper is therefore to assess that feasibility using EMG signals. The method has three steps. The first step is to calculate the wavelet packet coefficients (WPC) for every type of EMG signal. Next, statistical values of the WPC are calculated to represent the distribution of the wavelet coefficients. In the last step, an ensemble classifier takes the extracted features as input to diagnose the neuromuscular disorders. Experimental results showed that the ensemble classifiers achieved better performance for diagnosis of neuromuscular disorders. The results are promising: the AdaBoost with random forest ensemble method achieved an accuracy of 99.08%, F-measure 0.99, AUC 1, and kappa statistic 0.99.
37

Fauzi, Ahmad, Riki Supriyadi, and Nurlaelatul Maulidah. "Deteksi Penyakit Kanker Payudara dengan Seleksi Fitur berbasis Principal Component Analysis dan Random Forest." Jurnal Infortech 2, no. 1 (June 9, 2020): 96–101. http://dx.doi.org/10.31294/infortech.v2i1.8079.

Abstract:
Screening is an early detection effort to identify diseases or abnormalities that are not yet clinically apparent, using particular tests, examinations or procedures. It can be used to quickly distinguish people who appear healthy but in fact have an abnormality. The main objective of this study is to improve classification performance in breast cancer diagnosis by applying feature selection to several classification algorithms. The study uses the Breast Cancer Coimbra Data Set. A feature selection method based on principal component analysis is paired with several classification algorithms and methods, such as LogitBoost, Bagging, and Random Forest. The study uses 10-fold cross-validation as the evaluation method. The results show that the principal component analysis-based feature selection method improved classification performance significantly after being paired with Random Forest and LogitBoost; Random Forest showed the best performance, with an accuracy of 79.3103% and an AUC value of 0.843. Keywords: Feature Selection, PCA, Breast Cancer, Screening, Random Forest
38

Kilimci, Zeynep, and Sevinç Omurca. "Enhancement of the Heuristic Optimization Based Extended Space Forests with Classifier Ensembles." International Arab Journal of Information Technology 17, no. 2 (February 28, 2019): 188–95. http://dx.doi.org/10.34028/iajit/17/2/6.

Abstract:
Extended space forests are well known for improving performance on classification problems. They provide a richer feature space and perform better than forests based on the original feature space. Most contemporary studies employ the original features, as well as various combinations of them, as input vectors for the extended space forest approach. In this study, we seek to boost the performance of classifier ensembles by integrating them with heuristic optimization-based features. The contributions of this paper are fivefold. First, a richer feature space is developed by using random combinations of input vectors together with features selected by the ant colony optimization method, which have high importance and have not been associated before. Second, a widely used classification algorithm is employed as the baseline classifier. Third, three ensemble strategies, namely bagging, random subspace, and random forests, are used to ensure diversity. Fourth, a wide range of comparative experiments are conducted on widely used biomedicine datasets gathered from the University of California Irvine (UCI) machine learning repository. Finally, the extended space forest approach with the proposed technique yields remarkable experimental results compared to the original version and various extended versions from recent state-of-the-art studies.
39

Saifudin, A., U. U. Nabillah, Yulianti, and T. Desyani. "Bagging Technique to Reduce Misclassification in Coronary Heart Disease Prediction Based on Random Forest." Journal of Physics: Conference Series 1477 (March 2020): 032009. http://dx.doi.org/10.1088/1742-6596/1477/3/032009.

40

Xing, Xue, Dexin Yu, and Wei Zhang. "Data Calibration Based on Multisensor Using Classification Analysis: A Random Forests Approach." Mathematical Problems in Engineering 2015 (2015): 1–8. http://dx.doi.org/10.1155/2015/708467.

Abstract:
This paper analyzes the problem of meaningless outliers in traffic detective data sets and studies the characteristics of data from monophyletic detectors and multisensor detectors, based on real-time highway data. Building on an analysis of the current random forests algorithm, a learning algorithm with high accuracy and fast speed, new optimum random forests are proposed for filtering outliers from the sample; they employ a bagging strategy combined with a boosting strategy. Random forests with different numbers of trees are applied to classify the status of meaningless outliers in traffic detective data sets, based on the traffic parameters of traffic flow, spot mean speed, and roadway occupancy rate. The results show that the optimum random forest model is more accurate at filtering meaningless outliers from traffic detective data collected at road intersections. By processing the filtered data, a transportation information system can reduce the influence of erroneous data and improve highway traffic information services.
41

Teulon, D. A. J., T. J. B. Herman, and M. M. Davidson. "Monitoring Monterey pine aphids in Hawkes Bay forests." New Zealand Plant Protection 56 (August 1, 2003): 39–44. http://dx.doi.org/10.30843/nzpp.2003.56.6029.

Abstract:
The Monterey pine aphid (Essigella californica) was recently found in New Zealand. To examine the seasonal biology and impact of this insect on Pinus radiata, aphids were sampled using beating and branch bagging methods over two seasons (October to April) from three forest elevations and from three tree age classes in Hawkes Bay forests. Many more aphids were found in 2001-02 than in 2000-01, with numbers peaking in January in 2001 and April in 2002. Few aphids were found from October to December. More aphids were recorded on trees in the medium and old age classes than in the young age class. There was no consistent pattern in aphid numbers in relation to elevation of forests. There was little visual evidence of aphid damage to trees, but this does not mean that this aphid does not cause economic damage. Factors influencing the population dynamics of this aphid and sampling methods are discussed.
42

Goudman, Lisa, Jean-Pierre Van Buyten, Ann De Smedt, Iris Smet, Marieke Devos, Ali Jerjir, and Maarten Moens. "Predicting the Response of High Frequency Spinal Cord Stimulation in Patients with Failed Back Surgery Syndrome: A Retrospective Study with Machine Learning Techniques." Journal of Clinical Medicine 9, no. 12 (December 21, 2020): 4131. http://dx.doi.org/10.3390/jcm9124131.

Abstract:
Despite the proven clinical value of spinal cord stimulation (SCS) for patients with failed back surgery syndrome (FBSS), factors related to a successful SCS outcome are not yet clearly understood. This study aimed to predict responders for high frequency SCS at 10 kHz (HF-10). Data before implantation and the last available data was extracted for 119 FBSS patients treated with HF-10 SCS. Correlations, logistic regression, linear discriminant analysis, classification and regression trees, random forest, bagging, and boosting were applied. Based on feature selection, trial pain relief, predominant pain location, and the number of previous surgeries were relevant factors for predicting pain relief. To predict responders with 50% pain relief, 58.33% accuracy was obtained with boosting, random forest and bagging. For predicting responders with 30% pain relief, 70.83% accuracy was obtained using logistic regression, linear discriminant analysis, boosting, and classification trees. For predicting pain medication decrease, accuracies above 80% were obtained using logistic regression and linear discriminant analysis. Several machine learning techniques were able to predict responders to HF-10 SCS with an acceptable accuracy. However, none of the techniques revealed a high accuracy. The inconsistent results regarding predictive factors in literature, combined with acceptable accuracy of the currently obtained models, might suggest that routinely collected baseline parameters from clinical practice are not sufficient to consistently predict the SCS response with a high accuracy in the long-term.
43

Hachaj, Tomasz. "Improving Human Motion Classification by Applying Bagging and Symmetry to PCA-Based Features." Symmetry 11, no. 10 (October 10, 2019): 1264. http://dx.doi.org/10.3390/sym11101264.

Abstract:
This paper proposes a method for improving human motion classification by applying bagging and symmetry to Principal Component Analysis (PCA)-based features. In contrast to well-known bagging algorithms such as random forest, the proposed method recalculates the motion features for each “weak classifier” (it does not randomly sample a feature set). The proposed classification method was evaluated on a challenging (even to a human observer) motion capture recording dataset of martial arts techniques performed by professional karate sportspeople. The dataset consisted of 360 recordings in 12 motion classes. Because some classes of these motions might be symmetrical (which means that they are performed with a dominant left or right hand/leg), an analysis was conducted to determine whether accounting for symmetry could improve the recognition rate of a classifier. The experimental results show that applying the proposed classifiers’ bagging procedure increased the recognition rate (RR) of the Nearest-Neighbor (NNg) and Support Vector Machine (SVM) classifiers by more than 5% and 3%, respectively. The RR of one trained classifier (SVM) was higher when we did not use symmetry. On the other hand, the application of symmetry information for bagged NNg improved its recognition rate compared with the results without symmetry information. We can conclude that symmetry information might be helpful in situations in which it is not possible to optimize the decision borders of the classifier (for example, when we do not have direct information about class labels). The experiment presented in this paper shows that, in this case, bagging and mirroring might help find a similar object in the training set that shares the same class label. Both the dataset that was used for the evaluation and the implementation of the proposed method can be downloaded, so the experiment is easily reproducible.
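For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of plain bagging over a nearest-neighbor classifier. It is not the paper's method (which recalculates PCA-based motion features for each weak classifier); the function names, the toy data, and the choice of 25 estimators are assumptions made only for illustration.

```python
import random
from collections import Counter


def nearest_neighbor_predict(train, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    closest = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    return closest[1]


def bagged_predict(train, query, n_estimators=25, seed=0):
    """Majority vote over 1-NN classifiers, each using a bootstrap resample."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(train) examples with replacement.
        sample = [rng.choice(train) for _ in train]
        votes.append(nearest_neighbor_predict(sample, query))
    # The most frequent label among the weak classifiers wins.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: two well-separated 2-D clusters.
toy_train = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
    ((5.0, 5.1), "b"), ((5.2, 4.9), "b"), ((4.8, 5.0), "b"),
]
```

Calling `bagged_predict(toy_train, (0.1, 0.1))` votes among the resampled classifiers; on data this cleanly separated the vote simply recovers the nearby cluster's label.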
44

CHAPARRO, Jorge E., Jenny CUATINDOY, and Nelson BARRERA. "Análisis comparativo de técnicas de clasificación para determinar la deserción estudiantil de la facultad de ingeniería de la Universidad de Antioquia, Colombia." ESPACIOS 42, no. 07 (April 15, 2021): 63–81. http://dx.doi.org/10.48082/espacios-a21v42n07p05.

Abstract:
The aim of this study is to compare different supervised-learning classification algorithms, such as artificial neural networks, probabilistic methods such as multinomial logistic regression, ensemble methods such as random forest, bagging, and boosting, and support vector machines, in order to identify profiles of students likely to drop out of the Faculty of Engineering at the Universidad de Antioquia, based on two targets: the number of credits enrolled in the last semester and the semester in which the student leaves the university.
45

Nafees, Afnan, Sherbaz Khan, Muhammad Faisal Javed, Raid Alrowais, Abdeliazim Mustafa Mohamed, Abdullah Mohamed, and Nikolai Ivanovic Vatin. "Forecasting the Mechanical Properties of Plastic Concrete Employing Experimental Data Using Machine Learning Algorithms: DT, MLPNN, SVM, and RF." Polymers 14, no. 8 (April 13, 2022): 1583. http://dx.doi.org/10.3390/polym14081583.

Abstract:
Increased population necessitates an expansion of infrastructure and urbanization, resulting in growth in the construction industry. A rise in population also results in increased plastic waste globally. Recycling plastic waste is a global concern. Utilization of plastic waste in concrete can be an optimal solution from a recycling perspective in the construction industry. As environmental issues continue to grow, the development of predictive machine learning models is critical. Thus, this study aims to create modelling tools for estimating the compressive and tensile strengths of plastic concrete. For predicting the strength of concrete produced with plastic waste, this research integrates machine learning algorithms (individual and ensemble techniques), including bagging and adaptive boosting built on weak learners. For predicting the mechanical properties, 80 cylinders for compressive strength and 80 cylinders for split tensile strength were cast and tested with varying percentages of irradiated plastic waste, as a replacement for either cement or fine aggregate. In addition, a thorough and reliable database, including 320 compressive strength tests and 320 split tensile strength tests, was generated from existing literature. Individual, bagging, and adaptive boosting models of decision tree, multilayer perceptron neural network, and support vector machines were developed and compared with a modified learner model of random forest. The results implied that individual model response was improved by utilizing bagging and boosting learners. A random forest with a modified learner algorithm provided robust performance, with a correlation coefficient of 0.932 for compressive strength and 0.86 for split tensile strength and the least errors. Sensitivity analyses showed that tensile strength models were least sensitive to water and coarse aggregates, while cement, silica fume, coarse aggregate, and age have a substantial effect on compressive strength models. To minimize overfitting errors and corroborate the generalized modelling result, a K-fold cross-validation technique was used. Machine learning algorithms are used to predict the mechanical properties of plastic concrete to promote sustainability in the construction industry.
46

Petrides, George, and Wouter Verbeke. "Cost-sensitive ensemble learning: a unifying framework." Data Mining and Knowledge Discovery 36, no. 1 (September 28, 2021): 1–28. http://dx.doi.org/10.1007/s10618-021-00790-4.

Abstract:
Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview on cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.
47

Gbenga, Fadare Oluwaseun, Adetunmbi Adebayo Olusola, and Oyinloye Oghenerukevwe Elohor. "Towards Optimization of Malware Detection using Extra-Tree and Random Forest Feature Selections on Ensemble Classifiers." International Journal of Recent Technology and Engineering 9, no. 6 (March 30, 2021): 223–32. http://dx.doi.org/10.35940/ijrte.f5545.039621.

Abstract:
The proliferation of malware on computer communication systems poses great security challenges to stored confidential data and other valuable assets across the globe. There have been several attempts to curb the menace using a signature-based approach, and in recent times machine learning techniques have been extensively explored. This paper proposes a framework combining feature selection based on extra-tree and random forest with eight ensemble techniques on five base learners: KNN, Naive Bayes, SVM, Decision Trees, and Logistic Regression. K-Nearest Neighbors returns the highest accuracy of 96.48%, 96.40%, and 87.89% on extra-tree, random forest, and without feature selection (WFS), respectively. Random forest ensemble accuracy is the highest on both feature selections, with 98.50% and 98.16% on random forest and extra-tree, respectively. The Extreme Gradient Boosting Classifier is next on random-forest FS with an accuracy of 98.37%, while Voting returns the lowest detection accuracy of 95.80%. On extra-tree FS, Bagging is next with a detection accuracy of 98.09%, while Voting returns the lowest accuracy of 95.54%. Random Forest scores highest on all seven evaluation measures under both extra-tree and random forest feature selection. The study results show that the tree-based ensemble model is proficient and successful for malware classification.
48

Liu, Jiaming, Liuan Wang, Linan Zhang, Zeming Zhang, and Sicheng Zhang. "Predictive analytics for blood glucose concentration: an empirical study using the tree-based ensemble approach." Library Hi Tech 38, no. 4 (July 1, 2020): 835–58. http://dx.doi.org/10.1108/lht-08-2019-0171.

Abstract:
Purpose: The primary objective of this study was to recognize critical indicators in predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models, i.e. bagging with tree regressors (bagging-decision tree [Bagging-DT]), AdaBoost with tree regressors (Adaboost-DT), random forest (RF) and gradient boosting decision tree (GBDT). Design/methodology/approach: This study proposed a majority voting feature selection method by combining lasso regression with the Akaike information criterion (AIC) (LR-AIC), lasso regression with the Bayesian information criterion (BIC) (LR-BIC) and RF to select indicators with excellent predictive performance from initial 38 indicators in 5,642 samples. The selected features were deployed to build the tree-based ensemble models. The 10-fold cross-validation (CV) method was used to evaluate the performance of each ensemble model. Findings: The results of feature selection indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are five most important clinical/physical indicators in BG prediction. Furthermore, this study also found that the GBDT ensemble model combined with the proposed majority voting feature selection method is better than other three models with respect to prediction performance and stability. Practical implications: This study proposed a novel BG prediction framework for better predictive analytics in health care. Social implications: This study incorporated medical background and machine learning technology to reduce diabetes morbidity and formulate precise medical schemes. Originality/value: The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.
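The majority-voting idea in this abstract can be illustrated with a minimal sketch. The abstract does not state the voting threshold, so the strict-majority rule below (a feature is kept when at least two of three selectors choose it) is an assumption, as are the function name and the example selector outputs, which are hypothetical and not the paper's results.

```python
from collections import Counter


def majority_vote_features(selections, min_votes=None):
    """Keep features chosen by at least `min_votes` selectors.

    By default a strict majority of the selectors is required
    (an assumption; the abstract does not give the threshold).
    """
    if min_votes is None:
        min_votes = len(selections) // 2 + 1
    # Count each feature at most once per selector.
    counts = Counter(f for sel in selections for f in set(sel))
    return sorted(f for f, c in counts.items() if c >= min_votes)


# Hypothetical outputs of the three selectors named in the abstract.
lr_aic = ["age", "CHC", "RBCVDW", "leucocyte_count"]
lr_bic = ["age", "CHC"]
rf_top = ["age", "RBCVDW", "red_blood_cell_volume"]

selected = majority_vote_features([lr_aic, lr_bic, rf_top])
```

With these made-up inputs, `selected` contains the features picked by two or more selectors; the surviving features would then be passed to whichever ensemble regressor is being evaluated.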
49

Liu, Jinhua, Jianli Ding, Xiangyu Ge, and Jingzhe Wang. "Evaluation of Total Nitrogen in Water via Airborne Hyperspectral Data: Potential of Fractional Order Discretization Algorithm and Discrete Wavelet Transform Analysis." Remote Sensing 13, no. 22 (November 18, 2021): 4643. http://dx.doi.org/10.3390/rs13224643.

Abstract:
Controlling and managing surface source pollution depends on the rapid monitoring of total nitrogen (TN) in water. However, the complex factors affecting water quality (plant shading and suspended matter in water) make direct estimation extremely challenging. Considering the spectral response mechanisms of emergent plants, we coupled discrete wavelet transform (DWT) and fractional order discretization (FOD) techniques with three machine learning models (random forest (RF), bagging algorithm (bagging), and eXtreme Gradient Boosting (XGBoost)) to mine this potential spectral information. A total of 567 models were developed, and airborne hyperspectral data processed with various DWT scales and FOD techniques were compared. The effective information in the hyperspectral reflectance data was better emphasized after DWT processing. After DWT processing the original spectrum (OR), its sensitivity to TN in water was maximally improved by 0.22, and the correlation between FOD and TN in water was optimally increased by 0.57. The transformed spectral information enhanced the TN model accuracy, especially for FOD after DWT. For RF, 82% of the model R² values improved by 0.02~0.72 compared to the model using FOD spectra; 78.8% of the bagging values improved by 0.01~0.53 and 65.0% of the XGBoost values improved by 0.01~0.64. The XGBoost model with DWT coupled with grey relation analysis (GRA) yielded the best estimation accuracy, with the highest precision of R² = 0.91 for L6. In conclusion, appropriately scaled DWT analysis can substantially improve the accuracy of extracting TN from UAV hyperspectral images. These outcomes may facilitate the further development of accurate water quality monitoring in sophisticated global waters from drone or satellite hyperspectral data.
50

Guo, Huaping, Xiaoyu Diao, and Hongbing Liu. "Embedding Undersampling Rotation Forest for Imbalanced Problem." Computational Intelligence and Neuroscience 2018 (November 1, 2018): 1–15. http://dx.doi.org/10.1155/2018/6798042.

Abstract:
Rotation Forest is an ensemble learning approach that achieves better performance than Bagging and Boosting by building accurate and diverse classifiers using a rotated feature space. However, like other conventional classifiers, Rotation Forest does not work well on imbalanced data, which are characterized by having far fewer examples of one class (minority class) than the other (majority class), and in which misclassifying minority class examples is often much more costly than the reverse. This paper proposes a novel method called Embedding Undersampling Rotation Forest (EURF) to handle this problem by (1) sampling subsets from the majority class and learning a projection matrix from each subset and (2) obtaining training sets by projecting re-undersampled subsets of the original data set to new spaces defined by the matrices and constructing an individual classifier from each training set. In the first step, undersampling forces the rotation matrix to better capture the features of the minority class without harming the diversity between individual classifiers. In the second step, the undersampling technique aims to improve the performance of individual classifiers on the minority class. The experimental results show that EURF achieves significantly better performance than other state-of-the-art methods.