Journal articles on the topic 'Decision Tree with CART algorithm'

To see the other types of publications on this topic, follow the link: Decision Tree with CART algorithm.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Decision Tree with CART algorithm.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Pratiwi, Reni, Memi Nor Hayati, and Surya Prangga. "PERBANDINGAN KLASIFIKASI ALGORITMA C5.0 DENGAN CLASSIFICATION AND REGRESSION TREE (STUDI KASUS : DATA SOSIAL KEPALA KELUARGA MASYARAKAT DESA TELUK BARU KECAMATAN MUARA ANCALONG TAHUN 2019)." BAREKENG: Jurnal Ilmu Matematika dan Terapan 14, no. 2 (September 7, 2020): 273–84. http://dx.doi.org/10.30598/barekengvol14iss2pp273-284.

Full text
Abstract:
A decision tree is an algorithm used as a reasoning procedure to obtain answers to the problems that are entered. Many methods can be used in decision trees, including the C5.0 algorithm and the Classification and Regression Tree (CART). The C5.0 algorithm builds a non-binary decision tree, in which a node can have more than two branches, while the CART algorithm builds a binary decision tree, in which each node has exactly two branches. This research aims to determine the classification results of the C5.0 and CART algorithms and to compare the classification accuracy of the two methods. The variables used in this research are the average monthly income (Y), employment (X1), number of family members (X2), last education (X3) and gender (X4). The analysis shows that the accuracy rate of the C5.0 algorithm is 79.17%, while the accuracy rate of CART is 84.63%. It can therefore be said that CART classifies the average income of the people of Teluk Baru Village in Muara Ancalong District in 2019 better than the C5.0 algorithm. Keywords: C5.0 Algorithm, CART, Classification, Decision Tree.
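As a rough illustration of how such an accuracy comparison can be set up, the sketch below fits a CART-style binary tree (scikit-learn's DecisionTreeClassifier) and reports its test accuracy. The feature matrix is synthetic, and the four columns only stand in for the employment, family-size, education and gender variables named in the abstract; C5.0 has no scikit-learn implementation, so only the CART side is shown.

```python
# Hypothetical sketch: CART-style classification of income classes on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 4 predictors standing in for employment, family size, education, gender
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0, stratify=y)

cart = DecisionTreeClassifier(criterion="gini", random_state=0)  # binary splits, as in CART
cart.fit(X_train, y_train)
print("CART accuracy:", accuracy_score(y_test, cart.predict(X_test)))
```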
APA, Harvard, Vancouver, ISO, and other styles
2

Okada, Hugo Kenji Rodrigues, Andre Ricardo Nascimento das Neves, and Ricardo Shitsuka. "Analysis of Decision Tree Induction Algorithms." Research, Society and Development 8, no. 11 (August 24, 2019): e298111473. http://dx.doi.org/10.33448/rsd-v8i11.1473.

Full text
Abstract:
Decision trees are data structures or computational methods that enable nonparametric supervised machine learning and are used in classification and regression tasks. The aim of this paper is to present a comparison between the decision tree induction algorithms C4.5 and CART. A quantitative study is performed in which the two methods are compared by analyzing the following aspects: operation and complexity. The experiments showed practically equal hit percentages; in execution time for tree induction, however, the CART algorithm was approximately 46.24% slower than C4.5, yet it was considered to be more effective.
APA, Harvard, Vancouver, ISO, and other styles
3

Kumar, Sunil, Saroj Ratnoo, and Jyoti Vashishtha. "HYPER HEURISTIC EVOLUTIONARY APPROACH FOR CONSTRUCTING DECISION TREE CLASSIFIERS." Journal of Information and Communication Technology 20, no. 2 (February 21, 2021): 249–76. http://dx.doi.org/10.32890/jict2021.20.2.5.

Full text
Abstract:
Decision tree models have earned a special status in predictive modeling since they are considered comprehensible for human analysis and insight. The Classification and Regression Tree (CART) algorithm is one of the renowned decision tree induction algorithms for addressing classification as well as regression problems. Finding optimal values for the hyperparameters of a decision tree construction algorithm is a challenging issue. While building an effective decision tree classifier with high accuracy and comprehensibility, we need to address the question of setting optimal values for its hyperparameters, such as the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion and the amount of pruning. The hyperparameter setting influences the performance of the decision tree model. As researchers, we know that no single setting of hyperparameters works equally well for different datasets. A particular setting that gives an optimal decision tree for one dataset may produce a sub-optimal decision tree model for another dataset. In this paper, we present a hyper-heuristic approach for tuning the hyperparameters of Recursive and Partition Trees (rpart), which is a typical implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper-heuristic for tuning the hyperparameters of the decision tree classifier. The approach is named Hyper-heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. It is statistically shown that HEARpart performs significantly better than WEKA's J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper-heuristic algorithm constructs significantly more comprehensible models than WEKA's J48, CART and other similar decision tree construction strategies. The results show that the accuracy achieved by the hyper-heuristic approach is slightly lower than that of the other comparative approaches.
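The following sketch illustrates the general idea of evolving decision-tree hyperparameters, not HEARpart itself: R's rpart is replaced by scikit-learn's CART-style tree, the search space (max_depth, min_samples_split, ccp_alpha) and the tiny evolutionary loop are invented for the example, and fitness is plain cross-validated accuracy.

```python
# Minimal evolutionary search over decision tree hyperparameters (illustrative only).
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=1)

def random_params():
    return {"max_depth": random.randint(2, 12),
            "min_samples_split": random.randint(2, 20),
            "ccp_alpha": random.uniform(0.0, 0.05)}   # pruning strength

def fitness(params):
    tree = DecisionTreeClassifier(random_state=1, **params)
    return cross_val_score(tree, X, y, cv=5).mean()

random.seed(1)
population = [random_params() for _ in range(10)]
for generation in range(15):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]                                # selection
    children = []
    for _ in range(6):                                  # crossover + mutation
        a, b = random.sample(parents, 2)
        child = {k: random.choice([a[k], b[k]]) for k in a}
        if random.random() < 0.3:
            child = random_params()                     # crude mutation: resample
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("best hyperparameters:", best, "CV accuracy:", round(fitness(best), 3))
```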
APA, Harvard, Vancouver, ISO, and other styles
4

Khoshgoftaar, Taghi M., and Naeem Seliya. "Software Quality Classification Modeling Using the SPRINT Decision Tree Algorithm." International Journal on Artificial Intelligence Tools 12, no. 03 (September 2003): 207–25. http://dx.doi.org/10.1142/s0218213003001204.

Full text
Abstract:
Predicting the quality of system modules prior to software testing and operations can benefit the software development team. Such a timely reliability estimation can be used to direct cost-effective quality improvement efforts to the high-risk modules. Tree-based software quality classification models based on software metrics are used to predict whether a software module is fault-prone or not fault-prone. They are white box quality estimation models with good accuracy, and are simple and easy to interpret. An in-depth study of calibrating classification trees for software quality estimation using the SPRINT decision tree algorithm is presented. Many classification algorithms have memory limitations including the requirement that datasets be memory resident. SPRINT removes all of these limitations and provides a fast and scalable analysis. It is an extension of a commonly used decision tree algorithm, CART, and provides a unique tree pruning technique based on the Minimum Description Length (MDL) principle. Combining the MDL pruning technique and the modified classification algorithm, SPRINT yields classification trees with useful accuracy. The case study used consists of software metrics collected from a very large telecommunications system. It is observed that classification trees built by SPRINT are more balanced and demonstrate better stability than those built by CART.
APA, Harvard, Vancouver, ISO, and other styles
5

Duan, Huajie, Zhengdong Deng, Feifan Deng, and Daqing Wang. "Assessment of Groundwater Potential Based on Multicriteria Decision Making Model and Decision Tree Algorithms." Mathematical Problems in Engineering 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/2064575.

Full text
Abstract:
Groundwater plays an important role in global climate change and in satisfying human needs. In this study, RS (remote sensing) and GIS (geographic information system) were utilized to generate five thematic layers (lithology, lineament density, topology, slope, and river density) considered as factors influencing the groundwater potential. Then, the multicriteria decision model (MCDM) was integrated with C5.0 and CART, respectively, to generate the decision tree, with 80 surveyed tube wells divided into four classes on the basis of their yield. To test the precision of the decision tree algorithms, 10-fold cross validation and the kappa coefficient were adopted; the average kappa coefficient for C5.0 and CART was 90.45% and 85.09%, respectively. After applying the decision tree to the whole study area, four classes of groundwater potential zones were demarcated. According to the classification result, the four grades of groundwater potential zones, "very good," "good," "moderate," and "poor," occupy 4.61%, 8.58%, 26.59%, and 60.23% of the area, respectively, with the C5.0 algorithm, and 4.68%, 10.09%, 26.10%, and 59.13%, respectively, with the CART algorithm. Therefore, we can draw the conclusion that the C5.0 algorithm is more appropriate than CART for groundwater potential zone prediction.
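A minimal sketch of the evaluation protocol described above: 10-fold cross validation of a CART-style tree scored with Cohen's kappa. The 80 wells, four yield classes and five thematic-layer features are replaced by synthetic placeholders, not the study's GIS data.

```python
# Sketch: 10-fold cross validation with kappa scoring for a 4-class tree classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=80, n_features=5, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

kappa = make_scorer(cohen_kappa_score)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y,
                         cv=cv, scoring=kappa)
print("mean kappa over 10 folds:", round(scores.mean(), 4))
```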
APA, Harvard, Vancouver, ISO, and other styles
6

Yu, Shuang, Xiongfei Li, Hancheng Wang, Xiaoli Zhang, and Shiping Chen. "C_CART: An instance confidence-based decision tree algorithm for classification." Intelligent Data Analysis 25, no. 4 (July 9, 2021): 929–48. http://dx.doi.org/10.3233/ida-205361.

Full text
Abstract:
In classification, a decision tree is a common model due to its simple structure and ease of interpretation. Most decision tree algorithms assume all instances in a dataset have the same degree of confidence, so they use the same generation and pruning strategies for all training instances. In fact, instances with a greater degree of confidence are more useful than those with a lower degree of confidence in the same dataset. Therefore, instances should be treated discriminately according to their corresponding confidence degrees when training classifiers. In this paper, we investigate the impact and significance of the degree of confidence of instances on the classification performance of decision tree algorithms, taking the classification and regression tree (CART) algorithm as an example. First, the degree of confidence of instances is quantified from a statistical perspective. Then, an improved CART algorithm named C_CART is proposed by introducing the confidence of instances into the generation and pruning processes of the CART algorithm. Finally, we conduct experiments to evaluate the performance of the C_CART algorithm. The experimental results show that our C_CART algorithm can significantly improve generalization performance while avoiding the over-fitting problem to a certain extent.
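This is not the C_CART algorithm itself, but a small sketch of the underlying idea of weighting training instances by a confidence score, using DecisionTreeClassifier's sample_weight argument; the confidence values here are random placeholders rather than the statistical estimate the paper uses.

```python
# Sketch: confidence-weighted tree training via sample_weight (not C_CART itself).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Hypothetical per-instance confidence in [0.1, 1.0]; in C_CART this would come
# from a statistical estimate, not random numbers.
rng = np.random.default_rng(0)
confidence = rng.uniform(0.1, 1.0, size=len(y_tr))

plain = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
weighted = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr, sample_weight=confidence)
print("unweighted test accuracy:        ", plain.score(X_te, y_te))
print("confidence-weighted test accuracy:", weighted.score(X_te, y_te))
```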
APA, Harvard, Vancouver, ISO, and other styles
7

Barros, Rodrigo C., Márcio P. Basgalupp, André C. P. L. F. de Carvalho, and Alex A. Freitas. "Automatic Design of Decision-Tree Algorithms with Evolutionary Algorithms." Evolutionary Computation 21, no. 4 (November 2013): 659–84. http://dx.doi.org/10.1162/evco_a_00101.

Full text
Abstract:
This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.
APA, Harvard, Vancouver, ISO, and other styles
8

Jun, Sungbum. "Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing." Sensors 21, no. 8 (April 18, 2021): 2849. http://dx.doi.org/10.3390/s21082849.

Full text
Abstract:
Due to the recent advance of the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need for leveraging such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention to the need to implement reliable manufacturing systems and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree's performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive compared to the ensemble methods, which have multiple decision trees. Even though the ensemble methods could produce slightly better performances, the proposed DIMPLED has a more interpretable structure while maintaining an appropriate performance level.
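A sketch of the general role discretization plays here, loosely analogous to DIMPLED but not the proposed method: continuous attributes are binned before the tree is grown. The fixed k-means binning, bin count and synthetic data are all assumptions made for the example; DIMPLED instead searches split points with an evolutionary algorithm.

```python
# Sketch: discretizing continuous attributes before growing a CART-style tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)

raw_tree = DecisionTreeClassifier(max_depth=5, random_state=0)
binned_tree = make_pipeline(
    KBinsDiscretizer(n_bins=6, encode="ordinal", strategy="kmeans"),
    DecisionTreeClassifier(max_depth=5, random_state=0),
)
print("raw tree CV accuracy:   ", cross_val_score(raw_tree, X, y, cv=5).mean())
print("binned tree CV accuracy:", cross_val_score(binned_tree, X, y, cv=5).mean())
```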
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Biao, and Zhipeng Sun. "Global Economic Market Forecast and Decision System for IoT and Machine Learning." Mobile Information Systems 2022 (April 20, 2022): 1–12. http://dx.doi.org/10.1155/2022/8344791.

Full text
Abstract:
The fast growth of IoT in wearable devices, smart sensors, and home appliances will affect every aspect of our lives. With the rapid development of economic globalization, how to integrate science and technology into economic decision-making is the focus of the current research field, and the research in this paper aims to solve exactly this problem. This paper proposes research on a global economic market forecasting and decision-making system based on the Internet of Things and machine learning. The wireless sensor network of the Internet of Things is used to perceive and predict the global economic market, and the decision tree method in machine learning is combined with the global economic market to make economic decisions. The paper explores which decision tree algorithm has the highest execution efficiency through an experimental comparison of four decision tree algorithms: the ID3 algorithm, C4.5 algorithm, CART algorithm, and IQ algorithm. The output of the experiments in the paper indicates that the C4.5 algorithm has the fastest running speed. When the dataset increases to 110,000 records, its running time reaches 503 s.
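For illustration only (not the paper's experiment), a tiny timing harness of the kind such a comparison needs: it measures how tree-induction time grows with dataset size for a single CART-style learner on synthetic data; the sizes and feature count are arbitrary.

```python
# Sketch: timing tree induction as the dataset grows.
import time
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

for n in (10_000, 50_000, 110_000):
    X, y = make_classification(n_samples=n, n_features=20, random_state=0)
    start = time.perf_counter()
    DecisionTreeClassifier(random_state=0).fit(X, y)
    print(f"n={n:>7}: fit time {time.perf_counter() - start:.2f} s")
```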
APA, Harvard, Vancouver, ISO, and other styles
10

Yang, Bao Hua, and Shuang Li. "Remote Sense Image Classification Based on CART Algorithm." Advanced Materials Research 864-867 (December 2013): 2782–86. http://dx.doi.org/10.4028/www.scientific.net/amr.864-867.2782.

Full text
Abstract:
This paper deals with the study of a decision-tree-based classification method for remote sensing images. The experimental area is located in Xiangyang district, and the data source is a 2010 fusion of SPOT and TM satellite images. Moreover, the decision-tree-based classification method is optimized with the help of the RuleGen module and applied to the regional remote sensing image of interest. The precision of the maximum likelihood ratio is 95.15 percent, and 94.82 percent for CART. Experimental results show that the classification method based on classification and regression trees performs as well as the traditional one.
APA, Harvard, Vancouver, ISO, and other styles
11

Monjezi, Nasim. "The Application of the CART and CHAID Algorithms in Sugar Beet Yield Prediction." Basrah J. Agric. Sci. 34, no. 1 (February 4, 2021): 1–13. http://dx.doi.org/10.37077/25200860.2021.34.1.01.

Full text
Abstract:
Yield prediction is a very important agricultural problem. Any farmer would like to know, as soon as possible, how much yield he can expect. The problem of predicting yield can be addressed by employing data mining techniques. This study evaluated the feasibility of predicting sugar beet yield in Khuzestan Province, Iran, using the CART and CHAID algorithms. The analyses were performed using IBM SPSS Modeler 14.2. Three cropping seasons from 125 farms were selected between 2015 and 2018. The most important attributes were selected and the average yield was classified according to a decision tree. The data were partitioned into training (70%) and testing (30%) samples. The decision tree produced by the CART method included nine independent variables and 29 nodes, while the decision tree produced by the CHAID method included nine independent variables and 39 nodes. The CART and CHAID algorithms were evaluated using linear correlation and the mean absolute error (MAE). The maximum precision of the CART model was 95% on the training partition and 93% on the testing partition. According to the models' precision, the results showed that the CHAID and CART models were stable and suitable for prediction of sugar beet yield.
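A minimal sketch of the setup described above, under the assumption of synthetic data: a 70/30 train/test split and a CART regression tree scored by mean absolute error. The farm features and yields are placeholders, not the Khuzestan data, and the leaf-size setting is arbitrary.

```python
# Sketch: regression tree for yield prediction with a 70/30 split and MAE.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=375, n_features=9, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

cart = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)
print("training MAE:", mean_absolute_error(y_tr, cart.predict(X_tr)))
print("testing MAE: ", mean_absolute_error(y_te, cart.predict(X_te)))
```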
APA, Harvard, Vancouver, ISO, and other styles
12

Clarin, Jeffrey A. "Academic Analytics: Predicting Success in the Licensure Examination of Graduates using CART Decision Tree Algorithm." Journal of Advanced Research in Dynamical and Control Systems 12, no. 01-Special Issue (February 13, 2020): 143–51. http://dx.doi.org/10.5373/jardcs/v12sp1/20201057.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Zhao, Long, Sanghyuk Lee, and Seon-Phil Jeong. "Decision Tree Application to Classification Problems with Boosting Algorithm." Electronics 10, no. 16 (August 8, 2021): 1903. http://dx.doi.org/10.3390/electronics10161903.

Full text
Abstract:
A personal credit evaluation algorithm is proposed by designing a decision tree with a boosting algorithm, and the classification is carried out. By comparison with the conventional decision tree algorithm, it is shown that the boosting algorithm speeds up the processing. The Classification and Regression Tree (CART) algorithm with the boosting algorithm showed 90.95% accuracy, slightly higher than without boosting, 90.31%. To avoid overfitting of the model on the training set due to unreasonable data set division, we consider cross-validation and illustrate the results with simulation; hyperparameters of the model have been applied and the model fitting effect is verified. The proposed decision tree model is fitted optimally with the help of a confusion matrix. In this paper, relevant evaluation indicators are also introduced to evaluate the performance of the proposed model. For comparison with the conventional methods, accuracy rate, error rate, precision, recall, etc. are also illustrated; we comprehensively evaluate the model performance based on the model accuracy after 10-fold cross-validation. The results show that the boosting algorithm improves the performance of the model in accuracy and precision when CART is applied, but the model fitting takes much longer, around 2 min. With the obtained result, it is verified that the performance of the decision tree model is improved under the boosting algorithm. At the same time, we test the performance of the proposed verification model with model fitting, and it could be applied to the prediction model for customers' decisions on subscription to the fixed deposit business.
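A rough sketch of the comparison pattern described above, assuming synthetic data in place of the credit records: a single CART-style tree versus a boosted ensemble of the same tree, both evaluated with 10-fold cross validation. The depth and estimator count are arbitrary choices for the example.

```python
# Sketch: boosting a CART-style tree vs. a single tree, 10-fold cross validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)

single = DecisionTreeClassifier(max_depth=4, random_state=0)
boosted = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=4, random_state=0),
                             n_estimators=100, random_state=0)  # 'base_estimator' in older scikit-learn

print("single CART, 10-fold accuracy: ", cross_val_score(single, X, y, cv=10).mean())
print("boosted CART, 10-fold accuracy:", cross_val_score(boosted, X, y, cv=10).mean())
```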
APA, Harvard, Vancouver, ISO, and other styles
14

Yontar, Meltem, Özge Hüsniye Namli, and Seda Yanik. "Using machine learning techniques to develop prediction models for detecting unpaid credit card customers." Journal of Intelligent & Fuzzy Systems 39, no. 5 (November 19, 2020): 6073–87. http://dx.doi.org/10.3233/jifs-189080.

Full text
Abstract:
Customer behavior prediction has recently been gaining importance in the banking sector, as in other sectors. This study aims to propose a model to predict whether credit card users will pay their debts or not. Using the proposed model, potential unpaid risks can be predicted and necessary actions can be taken in time. For the prediction of customers' payment status in the following months, we use Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART) and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes 10,713 customer records obtained from a well-known bank in Taiwan. These records consist of customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount and amount of credit card payments. We apply cross validation and hold-out methods to divide our dataset into two parts, training and test sets. Then we evaluate the algorithms with the proposed performance metrics. We also optimize the parameters of the algorithms to improve the performance of prediction. The results show that the model built with the CART algorithm, one of the decision tree algorithms, provides high accuracy (about 86%) in predicting the customers' payment status for the next month. When the algorithm parameters are optimized, classification accuracy and performance are increased.
APA, Harvard, Vancouver, ISO, and other styles
15

Monalisa, Siti, and Fakhri Hadi. "Penerapan Algoritma CART Dalam Menentukan Jurusan Siswa di MAN 1 Inhil." Jurnal Sisfokom (Sistem Informasi dan Komputer) 9, no. 3 (October 27, 2020): 387–94. http://dx.doi.org/10.32736/sisfokom.v9i3.932.

Full text
Abstract:
MAN 1 Inhil is a school that applies ministerial regulations to determine student majors at the beginning of entry, namely in class X. Majors are determined by considering several indicators, namely the results of academic tests, interviews, and student interest. The calculation used in determining the major is very simple: the values of the indicators are added up and divided to obtain an average value. If the value is fulfilled, then the student is grouped based on their interests. This can lead to errors in the school's decision-making because it is subjective and prioritizes student interests. Therefore, methods and algorithms are needed to support good decision-making, such as the decision tree method. One algorithm that can be used is the CART algorithm, which classifies majors into three classes, namely Natural Sciences, Social Sciences and Religion. The results of this study indicate that the CART algorithm is able to predict correctly: of 360 records classified using the CART algorithm, 71 records majoring in Religion were correctly classified by CART; of 144 records majoring in Natural Sciences, 119 were correctly classified, 24 were classified as Social Sciences (IPS), and 1 was classified as Religion; and of 146 records majoring in Social Sciences, 129 were classified correctly and 16 were classified as Natural Sciences. Therefore, it can be concluded that the CART algorithm has an accuracy of 80% and can be used in decision-making.
APA, Harvard, Vancouver, ISO, and other styles
16

Abspoel, Mark, Daniel Escudero, and Nikolaj Volgushev. "Secure training of decision trees with continuous attributes." Proceedings on Privacy Enhancing Technologies 2021, no. 1 (January 1, 2021): 167–87. http://dx.doi.org/10.2478/popets-2021-0010.

Full text
Abstract:
We apply multiparty computation (MPC) techniques to show, given a database that is secret-shared among multiple mutually distrustful parties, how the parties may obliviously construct a decision tree based on the secret data. We consider data with continuous attributes (i.e., coming from a large domain), and develop a secure version of a learning algorithm similar to the C4.5 or CART algorithms. Previous MPC-based work only focused on decision tree learning with discrete attributes (De Hoogh et al. 2014). Our starting point is to apply an existing generic MPC protocol to a standard decision tree learning algorithm, which we then optimize in several ways. We exploit the fact that even if we allow the data to have continuous values, which a priori might require fixed or floating point representations, the output of the tree learning algorithm only depends on the relative ordering of the data. By obliviously sorting the data we reduce the number of comparisons needed per node to O(N log₂ N) from the naive O(N²), where N is the number of training records in the dataset, thus making the algorithm feasible for larger datasets. This does however introduce a problem when duplicate values occur in the dataset, but we manage to overcome this problem with a relatively cheap subprotocol. We show a procedure to convert a sorting network into a permutation network of smaller complexity, resulting in a round complexity of O(log N) per layer in the tree. We implement our algorithm in the MP-SPDZ framework and benchmark our implementation for both passive and active three-party computation using arithmetic modulo 2⁶⁴. We apply our implementation to a large scale medical dataset of ≈ 290 000 rows using random forests, and thus demonstrate practical feasibility of using MPC for privacy-preserving machine learning based on decision trees for large datasets.
APA, Harvard, Vancouver, ISO, and other styles
17

Wang, Peng, and Ningchao Zhang. "Decision tree classification algorithm for non-equilibrium data set based on random forests." Journal of Intelligent & Fuzzy Systems 39, no. 2 (August 31, 2020): 1639–48. http://dx.doi.org/10.3233/jifs-179937.

Full text
Abstract:
In order to overcome the problems of poor accuracy and high complexity of current classification algorithms for non-equilibrium data sets, this paper proposes a decision tree classification algorithm for non-equilibrium data sets based on random forests. Wavelet packet decomposition is used to denoise the non-equilibrium data, and the SNM algorithm and RFID are combined to remove redundant data from the data sets. Based on the results of data processing, the non-equilibrium data sets are classified by the random forest method. Following a Bootstrap resampling method with certain constraints, the majority and minority samples of each sample subset are sampled, CART is used to train each data subset, and a decision tree is constructed. The final classification results are obtained by voting over the CART decision tree classifications. Experimental results show that the proposed algorithm has high classification accuracy and low complexity, and that it is a feasible classification algorithm for non-equilibrium data sets.
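The sketch below covers only the ensemble step of the pipeline described above (no wavelet denoising or SNM deduplication): constrained bootstrap samples that balance the minority class, one CART-style tree per sample, and a majority vote. The data, class imbalance and tree count are assumptions made for the example.

```python
# Sketch: balanced-bootstrap CART ensemble with majority voting on imbalanced data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)  # imbalanced
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
minority, majority = np.where(y_tr == 1)[0], np.where(y_tr == 0)[0]

trees = []
for _ in range(25):
    # balanced bootstrap: draw equal numbers from each class, with replacement
    idx = np.concatenate([rng.choice(minority, size=len(minority), replace=True),
                          rng.choice(majority, size=len(minority), replace=True)])
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

votes = np.mean([t.predict(X_te) for t in trees], axis=0)   # fraction of trees voting "1"
y_pred = (votes >= 0.5).astype(int)                          # majority vote
print("ensemble accuracy:", (y_pred == y_te).mean())
```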
APA, Harvard, Vancouver, ISO, and other styles
18

Boļakova, Ieva. "A STUDY OF DECISION TREE ALGORITHMS FOR CONTINUOUS ATTRIBUTES." Environment. Technology. Resources. Proceedings of the International Scientific and Practical Conference 1 (June 20, 2001): 248. http://dx.doi.org/10.17770/etr2001vol1.1922.

Full text
Abstract:
Nowadays a lot of different algorithms for decision tree construction exist. With the help of these algorithms one can classify both discrete and continuous data. The aim of this paper is to explore decision tree algorithms for continuous attributes. CART (Breiman et al., 1984) and C4.5 (Quinlan, 1992) are investigated in this paper. The comparison of these methods was done in the process of exploration. As a result of using both algorithms, conclusions about the advantages of utilizing CART and C4.5 were drawn.
APA, Harvard, Vancouver, ISO, and other styles
19

Dobrovska, Lyudmila, and Olena Nosovets. "Development of the classifier based on a multilayer perceptron using genetic algorithm and CART decision tree." Eastern-European Journal of Enterprise Technologies 5, no. 9 (113) (October 31, 2021): 82–90. http://dx.doi.org/10.15587/1729-4061.2021.242795.

Full text
Abstract:
The problem of developing universal classifiers of biomedical data, in particular data characterized by a large number of parameters, inaccuracies and uncertainty, is urgent. Many studies are aimed at developing methods for analyzing these data, among them methods based on a neural network (NN) in the form of a multilayer perceptron (MP) using GA. The question of applying evolutionary algorithms (EA) for configuring and training the neural network is considered. The theories of neural networks, genetic algorithms (GA) and decision trees intersect and interpenetrate, and newly developed neural networks and their applications constantly appear. An example of a problem solved using EA algorithms is considered. Its goal is to develop and study a classifier for the diagnosis of breast cancer, obtained by combining the capabilities of the multilayer perceptron using the genetic algorithm (GA) and the CART decision tree. The possibility of improving classifiers of biomedical data in the form of an NN based on GA by applying an appropriate preparation of the biomedical data using the CART decision tree has been established. The obtained results of the study indicate that these classifiers show the highest efficiency on the training set and with the minimum reduction of the decision trees; increasing the number of reductions usually degrades the simulation result. On two datasets, the simulation accuracy on the test set was ≈83–87%. The experiments carried out have confirmed the effectiveness of the proposed method for the synthesis of neural networks and make it possible to recommend it for practical use in processing data sets for further diagnostics, prediction, or pattern recognition.
APA, Harvard, Vancouver, ISO, and other styles
20

Aziz, Firman, and Armin Lawi. "Increasing electrical grid stability classification performance using ensemble bagging of C4.5 and classification and regression trees." International Journal of Electrical and Computer Engineering (IJECE) 12, no. 3 (June 1, 2022): 2955. http://dx.doi.org/10.11591/ijece.v12i3.pp2955-2962.

Full text
Abstract:
The increasing demand for electricity every year pushes the electricity infrastructure toward its maximum threshold value, thus affecting the stability of the electricity network. The decentralized smart grid control (DSGC) system has succeeded in maintaining the stability of the electricity network under various assumptions. The data mining approach to the DSGC system shows that the decision tree algorithm provides new knowledge; however, its performance is not yet optimal. This paper proposes an ensemble bagging algorithm to reinforce the performance of the decision trees C4.5 and classification and regression trees (CART). To evaluate the classification performance, 10-fold cross-validation was used on the grid data. The results showed that the ensemble bagging algorithm succeeded in increasing the performance of both methods in terms of accuracy, by 5.6% for C4.5 and 5.3% for CART.
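A sketch of the bagging comparison in spirit only: scikit-learn has no C4.5, so a CART-style tree serves as the base learner, and the grid-stability data are replaced by a synthetic classification set. The estimator count is an arbitrary choice.

```python
# Sketch: single CART vs. bagged CART under 10-fold cross validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=12, random_state=0)

cart = DecisionTreeClassifier(random_state=0)
bagged_cart = BaggingClassifier(estimator=cart, n_estimators=50,
                                random_state=0)  # 'base_estimator' in older scikit-learn

print("single CART, 10-fold CV accuracy:", cross_val_score(cart, X, y, cv=10).mean())
print("bagged CART, 10-fold CV accuracy:", cross_val_score(bagged_cart, X, y, cv=10).mean())
```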
APA, Harvard, Vancouver, ISO, and other styles
21

Liu, Zhenyu, Tao Wen, Wei Sun, and Qilong Zhang. "A Novel Multiway Splits Decision Tree for Multiple Types of Data." Mathematical Problems in Engineering 2020 (November 12, 2020): 1–12. http://dx.doi.org/10.1155/2020/7870534.

Full text
Abstract:
Classical decision trees such as C4.5 and CART partition the feature space using axis-parallel splits. Oblique decision trees use oblique splits based on linear combinations of features to potentially simplify the boundary structure. Although oblique decision trees have higher generalization accuracy, most oblique split methods are not directly applicable to categorical data and are computationally expensive. In this paper, we propose a multiway splits decision tree (MSDT) algorithm, which adopts feature weighting and clustering. This method can combine multiple numerical features, multiple categorical features, or multiple mixed features. Experimental results show that MSDT has excellent performance for multiple types of data.
APA, Harvard, Vancouver, ISO, and other styles
22

Ma, Xiaojun, Shengjun Zhai, Yingxian Fu, Leonard Yoonjae Lee, and Jingxuan Shen. "Predicting the Occurrence and Causes of Employee Turnover with Machine Learning." Computer Engineering and Applications Journal 8, no. 3 (September 24, 2019): 217–27. http://dx.doi.org/10.18495/comengapp.v8i3.316.

Full text
Abstract:
This paper looks at the problem of employee turnover, which has considerable influence on organizational productivity and healthy working environments. Using a publicly available dataset, key factors capable of predicting employee churn are identified. Six machine learning algorithms, including decision trees, random forests, naïve Bayes and the multi-layer perceptron, are used to predict employees who are prone to churn. A good level of predictive accuracy is observed, and a comparison is made with previous findings. It is found that while the simplest algorithm, the classification and regression tree (CART), gives the best accuracy or F1-score, the alternating decision tree (ADT) gives the best area under the ROC curve. Rules extracted in the if-then form enable successful identification of the probable causes of churning.
APA, Harvard, Vancouver, ISO, and other styles
23

Kašćelan, Ljiljana, and Vladimir Kašćelan. "Component-Based Decision Trees." International Journal of Operations Research and Information Systems 6, no. 4 (October 2015): 1–18. http://dx.doi.org/10.4018/ijoris.2015100101.

Full text
Abstract:
Popular decision tree (DT) algorithms such as ID3, C4.5, CART, CHAID and QUEST may produce different results on the same data set. They consist of components that have similar functionalities but are implemented in different ways and have different performance. The best way to obtain an optimal DT for a data set is a component-based design, which enables the user to intelligently select already-implemented components well suited to the specific data set. In this article the authors propose a component-based design of the optimal DT for classification of securities account holders. Research results showed that the optimal algorithm is not one of the original DT algorithms. This fact confirms that the component design provided algorithms with better performance than the original ones. Also, the authors found how the specificities of the data influence the performance of the DT components. The obtained classification results can be useful to future investors in the Montenegrin capital market.
APA, Harvard, Vancouver, ISO, and other styles
24

Han, Jiaonan. "System Optimization of Talent Life Cycle Management Platform Based on Decision Tree Model." Journal of Mathematics 2022 (January 21, 2022): 1–12. http://dx.doi.org/10.1155/2022/2231112.

Full text
Abstract:
The decision tree algorithm is a widely used classification and prediction method. Because it generates a tree-like classifier, it has a simple structure and is extensively used. Regardless of the specific decision tree algorithm, the decision attributes are classified according to the condition attributes, the judgment process runs from the root node to a leaf node, and each branch of the tree is formed by selecting the best split attribute. However, this classification approach makes the decision tree rely too heavily on the training data; if the data are complicated, noisy, or incomplete, the decision tree will often overfit. This study mainly analyzes the random forest model and the CART algorithm, and applies the CART algorithm within the random forest model. To address the algorithm's shortcomings on big data, this study improves it through the MapReduce programming model to achieve parallelization of the process and construction of the function. Combining the construction goals and principles of the talent supply chain management system, this study constructs the overall framework and operational process of the enterprise talent supply chain management system based on the decision tree model, from both the overall level and the operational level. Addressing the enterprise's talent management problems, it focuses on designing integrated management, flexible management, integrated talent information management, and evaluation and optimization management models to ensure that the constructed system is operable and measurable and can achieve dynamic optimization. Based on the current situation of talent management in a company, this study analyzes the enterprise talent supply chain management model based on the proposed decision tree model and constructs the overall framework and core model of the company's talent supply chain management system. Based on the company's current situation, it puts forward safeguard measures for the implementation of the management system to ensure that the established management system can be effectively implemented.
APA, Harvard, Vancouver, ISO, and other styles
25

Bollwein, Ferdinand, and Stephan Westphal. "A branch & bound algorithm to determine optimal bivariate splits for oblique decision tree induction." Applied Intelligence 51, no. 10 (March 12, 2021): 7552–72. http://dx.doi.org/10.1007/s10489-021-02281-x.

Full text
Abstract:
Univariate decision tree induction methods for multiclass classification problems such as CART, C4.5 and ID3 continue to be very popular in the context of machine learning due to their major benefit of being easy to interpret. However, as these trees only consider a single attribute per node, they often get quite large which lowers their explanatory value. Oblique decision tree building algorithms, which divide the feature space by multidimensional hyperplanes, often produce much smaller trees but the individual splits are hard to interpret. Moreover, the effort of finding optimal oblique splits is very high such that heuristics have to be applied to determine local optimal solutions. In this work, we introduce an effective branch and bound procedure to determine global optimal bivariate oblique splits for concave impurity measures. Decision trees based on these bivariate oblique splits remain fairly interpretable due to the restriction to two attributes per split. The resulting trees are significantly smaller and more accurate than their univariate counterparts due to their ability of adapting better to the underlying data and capturing interactions of attribute pairs. Moreover, our evaluation shows that our algorithm even outperforms algorithms based on heuristically obtained multivariate oblique splits despite the fact that we are focusing on two attributes only.
APA, Harvard, Vancouver, ISO, and other styles
26

Wieczorek, Wojciech, Jan Kozak, Łukasz Strąk, and Arkadiusz Nowakowski. "Minimum Query Set for Decision Tree Construction." Entropy 23, no. 12 (December 14, 2021): 1682. http://dx.doi.org/10.3390/e23121682.

Full text
Abstract:
A new two-stage method for the construction of a decision tree is developed. The first stage is based on the definition of a minimum query set, which is the smallest set of attribute-value pairs for which any two objects can be distinguished. To obtain this set, an appropriate linear programming model is proposed. The queries from this set are building blocks of the second stage in which we try to find an optimal decision tree using a genetic algorithm. In a series of experiments, we show that for some databases, our approach should be considered as an alternative method to classical ones (CART, C4.5) and other heuristic approaches in terms of classification quality.
APA, Harvard, Vancouver, ISO, and other styles
27

Jahangiri, Mina, Fakher Rahim, Najmaldin Saki, and Amal Saki Malehi. "Application of Bayesian Decision Tree in Hematology Research: Differential Diagnosis of β-Thalassemia Trait from Iron Deficiency Anemia." Computational and Mathematical Methods in Medicine 2021 (November 9, 2021): 1–10. http://dx.doi.org/10.1155/2021/6401105.

Full text
Abstract:
Objective. Several discriminating techniques have been proposed to discriminate between the β-thalassemia trait (βTT) and iron deficiency anemia (IDA). These discrimination techniques are clinically essential, but they are challenging and typically difficult. This study is the first application of a Bayesian tree-based method for the differential diagnosis of βTT from IDA. Method. This cross-sectional study included 907 patients aged over 18 years, with a mean (±SD) age of 25 ± 16.1, with either βTT or IDA. Hematological parameters were measured using a Sysmex KX-21 automated hematology analyzer. Bayesian Logit Treed (BLTREED) and Classification and Regression Trees (CART) were implemented to discriminate βTT from IDA based on the hematological parameters. Results. This study proposes an automatic detection model of beta-thalassemia carriers based on a Bayesian tree-based method. The BLTREED model and CART showed that mean corpuscular volume (MCV) was the main predictor in diagnostic discrimination. According to the test dataset, CART indicated higher sensitivity and negative predictive value than BLTREED for the differential diagnosis of βTT from IDA. However, the CART algorithm had a high false-positive rate. Overall, the BLTREED model showed better performance concerning the area under the curve (AUC). Conclusions. The BLTREED model showed excellent diagnostic accuracy for differentiating βTT from IDA. In addition, tree-based methods are easy to understand and do not require statistical experience, so they can help physicians make the right clinical decision. Thus, the proposed model could support medical decisions in the differential diagnosis of βTT from IDA and avoid much more expensive, time-consuming laboratory tests, especially in countries with limited resources or poor health services.
APA, Harvard, Vancouver, ISO, and other styles
28

Jung, Ji-Yong, Chang-Min Yang, and Jung-Ja Kim. "Decision Tree-Based Foot Orthosis Prescription for Patients with Pes Planus." International Journal of Environmental Research and Public Health 19, no. 19 (September 30, 2022): 12484. http://dx.doi.org/10.3390/ijerph191912484.

Full text
Abstract:
Pes planus, one of the most common foot deformities, includes the loss of the medial arch, misalignment of the rearfoot, and abduction of the forefoot, which negatively affects posture and gait. Foot orthosis, which is effective in normalizing the arch and providing stability during walking, is prescribed for the purpose of treatment and correction. Currently, machine learning technology for classifying and diagnosing foot types is being developed, but it has not yet been applied to the prescription of foot orthosis for the treatment and management of pes planus. Thus, the aim of this study is to propose a model that can prescribe a customized foot orthosis to patients with pes planus by learning from and analyzing various clinical data based on a decision tree algorithm called classification and regression tree (CART). A total of 8 parameters were selected based on the feature importance, and 15 rules for the prescription of foot orthosis were generated. The proposed model based on the CART algorithm achieved an accuracy of 80.16%. This result suggests that the CART model developed in this study can provide adequate help to clinicians in prescribing foot orthosis easily and accurately for patients with pes planus. In the future, we plan to acquire more clinical data and develop a model that can prescribe more accurate and stable foot orthosis using various machine learning technologies.
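A hypothetical sketch of the two CART-based steps mentioned above: ranking features by importance and reading the fitted tree back as explicit if/then rules. The feature names and the synthetic "clinical" data are invented placeholders, and the tree depth is an arbitrary choice.

```python
# Sketch: feature importance ranking and rule extraction from a CART-style tree.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = [f"measurement_{i}" for i in range(10)]   # stand-ins for clinical parameters
X, y = make_classification(n_samples=300, n_features=10, n_informative=6, random_state=0)

cart = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# 1) feature selection by importance
ranked = sorted(zip(feature_names, cart.feature_importances_), key=lambda p: -p[1])
print("top features:", ranked[:8])

# 2) the fitted tree rendered as if/then rules
print(export_text(cart, feature_names=feature_names))
```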
APA, Harvard, Vancouver, ISO, and other styles
29

Wong, Ng Poi, Florida N. S. Damanik, Christine -, Edward Surya Jaya, and Ryan Rajaya. "Perbandingan Algoritma C4.5 dan Classification and Regression Tree (CART) Dalam Menyeleksi Calon Karyawan." Jurnal SIFO Mikroskil 20, no. 1 (April 4, 2019): 11–18. http://dx.doi.org/10.55601/jsm.v20i1.622.

Full text
Abstract:
This research compares the accuracy of the C4.5 algorithm and the Classification and Regression Tree (CART) for prospective employee selection in companies. The research uses a dataset with criteria such as age, working experience, latest education, marital status, number of abilities possessed, and the result of the admission selection test. Testing uses 200 prospective employee selection records collected manually from a company. Algorithm testing uses K-Fold Cross Validation, and the accuracy of each algorithm is calculated using a Confusion Matrix. The C4.5 algorithm achieves an accuracy level, system success rate, and decision-result accuracy of 52.83%, 41.48% and 43.98%, respectively, while the CART algorithm achieves 53.33%, 44.06%, and 42.81%.
APA, Harvard, Vancouver, ISO, and other styles
30

Ataman, Görkem, and Serpil Kahraman. "Comparing Decision Trees and Association Rules for Stock Market Expectations in BIST100 and BIST30." Scientific Annals of Economics and Business 69, no. 3 (September 21, 2022): 459–75. http://dx.doi.org/10.47743/saeb-2022-0024.

Full text
Abstract:
With increased financial fragility, methods are needed to predict financial data effectively. In this study, two leading data mining technologies, classification analysis and association rule mining, are implemented for modeling potentially successful and risky stocks on the BIST 30 and BIST 100 indices based on the key variables of index name, index value, and stock price. Classification and Regression Tree (CART) is used for classification, and Apriori is applied for association analysis. The study data set covers monthly closing values during 2013-2019. The Apriori algorithm also obtained almost all of the classification rules generated with the CART algorithm. Validated by two promising data mining techniques, the proposed rules guide decision-makers in their investment decisions. By providing early warning signals of risky stocks, these rules can be used to minimize risk levels and protect decision-makers from making risky decisions.
APA, Harvard, Vancouver, ISO, and other styles
31

Polaka, Inese, Igor Tom, and Arkady Borisov. "Decision Tree Classifiers in Bioinformatics." Scientific Journal of Riga Technical University. Computer Sciences 42, no. 1 (January 1, 2010): 118–23. http://dx.doi.org/10.2478/v10143-010-0052-4.

Full text
Abstract:
This paper presents a literature review of articles related to the use of decision tree classifiers in gene microarray data analysis published in the last ten years. The main focus is on research solving the cancer classification problem using single decision tree classifiers (the algorithms C4.5 and CART) and decision tree forests (e.g. random forests), showing the strengths and weaknesses of the proposed methodologies when compared to other popular classification methods. The article also touches on the use of decision tree classifiers in gene selection.
APA, Harvard, Vancouver, ISO, and other styles
32

Gao, Wen, Rong Yu, Zhaolei Yu, Zhuang Ma, and Md Masum. "Auxiliary Diagnosis Method of Chest Pain Based on Machine Learning." International Journal of Engineering and Technology 14, no. 4 (November 2022): 79–83. http://dx.doi.org/10.7763/ijet.2022.v14.1207.

Full text
Abstract:
Chest pain occurs suddenly, and its pathological causes are complex and varied, fatal or non-fatal, so improving diagnostic accuracy is extremely important in prehospital and hospital emergency systems. Therefore, we propose a method that introduces the decision tree, support vector machine, and KNN algorithms from machine learning into the auxiliary diagnosis of chest pain. First, the better-performing algorithms among the decision tree, support vector machine, and KNN are selected; then the classification performance of the CART algorithm, the support vector machine using a Gaussian kernel function, and the k-nearest-neighbor algorithm using Euclidean distance is compared to select the best; finally, through analysis of the experimental results, the support vector machine algorithm with the Gaussian kernel function is selected. Its detection time and diagnostic accuracy are the best among the three algorithms, and it can assist medical staff in the emergency system in carrying out targeted chest pain diagnosis.
APA, Harvard, Vancouver, ISO, and other styles
33

Awoin, Emmanuel, Peter Appiahene, Frank Gyasi, and Abdulai Sabtiwu. "Predicting the Performance of Rural Banks in Ghana Using Machine Learning Approach." Advances in Fuzzy Systems 2020 (February 19, 2020): 1–7. http://dx.doi.org/10.1155/2020/8028019.

Full text
Abstract:
The idea of rural banks was introduced as a result of limited commercial bank branches in rural areas to mobilize their resources for rural development. It is also believed that financial institutions such as rural banks are powerful tools for mitigating poverty. Nevertheless, some of these banks are rather increasing the burden of people through illegal activities and mismanagement of resources. Assessing banks’ performance using a set of financial ratios has been an interesting and challenging problem for many researchers and practitioners. Identification of factors that can accurately predict a firm’s performance is of great interest to any decision-maker. The study used ARB’s financial ratios as its independent variables to assess the performance of rural banks and later used random forest algorithm to identify the variables with the most relevance to the model. A dataset was obtained from the various banks. This study used three decision tree algorithms, namely, C5.0, C4.5, and CART, to build the various decision tree predictive models. The result of the study suggested that the C5.0 algorithm gave an accuracy of 100%, followed by the CART algorithm with an accuracy of 84.6% and, finally, the C4.5 algorithm with an accuracy of 83.34 on average. The study, therefore, recommended the usage of the C5.0 predictive model in predicting the financial performance of rural banks in Ghana.
APA, Harvard, Vancouver, ISO, and other styles
34

Zhao, Qichao, Xuxin Dong, Guohong Li, Yongtao Jin, Xiufeng Yang, and Yandan Qu. "Classification and Regression Tree Models for Remote Recognition of Black and Odorous Water Bodies Based on Sensor Networks." Scientific Programming 2022 (February 25, 2022): 1–12. http://dx.doi.org/10.1155/2022/7390098.

Full text
Abstract:
Black and odorous water bodies represent a topic of significant interest in the field of water pollution prevention and control. Remote sensing technology is increasingly exploited for the monitoring of black and odorous water bodies because of its high efficiency and large-scale monitoring potential. In the present study, the Sentinel-2A imagery data were combined with data obtained by measuring spectral properties of black and odorous water bodies to produce a classification and regression tree (CART) model-based improved remote sensing recognition method for such water bodies. This method transforms the traditional single-feature empirical threshold segmentation algorithm to a multi-feature fuzzy decision-tree classification algorithm. The results reveal overall accuracy values of 84.78%, 92.85%, and 72.23% for the CART decision-tree algorithm, the confidence zone classification, and the fuzzy zone node classification, respectively. The method proposed in the present study enables the highly precise extraction of features representing black and odorous water bodies from satellite imagery. The characterization of confidence and fuzzy zones minimizes the need for field inspections, and it enhances the efficiency of diverse applications including engineering.
APA, Harvard, Vancouver, ISO, and other styles
35

Manzali, Youness, Mohamed El Far, Mohamed Chahhou, and Mohammed Elmohajir. "Enhancing Weak Nodes in Decision Tree Algorithm Using Data Augmentation." Cybernetics and Information Technologies 22, no. 2 (June 1, 2022): 50–65. http://dx.doi.org/10.2478/cait-2022-0016.

Full text
Abstract:
Decision trees are among the most popular classifiers in machine learning, artificial intelligence, and pattern recognition because they are accurate and easy to interpret. During tree construction, a node containing too few observations (a weak node) could still get split, and the resulting split is then unreliable and statistically has no value. Many existing machine-learning methods can resolve this issue, such as pruning, which removes the tree's non-meaningful parts. This paper deals with weak nodes differently; we introduce a new algorithm, Enhancing Weak Nodes in Decision Tree (EWNDT), which reinforces them by increasing their data from other similar tree nodes. We call the data augmentation a virtual merging because we temporarily recalculate the best splitting attribute and the best threshold in the weak node. We have used two approaches to define the similarity between two nodes. The experimental results are verified using benchmark datasets from the UCI machine-learning repository. The results indicate that the EWNDT algorithm gives a good performance.
APA, Harvard, Vancouver, ISO, and other styles
36

Fakir, Y., M. Azalmad, and R. Elaychi. "Study of The ID3 and C4.5 Learning Algorithms." Journal of Medical Informatics and Decision Making 1, no. 2 (April 23, 2020): 29–43. http://dx.doi.org/10.14302/issn.2641-5526.jmid-20-3302.

Full text
Abstract:
Data mining is a process of exploring large data sets to find patterns for decision-making. One of the techniques in decision-making is classification. Data classification is a form of data analysis used to extract models describing important data classes. There are many classification algorithms, and each classifier encompasses some algorithm in order to classify objects into predefined classes. The decision tree is one such important technique, which builds a tree structure by incrementally breaking the dataset down into smaller subsets. Decision trees can be implemented by using popular algorithms such as ID3, C4.5 and CART. The present study considers the ID3 and C4.5 algorithms for building a decision tree by using the "entropy" and "information gain" measures, which are the basic components behind the construction of a classifier model.
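A small worked illustration of the two measures named above; the tiny label set and the single candidate attribute are invented for the example, and the split-quality computation follows the standard information-gain definition.

```python
# Worked example: Shannon entropy and the information gain of one attribute split.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

y = np.array(["yes", "yes", "yes", "no", "no", "yes", "no", "no", "yes"])
attribute = np.array(["sunny", "sunny", "rain", "rain", "sunny", "rain", "sunny", "rain", "rain"])

parent = entropy(y)
# weighted entropy of the children produced by splitting on the attribute
children = sum((attribute == v).mean() * entropy(y[attribute == v])
               for v in np.unique(attribute))
print("parent entropy  :", round(parent, 3))
print("information gain:", round(parent - children, 3))
```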
APA, Harvard, Vancouver, ISO, and other styles
37

Yong, Liu Guang, Xu Lin Ying, and Zhou Cheng Qiong. "Research of Software Defect Prediction Based on CART." Applied Mechanics and Materials 602-605 (August 2014): 3871–76. http://dx.doi.org/10.4028/www.scientific.net/amm.602-605.3871.

Full text
Abstract:
Software defect prediction can effectively guide the rational distribution of software system development resources and thereby improve software quality and reliability. In order to fully utilize existing historical data to guide software system development, this paper builds software defect prediction models based on an improved classification and regression tree (CART) algorithm. The paper first applies principal component analysis (PCA) to the predictor data to reduce its dimensionality and correlation, and then configures and optimizes the CART decision tree algorithm for an existing software defect prediction system. Compared with the traditional defect prediction method, the experimental results show that the proposed prediction model has higher prediction accuracy and stability.
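A minimal sketch of the PCA-then-CART idea described above, not the paper's optimized model: simulated "module metrics" with correlated columns are reduced with PCA and fed to a CART-style classifier; the component count, tree depth and class imbalance are assumptions made for the example.

```python
# Sketch: PCA for dimensionality reduction followed by a CART-style defect classifier.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# synthetic "module metrics" with correlated columns, labeled defective / not defective
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           n_redundant=10, weights=[0.8, 0.2], random_state=0)

model = make_pipeline(PCA(n_components=6),
                      DecisionTreeClassifier(max_depth=5, random_state=0))
print("10-fold CV accuracy:", cross_val_score(model, X, y, cv=10).mean())
```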
APA, Harvard, Vancouver, ISO, and other styles
38

Sharpsten, Lucie, Juanjuan Fan, Joseph R. Barr, Xiaogang Su, Shaban Demirel, and Richard A. Levine. "Predicting Glaucoma Progression Using Decision Trees for Clustered Data by Goodness of Split." International Journal of Semantic Computing 07, no. 02 (June 2013): 157–72. http://dx.doi.org/10.1142/s1793351x13400072.

Full text
Abstract:
Glaucoma is a chronic, progressive and potentially blinding condition. Predicting which patients will experience significant progression is recognized as a crucially needed development in the management of this disease. Application of the CART (Classification And Regression Trees) methodology has demonstrated that certain patterns of visual field findings may convey greater predictive information for glaucoma progression. However, the current standard classification tree method was developed for uncorrelated data. In this article a classification tree method is extended to correlated binary data. The robust Wald test statistic from generalized estimating equations (GEE) is used to measure the between-node difference while adjusting for correlation between the eyes of a patient. The proposed method is assessed through simulations conducted under a variety of model configurations and is used to analyze the perimetry and psychophysics in glaucoma (PPIG) study data. Employing an amalgamation algorithm from the result of a best-sized tree, each eye is classified to one of two prognosis categories (less likely, or more likely, to progress). Receiver operating characteristics (ROC) and area under the curve (AUC) indicate that the proposed method, applied to data from both eyes of the same patient, provides much improved prediction accuracy compared with application of standard CART method to the same PPIG data.
APA, Harvard, Vancouver, ISO, and other styles
39

Hasanah, Msy Aulia, Sopian Soim, and Ade Silvia Handayani. "Implementasi CRISP-DM Model Menggunakan Metode Decision Tree dengan Algoritma CART untuk Prediksi Curah Hujan Berpotensi Banjir." Journal of Applied Informatics and Computing 5, no. 2 (October 7, 2021): 103–8. http://dx.doi.org/10.30871/jaic.v5i2.3200.

Full text
Abstract:
Indonesia has a tropical climate with high rainfall intensity, and high rainfall intensity can potentially cause flooding. To minimize this, accurate weather predictions are needed so that flooding can be anticipated in advance. This research classifies rainfall into the heavy-rain and very-heavy-rain categories using data mining techniques within the CRISP-DM methodology. The algorithm used for classification is CART (Classification And Regression Tree), evaluated with confusion matrix parameters. The model evaluation shows that the CART algorithm performs fairly well, classifying with an accuracy of 89.4%.
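A minimal sketch of the evaluation step described above, a CART classifier assessed with a confusion matrix, is shown below with scikit-learn; the three placeholder features and the labelling rule are assumptions made only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# Placeholder weather features (e.g. humidity, temperature, pressure) and a rain-category label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # 0 = heavy rain, 1 = very heavy rain (toy rule)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

print(confusion_matrix(y_test, clf.predict(X_test)))   # confusion matrix test parameters
print(accuracy_score(y_test, clf.predict(X_test)))
```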
APA, Harvard, Vancouver, ISO, and other styles
40

YILDIZ, OLCAY TANER, and ETHEM ALPAYDIN. "LINEAR DISCRIMINANT TREES." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 03 (May 2005): 323–53. http://dx.doi.org/10.1142/s0218001405004125.

Full text
Abstract:
We discuss and test empirically the effects of six dimensions along which existing decision tree induction algorithms differ. These are: Node type (univariate versus multivariate), branching factor (two or more), grouping of classes into two if the tree is binary, error (impurity) measure, and the methods for minimization to find the best split vector and threshold. We then propose a new decision tree induction method that we name linear discriminant trees (LDT) which uses the best combination of these criteria in terms of accuracy, simplicity and learning time. This tree induction method can be univariate or multivariate. The method has a supervised outer optimization layer for converting a K > 2-class problem into a sequence of two-class problems and each two-class problem is solved analytically using Fisher's Linear Discriminant Analysis (LDA). On twenty datasets from the UCI repository, we compare the linear discriminant trees with the univariate decision tree methods C4.5 and C5.0, multivariate decision tree methods CART, OC1, QUEST, neural trees and LMDT. Our proposed linear discriminant trees learn fast, are accurate, and the trees generated are small.
APA, Harvard, Vancouver, ISO, and other styles
41

Reddy, G. Sekhar, and Suneetha Chittineni. "Entropy based C4.5-SHO algorithm with information gain optimization in data mining." PeerJ Computer Science 7 (April 7, 2021): e424. http://dx.doi.org/10.7717/peerj-cs.424.

Full text
Abstract:
Information efficiency is gaining more importance in the development as well as application sectors of information technology. Data mining is a computer-assisted process of massive data investigation that extracts meaningful information from the datasets. The mined information is used in decision-making to understand the behavior of each attribute. Therefore, a new classification algorithm is introduced in this paper to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the gain of given datasets. The optimal weights for the information gain will be updated based on SHO. Further, the dataset is partitioned into two classes based on quadratic entropy calculation and information gain. Decision tree gain optimization is the main aim of our proposed C4.5-SHO method. The robustness of the proposed method is evaluated on various datasets and compared with classifiers, such as ID3 and CART. The accuracy and area under the receiver operating characteristic curve parameters are estimated and compared with existing algorithms like ant colony optimization, particle swarm optimization and cuckoo search.
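For reference, the quadratic entropy mentioned here coincides with the Gini impurity used by CART, whereas C4.5 relies on Shannon entropy; the short sketch below contrasts the two measures on one class distribution (the Selfish Herd Optimization step itself is not reproduced).

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy of a class-probability vector (used by ID3/C4.5)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def quadratic_entropy(p):
    """Quadratic entropy, 1 - sum(p^2); identical to the Gini impurity used by CART."""
    return 1.0 - np.sum(p ** 2)

p = np.array([0.7, 0.3])   # example class distribution at a tree node
print(shannon_entropy(p), quadratic_entropy(p))
```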
APA, Harvard, Vancouver, ISO, and other styles
42

Darwin, Darwin, Dwiky Christian, Wilson Chandra, and Marlince Nababan. "Comparison of Decision Tree and Linear Regression Algorithms in the Case of Spread Prediction of COVID-19 in Indonesia." Journal of Computer Networks, Architecture and High Performance Computing 4, no. 1 (January 2, 2022): 1–12. http://dx.doi.org/10.47709/cnahpc.v4i1.1234.

Full text
Abstract:
COVID-19 is a disease that was first discovered in Wuhan, China and caused the 2019-2020 coronavirus pandemic. When infecting humans, the virus can cause respiratory tract infections similar to influenza. According to the Ministry of Health of the Republic of Indonesia, the number of confirmed COVID-19 cases in Indonesia as of March 2021 was 1,511,712, with 40,858 deaths and 1,348,330 recoveries, making Indonesia the country with the highest number of confirmed cases in ASEAN. Several studies have handled related cases using data mining techniques such as the Decision Tree or Linear Regression algorithm, for example to classify respiratory diseases or predict pregnancy hypertension. In this study, we analyze COVID-19 cases in Indonesia and conduct an experiment predicting new COVID-19 cases with the Decision Tree (CART) and Linear Regression algorithms. We then compare the two algorithms using the R2 score to evaluate prediction performance. The analysis shows that DKI Jakarta province has the highest number of positive cases, recoveries and deaths in Indonesia. The R2 score obtained by the Decision Tree algorithm reached 95.69% (training) and 92.15% (testing), while the Linear Regression algorithm reached 79.93% (training) and 77.25% (testing).
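The comparison described, a CART-style regression tree versus linear regression evaluated with the R2 score, can be sketched as follows with scikit-learn; the synthetic features and hyper-parameters are placeholders rather than the study's actual setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in for daily new-case counts driven by a few lagged features.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=400)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
tree = DecisionTreeRegressor(max_depth=5, random_state=1).fit(X_train, y_train)
lin = LinearRegression().fit(X_train, y_train)

for name, model in [("CART", tree), ("LinearRegression", lin)]:
    print(name,
          "train R2:", r2_score(y_train, model.predict(X_train)),
          "test R2:", r2_score(y_test, model.predict(X_test)))
```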
APA, Harvard, Vancouver, ISO, and other styles
43

Wu, Chia-Yun, Tzeon-Jye Chiou, Chun-Yu Liu, Feng-Chang Lin, Jeong-Shi Lin, Man-Hsin Hung, Liang-Tsai Hsiao, et al. "Decision-Tree Algorithm Optimize Hematopoietic Progenitor Cell-Based Prediction in Peripheral Blood Stem Cell Mobilization." Blood 126, no. 23 (December 3, 2015): 1903. http://dx.doi.org/10.1182/blood.v126.23.1903.1903.

Full text
Abstract:
Background and Objectives: Enumeration of hematopoietic progenitor cells (HPC) using an automated hematology analyzer provides rapid, inexpensive, and less technically dependent prediction of peripheral blood stem cell (PBSC) mobilization. This study aimed to incorporate HPC enumeration along with other predictors for optimizing a successful harvest. Materials and Methods: Between 2007 and 2012, 189 consecutive patients who proceeded to PBSC harvesting with a preharvest HPC ≥ 20 × 10^6/L were recruited. A failed PBSC mobilization was defined as < 2 × 10^6 CD34+ cells/kg. Variables predicting a successful harvest, identified by multivariate logistic regression and correlation analysis, were subjected to classification and regression tree (CART) analysis. Results: A total of 154 (81.5%) patients successfully achieved mobilization of CD34+ cells (median 8.18 × 10^6 CD34+ cells/kg). Five independent host predictors (age ≥ 60, a diagnosis of solid tumor, prior chemotherapy cycles ≥ 5, prior radiotherapy, and mobilization with G-CSF alone or high-dose cyclophosphamide), as well as laboratory markers including HPC and mononuclear cell (MNC) counts, were used for CART analysis. The number of host predictors with a cutoff at two, an HPC cutoff at 28 × 10^6/L and an MNC cutoff at 3.5 × 10^9/L were most discriminative for successful prediction. In the decision tree algorithm, patients predicted as good mobilizers (0 to 2 risk factors) had a higher success rate (150/169, 88.8%) than those predicted as poor mobilizers (3-5 risk factors; 4/20, 20.0%). Moreover, patients predicted as good mobilizers who also had an HPC enumeration ≥ 28 × 10^6/L had a high probability of achieving successful mobilization (138/148, 93.2%). Conclusion: Our CART algorithm incorporating host predictors, HPC enumeration and MNC count may improve prediction and thus increase the success of PBSC mobilization. Further prospective validation is necessary. Figure 1. Disclosures: No relevant conflicts of interest to declare.
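The decision rule reported in this abstract (a cutoff of two host risk factors and an HPC cutoff of 28 × 10^6/L) can be paraphrased as a small rule-based function; this is only an illustrative restatement of the published cutoffs, not the fitted CART model itself.

```python
def predict_mobilization(n_risk_factors: int, hpc_per_litre: float) -> str:
    """Classify a patient using the cutoffs reported for the CART model:
    0-2 host risk factors -> good mobilizer, 3-5 -> poor mobilizer;
    good mobilizers with HPC >= 28e6/L had the highest reported success rate."""
    if n_risk_factors >= 3:
        return "poor mobilizer"
    if hpc_per_litre >= 28e6:
        return "good mobilizer, high probability of success"
    return "good mobilizer"

print(predict_mobilization(1, 30e6))
print(predict_mobilization(4, 40e6))
```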
APA, Harvard, Vancouver, ISO, and other styles
44

Podhorska, Ivana, Jaromir Vrbka, George Lazaroiu, and Maria Kovacova. "Innovations in Financial Management: Recursive Prediction Model Based on Decision Trees." Marketing and Management of Innovations, no. 3 (2020): 276–92. http://dx.doi.org/10.21272/mmi.2020.3-20.

Full text
Abstract:
The issue of enterprise financial distress is a topical and interdisciplinary subject for the economic community. Bankruptcy is one of the major externalities of today's modern economies, and it cannot be avoided entirely even with every effort. Where there are investment opportunities, there are individuals and businesses that are willing to assume financial obligations and the resulting risks in order to maintain and develop their standard of living or their economic activities. The decision tree algorithm is one of the most intuitive data mining methods that can be used for financial distress prediction, and a systematization of literary sources and approaches shows that decision trees are part of the innovations in financial management. The main purpose of the research is to explore the applicability of a decision tree algorithm for creating a prediction model that can be used in economic practice. The paper's main aim is to create a comprehensive prediction model of enterprise financial distress based on decision trees under the conditions of emerging markets. The methods are based on decision trees, with emphasis on the CART algorithm. The emerging markets included 17 countries: Slovak Republic, Czech Republic, Poland, Hungary, Romania, Bulgaria, Lithuania, Latvia, Estonia, Slovenia, Croatia, Serbia, Russia, Ukraine, Belarus, Montenegro, and Macedonia. The research focuses on the possibility of implementing a decision tree algorithm for creating a prediction model under emerging market conditions. The data, obtained from the Amadeus database, contained 2,359,731 enterprises from emerging markets (30% of the total amount), divided into prosperous enterprises (1,802,027) and non-prosperous enterprises (557,704). The input variables for the model were 24 financial indicators, 3 dummy variables, and the countries' GDP data for the years 2015 and 2016. For model creation, 80% of enterprises formed the training sample and 20% the test sample. The model correctly classified 93.2% of enterprises in both the training and test samples, and the correct classification of non-prosperous enterprises was 83.5% in both samples. The result of the research is a new model for the identification of bankrupt enterprises, which can be considered sufficiently suitable for classifying enterprises in emerging markets. Keywords: prediction model, decision tree, emerging markets.
APA, Harvard, Vancouver, ISO, and other styles
45

Vlahou, Antonia, John O. Schorge, Betsy W. Gregory, and Robert L. Coleman. "Diagnosis of Ovarian Cancer Using Decision Tree Classification of Mass Spectral Data." Journal of Biomedicine and Biotechnology 2003, no. 5 (2003): 308–14. http://dx.doi.org/10.1155/s1110724303210032.

Full text
Abstract:
Recent reports from our laboratory and others support the SELDI ProteinChip technology as a potential clinical diagnostic tool when combined with n-dimensional analysis algorithms. The objective of this study was to determine if the commercially available classification algorithm biomarker patterns software (BPS), which is based on a classification and regression tree (CART), would be effective in discriminating ovarian cancer from benign diseases and healthy controls. Serum protein mass spectrum profiles from 139 patients with either ovarian cancer, benign pelvic diseases, or healthy women were analyzed using the BPS software. A decision tree using five protein peaks resulted in an accuracy of 81.5% in the cross-validation analysis and 80% in a blinded set of samples in differentiating the ovarian cancer from the control groups. The potential, advantages, and drawbacks of the BPS system as a bioinformatic tool for the analysis of the SELDI high-dimensional proteomic data are discussed.
APA, Harvard, Vancouver, ISO, and other styles
46

Ryanto, Sean Akbar, Donni Richasdy, and Widi Astuti. "Partner Sentiment Analysis for Telkom University on Twitter Social Media Using Decision Tree (CART) Algorithm." JURNAL MEDIA INFORMATIKA BUDIDARMA 6, no. 4 (October 25, 2022): 1940. http://dx.doi.org/10.30865/mib.v6i4.4533.

Full text
Abstract:
Sentiment analysis is the analysis of opinions and meaning expressed in written form. It is very useful for capturing the opinions of any individual or group in order to improve branding. Branding is the process of promoting and strengthening a brand name to attract the attention of consumers, so that they become interested in trying the services of an institution operating in the academic field, such as Telkom University. Effective branding, however, requires cooperation with other associations as partners. One form of this cooperation is providing opinions about Telkom University so that consumers become more familiar with it on Twitter, one of the largest social media platforms, used by many people because any opinion can be shared freely. This study therefore aims to analyze the sentiment expressed by partners about Telkom University on Twitter, which is a main channel for promoting the university to consumers. The process collects all tweets about Telkom University submitted by partners, applies TF-IDF weighting, and classifies them with the CART decision tree algorithm into positive, negative, and neutral sentiment categories. The best results obtained by the CART decision tree model are an accuracy of 86.73%, precision of 87.06%, recall of 87.55%, and an F1-score of 86.52%.
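The processing chain described, TF-IDF weighting followed by a CART decision tree over positive/negative/neutral labels, can be sketched with scikit-learn as below; the example tweets and labels are invented placeholders, not data from the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline

# Placeholder tweets and sentiment labels (positive / neutral / negative).
texts = [
    "great collaboration with the campus", "event schedule announced",
    "disappointed with the registration process", "proud to partner again",
    "no comment on the program", "service was slow and confusing",
]
labels = ["positive", "neutral", "negative", "positive", "neutral", "negative"]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),                       # TF-IDF term weighting
    ("cart", DecisionTreeClassifier(random_state=0)),   # CART-style classifier
])
model.fit(texts, labels)
print(model.predict(["really proud of this partnership"]))
```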
APA, Harvard, Vancouver, ISO, and other styles
47

Sa'adah, Umu, Masithoh Yessi Rochayani, and Ani Budi Astuti. "Knowledge discovery from gene expression dataset using bagging lasso decision tree." Indonesian Journal of Electrical Engineering and Computer Science 21, no. 2 (February 1, 2021): 1151. http://dx.doi.org/10.11591/ijeecs.v21.i2.pp1151-1159.

Full text
Abstract:
Classifying high-dimensional data is a challenging task in data mining. Gene expression data is a type of high-dimensional data with thousands of features. This study proposes a method to extract knowledge from high-dimensional gene expression data by selecting features and classifying. The lasso was used for feature selection and the classification and regression tree (CART) algorithm was used to construct the decision tree model. To examine the stability of the lasso decision tree, we performed bootstrap aggregating (bagging) with 50 replications. The gene expression data used was an ovarian tumor dataset with 1,545 observations, 10,935 gene features, and a binary class. The findings show that the lasso decision tree produces an interpretable model that is theoretically correct and has an accuracy of 89.32%. The model obtained from the majority vote gives an accuracy of 90.29%, an increase of about 1% over the single lasso decision tree model. This slight increase in accuracy shows that the lasso decision tree classifier is stable.
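A compact sketch of the two-stage procedure described, lasso-based feature selection followed by a bagged CART decision tree, is given below with scikit-learn on synthetic high-dimensional data; note that an L1-penalised logistic regression stands in for the lasso selector here, and the bagging ensemble uses scikit-learn's default decision tree base learner.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import BaggingClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a gene-expression matrix: many features, binary class.
X, y = make_classification(n_samples=200, n_features=2000, n_informative=10, random_state=0)

model = Pipeline([
    # L1-penalised logistic regression plays the role of the lasso feature selector.
    ("lasso_select", SelectFromModel(
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    # Bagging with 50 replications; the default base learner is a CART-style decision tree.
    ("bagged_cart", BaggingClassifier(n_estimators=50, random_state=0)),
])
print(cross_val_score(model, X, y, cv=5).mean())
```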
APA, Harvard, Vancouver, ISO, and other styles
48

Hu, Guanghui, Weizhi Zhang, Hong Wan, and Xinxin Li. "Improving the Heading Accuracy in Indoor Pedestrian Navigation Based on a Decision Tree and Kalman Filter." Sensors 20, no. 6 (March 12, 2020): 1578. http://dx.doi.org/10.3390/s20061578.

Full text
Abstract:
In pedestrian inertial navigation, multi-sensor fusion is often used to obtain accurate heading estimates. As a widely distributed signal source, the geomagnetic field is convenient to provide sufficiently accurate heading angles. Unfortunately, there is a broad presence of artificial magnetic perturbations in indoor environments, leading to difficulties in geomagnetic correction. In this paper, by analyzing the spatial distribution model of the magnetic interference field on the geomagnetic field, two quantitative features have been found to be crucial in distinguishing normal magnetic data from anomalies. By leveraging these two features and the classification and regression tree (CART) algorithm, we trained a decision tree that is capable of extracting magnetic data from distorted measurements. Furthermore, this well-trained decision tree can be used as a reject gate in a Kalman filter. By combining the decision tree and Kalman filter, a high-precision indoor pedestrian navigation system based on a magnetically assisted inertial system is proposed. This system is then validated in a real indoor environment, and the results show that our system delivers state-of-the-art positioning performance. Compared to other baseline algorithms, an improvement of over 70% in the positioning accuracy is achieved.
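The gating idea described, a trained decision tree that decides whether a magnetometer sample is clean enough to be used as a heading measurement in a Kalman filter, can be sketched as follows; the two features, the labelling rule, the one-dimensional filter and all constants are placeholder assumptions rather than the paper's actual design.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Train a CART gate on two placeholder features (e.g. field-magnitude and dip-angle deviation).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2))
is_clean = (np.abs(features).sum(axis=1) < 1.5).astype(int)          # toy labelling rule
gate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, is_clean)

# Minimal one-dimensional Kalman filter on heading: gyro propagation + gated magnetic correction.
heading, variance = 0.0, 1.0
process_noise, meas_noise, dt = 0.01, 0.1, 0.02
for gyro_rate, mag_heading, mag_feat in zip(
        rng.normal(size=50), rng.normal(size=50), rng.normal(size=(50, 2))):
    heading += gyro_rate * dt                            # predict heading from the gyro
    variance += process_noise
    if gate.predict(mag_feat.reshape(1, -1))[0] == 1:    # reject-gate: use only clean magnetic data
        k = variance / (variance + meas_noise)
        heading += k * (mag_heading - heading)
        variance *= (1 - k)
print(heading, variance)
```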
APA, Harvard, Vancouver, ISO, and other styles
49

Pratiwi, Ula Mir'aatunnas, and Mursyidul Ibad. "KLASIFIKASI FAKTOR YANG BERPENGARUH DALAM KEHAMILAN TIDAK DIINGINKAN MENGGUNAKAN METODE ALGORITMA DECISION TREE." Jurnal Lebesgue : Jurnal Ilmiah Pendidikan Matematika, Matematika dan Statistika 3, no. 2 (August 30, 2022): 406–16. http://dx.doi.org/10.46306/lb.v3i2.129.

Full text
Abstract:
Unwanted pregnancy (KTD) in Indonesia has increased every year. Unwanted pregnancy is one of the factors causing miscarriage, abortion, low birth weight and premature birth, and it therefore also increases the risk of maternal and child mortality. This research aims to analyze the classification of factors that influence unwanted pregnancies using the decision tree algorithm method. The study is non-reactive (unobtrusive) research with a cross-sectional design, using secondary data from the 2019 KKBPK Performance and Accountability Survey (SKAP). The population consists of couples of childbearing age (PUS) in Indonesia, with a sample of 46,220 currently married women aged 15-49 years. The data were analyzed using the CART decision tree algorithm. The results indicate that the resulting classification system achieves an accuracy of 84.5% on the training data and 84.6% on the testing data. The constructed tree produces 13 classifications of factors related to KTD, with the highest classification (94%) for PUS who have two living children, live in urban areas, whose mother's age at first marriage was 25 years, and whose family planning decisions were made by the mother herself, a service provider, or the spouse (husband).
APA, Harvard, Vancouver, ISO, and other styles
50

Liu, Yafei, Zhaoxu Ren, Jiye Li, and Jun Li. "Design of Informatization College and University Teaching Management System Based on Improved Decision Tree Algorithm." Wireless Communications and Mobile Computing 2022 (April 14, 2022): 1–11. http://dx.doi.org/10.1155/2022/3127487.

Full text
Abstract:
At present, the teaching management systems used in colleges and universities cannot classify and store teaching material information well and also suffer from problems such as inaccurate calculation of resource information weights, long response times, and large data query errors. This study therefore designs an informatized college teaching management system based on an improved decision tree algorithm. The hardware structure of the system consists of an information communication structure, an information teaching resource sharing structure, a processor, and a crystal oscillator circuit, with a data output control module as its core. The system software is designed around the improved decision tree algorithm: it creates a decision tree recursively, uses the CART decision tree to calculate the weight of teaching resource information, and constructs a fitness objective function for teaching resource information according to a mean clustering algorithm, so as to accurately extract teaching resource information and process college teaching resources efficiently. The experimental results show that the response time of the proposed system is only 8 ms, the maximum convergence value is only 40, only one erroneous record appears in the data query, and the storage time is 40 s. The system has a short response time, a fast convergence rate, and a low probability of data search errors when more than one client accesses the database at the same time.
APA, Harvard, Vancouver, ISO, and other styles