
Journal articles on the topic 'Classification tree models'


Consult the top 50 journal articles for your research on the topic 'Classification tree models.'


1

Verbyla, David L. "Classification trees: a new discrimination tool." Canadian Journal of Forest Research 17, no. 9 (September 1, 1987): 1150–52. http://dx.doi.org/10.1139/x87-177.

Abstract:
Classification trees are discriminant models structured as dichotomous keys. A simple classification tree is presented and contrasted with a linear discriminant function. Classification trees have several advantages when compared with linear discriminant analysis. The method is robust with respect to outlier cases. It is nonparametric and can use nominal, ordinal, interval, and ratio-scaled predictor variables. Cross-validation is used during tree development to prevent overfitting the tree with too many predictor variables. Missing values are handled by using surrogate splits based on nonmissing predictor variables. Classification trees, like linear discriminant analysis, have potential prediction bias and therefore should be validated before being accepted.
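Illustrative aside (not from the cited paper): a minimal sketch of the comparison the abstract describes, fitting a classification tree and a linear discriminant function with scikit-learn; the dataset and the depth limit standing in for cross-validated pruning are arbitrary assumptions.

```python
# Minimal sketch: classification tree vs. linear discriminant analysis.
# Dataset and hyperparameters are illustration choices, not the paper's.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The depth limit stands in for the cross-validation-based pruning the abstract mentions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
lda = LinearDiscriminantAnalysis()

print("tree CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
print("LDA  CV accuracy:", cross_val_score(lda, X, y, cv=5).mean())
```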
2

Diligenti, M., P. Frasconi, and M. Gori. "Hidden tree markov models for document image classification." IEEE Transactions on Pattern Analysis and Machine Intelligence 25, no. 4 (April 2003): 520–24. http://dx.doi.org/10.1109/tpami.2003.1190578.

3

Povkhan, I. F. "THE METHOD OF BOUNDED CONSTRUCTIONS OF LOGICAL CLASSIFICATION TREES IN THE PROBLEM OF DISCRETE OBJECTS CLASSIFICATION." Ukrainian Journal of Information Technology 3, no. 1 (2021): 22–29. http://dx.doi.org/10.23939/ujit2021.03.022.

Abstract:
The problem of constructing a model of logical classification trees based on a limited method of selecting elementary features for geological data arrays is considered. A method is proposed for approximating an array of real data with a set of elementary features, with a fixed criterion for stopping the branching procedure at the stage of constructing the classification tree. This approach makes it possible to ensure the necessary accuracy of the model, reduce its structural complexity, and achieve the required performance indicators. A limited method for constructing classification trees has been developed, which is aimed at completing only those paths (tiers) of the classification tree structure where the greatest number of classification errors (of all types) occur. This approach to synthesizing the recognition model makes it possible to effectively regulate the complexity (accuracy) of the classification tree model being built, and it is advisable in situations with restrictions on the hardware resources of the information system, on the accuracy and structural complexity of the model, or on the structure, sequence, and depth of recognition of the training sample data array. The limited scheme of classification tree synthesis allows models to be built almost 20% faster. The constructed logical classification tree will accurately classify (recognize) the entire training sample on which the model is based, will have a minimal structure (structural complexity), and will consist of components (sets of elementary features) as its design vertices, i.e., tree attributes. Based on the proposed modification of the elementary feature selection method, software has been developed that can work with different types of applied problems. An approach to synthesizing new recognition models based on a limited logic tree scheme and selecting pre-pruning parameters is proposed. In other words, an effective scheme for recognizing discrete objects has been developed based on the step-by-step evaluation and selection of sets of attributes (generalized features) along selected paths in the classification tree structure at each stage of scheme synthesis.
4

Maschler, Julia, Clement Atzberger, and Markus Immitzer. "Individual Tree Crown Segmentation and Classification of 13 Tree Species Using Airborne Hyperspectral Data." Remote Sensing 10, no. 8 (August 3, 2018): 1218. http://dx.doi.org/10.3390/rs10081218.

Abstract:
Knowledge of the distribution of tree species within a forest is key for multiple economic and ecological applications. This information is traditionally acquired through time-consuming and thereby expensive field work. Our study evaluates the suitability of a visible to near-infrared (VNIR) hyperspectral dataset with a spatial resolution of 0.4 m for the classification of 13 tree species (8 broadleaf, 5 coniferous) on an individual tree crown level in the UNESCO Biosphere Reserve ‘Wienerwald’, a temperate Austrian forest. The study also assesses the automation potential for the delineation of tree crowns using a mean shift segmentation algorithm in order to permit model application over large areas. Object-based Random Forest classification was carried out on variables that were derived from 699 manually delineated as well as automatically segmented reference trees. The models were trained separately for two strata: small and/or conifer stands and high broadleaf forests. The two strata were delineated beforehand using CHM-based tree height and NDVI. The predictor variables encompassed spectral reflectance, vegetation indices, textural metrics and principal components. After feature selection, the overall classification accuracy (OA) of the classification based on manual delineations of the 13 tree species was 91.7% (Cohen’s kappa (κ) = 0.909). The highest user’s and producer’s accuracies were most frequently obtained for Weymouth pine and Scots Pine, while European ash was most often associated with the lowest accuracies. The classification that was based on mean shift segmentation yielded similarly good results (OA = 89.4% κ = 0.883). Based on the automatically segmented trees, the Random Forest models were also applied to the whole study site (1050 ha). The resulting tree map of the study area confirmed a high abundance of European beech (58%) with smaller amounts of oak (6%) and Scots pine (5%). We conclude that highly accurate tree species classifications can be obtained from hyperspectral data covering the visible and near-infrared parts of the electromagnetic spectrum. Our results also indicate a high automation potential of the method, as the results from the automatically segmented tree crowns were similar to those that were obtained for the manually delineated tree crowns.
5

Thoe, Wai, King Wah Choi, and Joseph Hun-wei Lee. "Predicting ‘very poor’ beach water quality gradings using classification tree." Journal of Water and Health 14, no. 1 (October 8, 2015): 97–108. http://dx.doi.org/10.2166/wh.2015.094.

Abstract:
A beach water quality prediction system has been developed in Hong Kong using multiple linear regression (MLR) models. However, linear models are found to be weak at capturing the infrequent ‘very poor’ water quality occasions when Escherichia coli (E. coli) concentration exceeds 610 counts/100 mL. This study uses a classification tree to increase the accuracy in predicting the ‘very poor’ water quality events at three Hong Kong beaches affected either by non-point source or point source pollution. Binary-output classification trees (to predict whether E. coli concentration exceeds 610 counts/100 mL) are developed over the periods before and after the implementation of the Harbour Area Treatment Scheme, when systematic changes in water quality were observed. Results show that classification trees can capture more ‘very poor’ events in both periods when compared to the corresponding linear models, with an increase in correct positives by an average of 20%. Classification trees are also developed at two beaches to predict the four-category Beach Water Quality Indices. They perform worse than the binary tree and give excessive false alarms of ‘very poor’ events. Finally, a combined modelling approach using both MLR model and classification tree is proposed to enhance the beach water quality prediction system for Hong Kong.
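Illustrative aside (not from the paper): a sketch of a binary-output classification tree for the 'very poor' threshold described above. The predictor names and data are hypothetical placeholders; only the 610 counts/100 mL cut-off is taken from the abstract.

```python
# Sketch under assumptions: columns and data are synthetic stand-ins for the
# beach water-quality predictors; the exceedance threshold follows the abstract.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rainfall_mm": rng.gamma(2.0, 5.0, 500),     # hypothetical predictor
    "solar_radiation": rng.normal(15, 4, 500),   # hypothetical predictor
    "tide_level": rng.normal(1.2, 0.3, 500),     # hypothetical predictor
    "ecoli": rng.lognormal(4.5, 1.2, 500),       # counts/100 mL
})

# Binary target: 'very poor' water quality when E. coli exceeds 610 counts/100 mL
df["very_poor"] = (df["ecoli"] > 610).astype(int)

X = df[["rainfall_mm", "solar_radiation", "tide_level"]]
y = df["very_poor"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" helps with the rarity of 'very poor' events
clf = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```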
6

Povkhan, Igor. "FEATURES OF SOFTWARE SOLUTIONS OF MODELS OF LOGICAL CLASSIFICATION TREES BASED ON SELECTION OF SETS OF ELEMENTARY FEATURES." Technical Sciences and Technologies, no. 4(22) (2020): 72–90. http://dx.doi.org/10.25140/2411-5363-2020-4(22)-72-90.

Abstract:
Urgency of the research. Currently, there are several independent approaches (concepts) to solving the classification problem in its general setting, and various concepts, approaches, methods, and models have been developed that cover the general issues of artificial intelligence theory and information systems. All of these approaches in recognition theory have their advantages and disadvantages and together form a single toolkit for solving applied problems of artificial intelligence theory. This study focuses on the current concept of decision trees (classification trees). The general problem of software (algorithmic) construction of logical recognition (classification) trees is considered. The object of this research is logical classification trees (LCT structures). The subject of the research is current methods and algorithmic schemes for constructing logical classification trees. Target setting. The main existing methods and algorithms for working with arrays of discrete information in the construction of recognition functions (classifiers) do not make it possible to achieve a predetermined level of accuracy (efficiency) of the classification system or to regulate its complexity during construction. However, this disadvantage is absent in methods and schemes for building recognition systems based on the concept of logical classification trees (decision trees). That is, covering the training sample with a set of elementary features in the case of LCT generates a fixed tree data structure (the LCT model), which compresses and converts the initial training data and therefore allows significant optimization and savings of the system's hardware resources; it is based on a single methodology: the optimal approximation of the training sample by a set of elementary features (attributes) included in some scheme (operator) constructed in the learning process. Analysis of recent research and issues. The possibility of an effective and economical software (algorithmic) scheme for constructing a logical classification tree (LCT structure model) from source arrays of training samples (arrays of discrete information) of large volume is considered. The research objective. To develop a simple and high-quality software method (algorithm and software system) for building LCT models (structures) for large arrays of initial samples by synthesizing minimal forms of classification and recognition trees that provide an effective approximation of the training information by a set of ranked elementary features (attributes), created on the basis of a scheme for branched feature selection, across a wide range of applied problems. The statement of basic materials. We propose a general program scheme for constructing structures of logical classification trees which, for a given initial training sample, builds a tree structure (classification model) consisting of a set of elementary features evaluated at each step of building the model for this sample. A method and a ready-made software system for building logic trees are presented; the main idea is to approximate the initial random sample with a set of elementary features. This method provides the selection of the most informative (qualitative) elementary features from the source set when forming the current vertex (node) of the logical tree. This approach makes it possible to significantly reduce the size and complexity of the tree (the total number of branches and tiers of the structure) and improve the quality of its subsequent analysis. Conclusions. The developed and proposed mathematical support for constructing LCT structures (classification tree models) can be used for solving a wide range of practical recognition and classification problems, and the prospects for further research may consist in creating a limited method for logical classification trees (LCT structures), which maintains the criterion for stopping the procedure for constructing a logical tree by the depth of the structure, optimizing its software implementations, as well as experimental studies of this method for a wider range of practical problems.
7

Khoshgoftaar, Taghi M., and Naeem Seliya. "Software Quality Classification Modeling Using the SPRINT Decision Tree Algorithm." International Journal on Artificial Intelligence Tools 12, no. 03 (September 2003): 207–25. http://dx.doi.org/10.1142/s0218213003001204.

Abstract:
Predicting the quality of system modules prior to software testing and operations can benefit the software development team. Such a timely reliability estimation can be used to direct cost-effective quality improvement efforts to the high-risk modules. Tree-based software quality classification models based on software metrics are used to predict whether a software module is fault-prone or not fault-prone. They are white box quality estimation models with good accuracy, and are simple and easy to interpret. An in-depth study of calibrating classification trees for software quality estimation using the SPRINT decision tree algorithm is presented. Many classification algorithms have memory limitations including the requirement that datasets be memory resident. SPRINT removes all of these limitations and provides a fast and scalable analysis. It is an extension of a commonly used decision tree algorithm, CART, and provides a unique tree pruning technique based on the Minimum Description Length (MDL) principle. Combining the MDL pruning technique and the modified classification algorithm, SPRINT yields classification trees with useful accuracy. The case study used consists of software metrics collected from a very large telecommunications system. It is observed that classification trees built by SPRINT are more balanced and demonstrate better stability than those built by CART.
8

Hu, Ruo, and Zan Fu Xie. "Classification of Knowledge Discovery Methods." Applied Mechanics and Materials 63-64 (June 2011): 859–62. http://dx.doi.org/10.4028/www.scientific.net/amm.63-64.859.

Abstract:
Knowledge discovery, the science and technology of exploring data in order to discover previously unknown patterns, is part of the overall process of extracting information from databases. In today's computer-driven world, these databases contain a lot of information. The significant value of this information makes knowledge discovery a matter of considerable importance and necessity. A decision tree is a predictive model which can be used to represent both classifiers and regression models. When a decision tree is used for classification tasks, it is more appropriately referred to as a classification tree. In this paper, a classification tree method for knowledge discovery on the Internet is presented.
9

Thakkar, Pooja. "Drug Classification using Black-box models and Interpretability." International Journal for Research in Applied Science and Engineering Technology 9, no. 9 (September 30, 2021): 1518–29. http://dx.doi.org/10.22214/ijraset.2021.38203.

Abstract:
The focus of this study is on drug categorization utilizing Machine Learning models, as well as interpretability utilizing LIME and SHAP to get a thorough understanding of the ML models. To do this, the researchers used machine learning models such as random forest, decision tree, and logistic regression to classify drugs. Then, using LIME and SHAP, they determined whether these models were interpretable, which allowed them to better understand their results. It may be stated at the conclusion of this paper that LIME and SHAP can be utilized to gain insight into a Machine Learning model and determine which attribute is responsible for the divergence in the outcomes. According to the LIME and SHAP results, it is also found that the Random Forest and Decision Tree ML models are the best models to employ for drug classification, with Na to K and BP being the most significant characteristics for drug classification. Keywords: Machine Learning, Black-box models, LIME, SHAP, Decision Tree
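Illustrative aside (not from the paper): a sketch of how a local explanation could be produced for a tree-based drug classifier with the third-party LIME package. The feature names echo the abstract (Na to K, BP), but the data and labels are synthetic placeholders.

```python
# Sketch only: LIME explanation of one prediction from a tree-based classifier.
# Requires the third-party 'lime' package; data and labels are placeholders.
import numpy as np
import pandas as pd
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Na_to_K": rng.uniform(5, 40, 300),
    "BP": rng.integers(0, 3, 300).astype(float),   # encoded blood-pressure level
    "Age": rng.integers(15, 75, 300).astype(float),
})
y = (X["Na_to_K"] > 15).astype(int)                # placeholder drug label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X.values, y)

explainer = LimeTabularExplainer(X.values, feature_names=list(X.columns),
                                 class_names=["drugA", "drugB"], mode="classification")
explanation = explainer.explain_instance(X.values[0], model.predict_proba, num_features=3)
print(explanation.as_list())   # per-feature contributions for this one prediction
```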
10

Lim, Chee Soon, Edy Tonnizam Mohamad, Mohammad Reza Motahari, Danial Jahed Armaghani, and Rosli Saad. "Machine Learning Classifiers for Modeling Soil Characteristics by Geophysics Investigations: A Comparative Study." Applied Sciences 10, no. 17 (August 19, 2020): 5734. http://dx.doi.org/10.3390/app10175734.

Abstract:
To design geotechnical structures efficiently, it is important to examine soil's physical properties. Therefore, classifying soil with respect to geophysical parameters is an advantageous and popular approach. Novel, quick, cost-effective, and time-effective machine learning techniques can facilitate this classification. This study employs three kinds of machine learning models: Decision Trees, Artificial Neural Networks, and Bayesian Networks. The decision tree models included the chi-square automatic interaction detection (CHAID), classification and regression trees (CART), quick, unbiased, and efficient statistical tree (QUEST), and C5; the artificial neural network models included the Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF); and the Bayesian network models included the Tree Augmented Naïve (TAN) and Markov Blanket. These models were employed to predict the soil classifications using geophysics investigations and laboratory tests. The performance of each model was assessed through accuracy, stability, and gains. The results showed that while the Markov Blanket Bayesian network model (BAYESIANMARKOV) achieved the highest overall accuracy (100%) in the training phase, it achieved the lowest accuracy (34.21%) in the testing phase. Thus, this model had the worst stability. QUEST had the second highest overall training accuracy (99.12%) and the highest overall testing accuracy (94.74%). Thus, this model was somewhat stable and had an acceptable overall training and testing accuracy for predicting the soil characteristics. Future studies can use the findings of this paper as a benchmark to classify soil characteristics and select the best machine learning technique to perform this classification.
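Illustrative aside (not from the paper): the CHAID, QUEST, C5, and Bayesian-network implementations used in the study are not available in scikit-learn, so a CART-style tree and a multilayer perceptron stand in below; the point sketched is the train-versus-test accuracy gap used to judge stability.

```python
# Sketch: comparing classifiers by the gap between training and testing accuracy,
# which is how the abstract judges stability. Models and data are stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "CART-style tree": DecisionTreeClassifier(random_state=0),
    "pruned tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:16s} train={model.score(X_tr, y_tr):.3f} test={model.score(X_te, y_te):.3f}")
```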
11

Khoshgoftaar, T. M., E. B. Allen, W. D. Jones, and J. P. Hudepohl. "Classification-tree models of software-quality over multiple releases." IEEE Transactions on Reliability 49, no. 1 (March 2000): 4–11. http://dx.doi.org/10.1109/24.855532.

12

LIANG, HAN, YUHONG YAN, and HARRY ZHANG. "LEARNING DECISION TREES WITH LOG CONDITIONAL LIKELIHOOD." International Journal of Pattern Recognition and Artificial Intelligence 24, no. 01 (February 2010): 117–51. http://dx.doi.org/10.1142/s0218001410007877.

Abstract:
In machine learning and data mining, traditional learning models aim for high classification accuracy. However, accurate class probability prediction is more desirable than classification accuracy in many practical applications, such as medical diagnosis. Although it is known that decision trees can be adapted to be class probability estimators in a variety of approaches, and the resulting models are uniformly called Probability Estimation Trees (PETs), the performances of these PETs in class probability estimation, have not yet been investigated. We begin our research by empirically studying PETs in terms of class probability estimation, measured by Log Conditional Likelihood (LCL). We also compare a PET called C4.4 with other representative models, including Naïve Bayes, Naïve Bayes Tree, Bayesian Network, KNN and SVM, in LCL. From our experiments, we draw several valuable conclusions. First, among various tree-based models, C4.4 is the best in yielding precise class probability prediction measured by LCL. We provide an explanation for this and reveal the nature of LCL. Second, compared with non tree-based models, C4.4 also performs best. Finally, LCL does not dominate another well-established relevant metric — AUC, which suggests that different decision-tree learning models should be used for different objectives. Our experiments are conducted on the basis of 36 UCI sample sets. We run all the models within a machine learning platform — Weka. We also explore an approach to improve the class probability estimation of Naïve Bayes Tree. We propose a greedy and recursive learning algorithm, where at each step, LCL is used as the scoring function to expand the decision tree. The algorithm uses Naïve Bayes created at leaves to estimate class probabilities of test samples. The whole tree encodes the posterior class probability in its structure. One benefit of improving class probability estimation is that both classification accuracy and AUC can be possibly scaled up. We call the new model LCL Tree (LCLT). Our experiments on 33 UCI sample sets show that LCLT outperforms all state-of-the-art learning models, such as Naïve Bayes Tree, significantly in accurate class probability prediction measured by LCL, as well as in classification accuracy and AUC.
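Illustrative aside (not from the paper): a sketch of the log conditional likelihood (LCL) measure, computed as the sum of log predicted probabilities of the true classes for a tree used as a probability estimation tree. The dataset, leaf-size setting, and the clipping used in place of Laplace smoothing are assumptions.

```python
# Sketch: LCL of a probability-estimation tree = sum over test cases of
# log P_hat(true class | x). Clipping approximates leaf smoothing.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pet = DecisionTreeClassifier(min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)
proba = pet.predict_proba(X_te)

eps = 1e-6                                           # avoid log(0) at pure leaves
p_true = np.clip(proba[np.arange(len(y_te)), y_te], eps, 1.0)
lcl = np.log(p_true).sum()
print("LCL on test set:", lcl)
```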
13

Seo, Kanghyeon, Bokjin Chung, Hamsa Priya Panchaseelan, Taewoo Kim, Hyejung Park, Byungmo Oh, Minho Chun, et al. "Forecasting the Walking Assistance Rehabilitation Level of Stroke Patients Using Artificial Intelligence." Diagnostics 11, no. 6 (June 15, 2021): 1096. http://dx.doi.org/10.3390/diagnostics11061096.

Abstract:
Cerebrovascular accidents (CVA) cause a range of impairments in coordination, such as a spectrum of walking impairments ranging from mild gait imbalance to complete loss of mobility. Patients with CVA need personalized approaches tailored to their degree of walking impairment for effective rehabilitation. This paper aims to evaluate the validity of using various machine learning (ML) and deep learning (DL) classification models (support vector machine, Decision Tree, Perceptron, Light Gradient Boosting Machine, AutoGluon, SuperTML, and TabNet) for automated classification of walking assistant devices for CVA patients. We reviewed a total of 383 CVA patients’ (1623 observations) prescription data for eight different walking assistant devices from five hospitals. Among the classification models, the advanced tree-based classification models (LightGBM and tree models in AutoGluon) achieved classification results of over 90% accuracy, recall, precision, and F1-score. In particular, AutoGluon not only presented the highest predictive performance (almost 92% in accuracy, recall, precision, and F1-score, and 86.8% in balanced accuracy) but also demonstrated that the classification performances of the tree-based models were higher than that of the other models on its leaderboard. Therefore, we believe that tree-based classification models have potential as practical diagnosis tools for medical rehabilitation.
14

Das, Adrian J., John J. Battles, Nathan L. Stephenson, and Phillip J. van Mantgem. "The relationship between tree growth patterns and likelihood of mortality: a study of two tree species in the Sierra Nevada." Canadian Journal of Forest Research 37, no. 3 (March 2007): 580–97. http://dx.doi.org/10.1139/x06-262.

Abstract:
We examined mortality of Abies concolor (Gord. & Glend.) Lindl. (white fir) and Pinus lambertiana Dougl. (sugar pine) by developing logistic models using three growth indices obtained from tree rings: average growth, growth trend, and count of abrupt growth declines. For P. lambertiana, models with average growth, growth trend, and count of abrupt declines improved overall prediction (78.6% dead trees correctly classified, 83.7% live trees correctly classified) compared with a model with average recent growth alone (69.6% dead trees correctly classified, 67.3% live trees correctly classified). For A. concolor, counts of abrupt declines and longer time intervals improved overall classification (trees with DBH ≥20 cm: 78.9% dead trees correctly classified and 76.7% live trees correctly classified vs. 64.9% dead trees correctly classified and 77.9% live trees correctly classified; trees with DBH <20 cm: 71.6% dead trees correctly classified and 71.0% live trees correctly classified vs. 67.2% dead trees correctly classified and 66.7% live trees correctly classified). In general, count of abrupt declines improved live-tree classification. External validation of A. concolor models showed that they functioned well at stands not used in model development, and the development of size-specific models demonstrated important differences in mortality risk between understory and canopy trees. Population-level mortality-risk models were developed for A. concolor and generated realistic mortality rates at two sites. Our results support the contention that a more comprehensive use of the growth record yields a more robust assessment of mortality risk.
15

El-Rayes, Nesreen, Ming Fang, Michael Smith, and Stephen M. Taylor. "Predicting employee attrition using tree-based models." International Journal of Organizational Analysis 28, no. 6 (March 4, 2020): 1273–91. http://dx.doi.org/10.1108/ijoa-10-2019-1903.

Abstract:
Purpose: The purpose of this study is to develop tree-based binary classification models to predict the likelihood of employee attrition based on firm cultural and management attributes. Design/methodology/approach: A data set of resumes anonymously submitted through Glassdoor’s online portal is used in tandem with public company review information to fit decision tree, random forest and gradient boosted tree models to predict the probability of an employee leaving a firm during a job transition. Findings: Random forest and decision tree methods are found to be the strongest attrition prediction models. In addition, compensation, company culture and senior management performance play a primary role in an employee’s decision to leave a firm. Practical implications: This study may be used by human resources staff to better understand factors which influence employee attrition. In addition, techniques developed in this study may be applied to company-specific data sets to construct customized attrition models. Originality/value: This study contains several novel contributions which include exploratory studies such as industry job transition percentages, distributional comparisons between factors strongly contributing to employee attrition between those who left or stayed with the firm and the first comprehensive search over binary classification models to identify which provides the strongest predictive performance of employee attrition.
16

Williams, Adina, Andrew Drozdov, and Samuel R. Bowman. "Do latent tree learning models identify meaningful structure in sentences?" Transactions of the Association for Computational Linguistics 6 (December 2018): 253–67. http://dx.doi.org/10.1162/tacl_a_00019.

Abstract:
Recent work on the problem of latent tree learning has made it possible to train neural networks that learn to both parse a sentence and use the resulting parse to interpret the sentence, all without exposure to ground-truth parse trees at training time. Surprisingly, these models often perform better at sentence understanding tasks than models that use parse trees from conventional parsers. This paper aims to investigate what these latent tree learning models learn. We replicate two such models in a shared codebase and find that (i) only one of these models outperforms conventional tree-structured models on sentence classification, (ii) its parsing strategies are not especially consistent across random restarts, (iii) the parses it produces tend to be shallower than standard Penn Treebank (PTB) parses, and (iv) they do not resemble those of PTB or any other semantic or syntactic formalism that the authors are aware of.
17

Chuerubim, Maria Lígia, Alan Valejo, Barbara Stolte Bezerra, and Irineu Da Silva. "Limitation of classification tree models in investigating road accident severity." Revista de Engenharia Civil IMED 6, no. 2 (December 1, 2019): 3. http://dx.doi.org/10.18256/2358-6508.2019.v6i2.2927.

Abstract:
The objective of this study was to discuss the main limitations encountered in the process of classifying traffic accident severity based on decision tree models (CART). To achieve this objective, CART was used to mine an imbalanced road accident database, considering injury severity as the dependent variable, categorized into accidents without victims and with victims (fatal and non-fatal). Variables associated with accident characteristics, road infrastructure, and environmental conditions were used in order to identify the influence of these factors on the variation in accident severity. Although classification by CART resulted in high overall accuracy, it provided a low hit rate in classifying accidents with victims, which correspond to the rarest observations in the database. In addition, it resulted in the extraction of a large number of decision rules, given the number of categories of the independent variables in the process of predicting the target variable. The results indicated that CART is not efficient for studying multi-causal effects such as road accidents, since it cannot associate a vast number of parameters, which restricts the analysis and interpretation of the results to the binary structure of the tree. It is suitable, however, for exploratory database analysis, when the goal is to analyze the influence of a specific category of a database variable on the occurrence of traffic accidents.
18

Yan, Xuedong, and Essam Radwan. "Analyses of Rear-End Crashes Based on Classification Tree Models." Traffic Injury Prevention 7, no. 3 (September 2006): 276–82. http://dx.doi.org/10.1080/15389580600660062.

19

Shokirov, Shukhrat, and Géza Király. "Analysis of multitemporal aerial images for fenyőfő Forest change detection." Landscape & Environment 10, no. 2 (October 14, 2016): 89–100. http://dx.doi.org/10.21120/le/10/2/4.

Abstract:
This study evaluated the use of 40 cm spatial resolution aerial images for individual tree crown delineation, forest type classification, health estimation, and clear-cut area detection in the Fenyőfő forest reserves in 2012 and 2015. A region-growing algorithm was used for segmentation of individual tree crowns. Forest types (coniferous/deciduous trees) were distinguished based on the orthomosaic images and segments. The research also investigated the height of individual trees, clear-cut areas, and cut crowns between 2012 and 2015 using Canopy Height Models. Results of the research were examined based on the field measurement data. According to our results, we achieved 75.2% accuracy in individual tree crown delineation. Heights of tree crowns were calculated with 88.5% accuracy. This study had promising results in clear-cut area and individual cut crown detection. The overall accuracy of classification was 77.2%; the analysis showed that coniferous tree type classification was very accurate, but deciduous tree classification had a lot of omission errors. Based on the results and analysis, general information about forest health conditions has been presented. Finally, strengths and limitations of the research were discussed and recommendations were given for further research.
20

Fan, Zhaofei, Stephen R. Shifley, Martin A. Spetich, Frank R. Thompson III, and David R. Larsen. "Distribution of cavity trees in midwestern old-growth and second-growth forests." Canadian Journal of Forest Research 33, no. 8 (August 1, 2003): 1481–94. http://dx.doi.org/10.1139/x03-068.

Abstract:
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in old-growth hardwood forests in Missouri, Illinois, and Indiana found that 8–11% of snags had at least one visible cavity (as visually detected from the ground; smallest opening ≥2 cm diameter), about twice the percentage for live trees. Five percent of live trees and snags had cavities on mature (≥110 years) second-growth plots on timberland in Missouri. Because snags accounted for typically no more than 10% of standing trees on any of these sites, 80–85% of cavity trees are living trees. Within the subset of mature and old-growth forests, the presence of cavities was strongly related to tree diameter. Classification and regression tree models indicated that 30 cm diameter at breast height (DBH) was a threshold size useful in distinguishing cavity trees from noncavity trees in the old-growth sample. There were two diameter thresholds in the mature second-growth sample: 18 and 44 cm DBH. Cavity tree probability differed by species group and increased with increasing decay class.
21

Muzamil Basha, Syed, Dharmendra Singh Rajput, Ravi Kumar Poluru, S. Bharath Bhushan, and Shaik Abdul Khalandar Basha. "Evaluating the Performance of Supervised Classification Models: Decision Tree and Naïve Bayes Using KNIME." International Journal of Engineering & Technology 7, no. 4.5 (September 22, 2018): 248. http://dx.doi.org/10.14419/ijet.v7i4.5.20079.

Abstract:
The classification task is to predict the value of the target variable from the values of the input variables. If a target is provided as part of the dataset, then classification is a supervised task. It is important to analyze the performance of supervised classification models before using them in a classification task. In our research we propose a novel way to evaluate the performance of supervised classification models such as Decision Tree and Naïve Bayes using the KNIME Analytics platform. Experiments are conducted on a multivariate dataset consisting of 58,000 instances and 9 columns used for classification, collected from the UCI Machine Learning repository (http://archive.ics.uci.edu/ml/datasets/statlog+(shuttle)), and the performance of both models is compared in terms of Classification Accuracy (CA) and Error Rate. Finally, both models are validated using the precision, recall, and F-measure metrics. In our findings, the Decision Tree achieves a CA of 99.465%, whereas Naïve Bayes attains a CA of 90.358%. The F-measure of the Decision Tree is 0.984, whereas Naïve Bayes achieves 0.7045.
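Illustrative aside (not from the paper, and in Python rather than KNIME): computing the same evaluation measures the abstract reports, classification accuracy, error rate, precision, recall, and F-measure, for a decision tree and naïve Bayes on an arbitrary dataset.

```python
# Sketch: the evaluation metrics named in the abstract, on an arbitrary dataset.
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("naive Bayes", GaussianNB())]:
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    ca = accuracy_score(y_te, y_pred)
    p, r, f, _ = precision_recall_fscore_support(y_te, y_pred, average="macro")
    print(f"{name}: CA={ca:.3f} error={1 - ca:.3f} precision={p:.3f} "
          f"recall={r:.3f} F-measure={f:.3f}")
```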
22

Begleiter, R., R. El-Yaniv, and G. Yona. "On Prediction Using Variable Order Markov Models." Journal of Artificial Intelligence Research 22 (December 1, 2004): 385–421. http://dx.doi.org/10.1613/jair.1491.

Abstract:
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a ``decomposed'' CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
23

Luna, José Marcio, Efstathios D. Gennatas, Lyle H. Ungar, Eric Eaton, Eric S. Diffenderfer, Shane T. Jensen, Charles B. Simone, Jerome H. Friedman, Timothy D. Solberg, and Gilmer Valdes. "Building more accurate decision trees with the additive tree." Proceedings of the National Academy of Sciences 116, no. 40 (September 16, 2019): 19887–93. http://dx.doi.org/10.1073/pnas.1816748116.

Abstract:
The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
24

Nai-Arun, Nongyao, and Punnee Sittidech. "Ensemble Learning Model for Diabetes Classification." Advanced Materials Research 931-932 (May 2014): 1427–31. http://dx.doi.org/10.4028/www.scientific.net/amr.931-932.1427.

Abstract:
This paper proposed data mining techniques to improve efficiency and reliability in diabetes classification. The real data set, collected from Sawanpracharak Regional Hospital, Thailand, was first analyzed by using gain-ratio feature selection techniques. Three well-known algorithms (naïve Bayes, k-nearest neighbors, and decision tree) were used to construct classification models on the selected features. Then, the popular ensemble learning methods, bagging and boosting, were applied using the three base classifiers. The results revealed that the best model with the highest accuracy was bagging with the decision tree algorithm as the base classifier (95.312%). The experiments also showed that ensemble classifier models performed better than the base classifiers alone.
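Illustrative aside (not from the paper): a sketch of bagging and boosting with a decision tree base classifier, evaluated with 10-fold cross-validation as in the abstract; the data are synthetic and the ensemble sizes are arbitrary.

```python
# Sketch: bagging and boosting around a decision tree base estimator,
# compared against the tree alone. Data and settings are illustration choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=8, random_state=0)

base = DecisionTreeClassifier(max_depth=3, random_state=0)
models = {
    "decision tree": base,
    "bagging(tree)": BaggingClassifier(base, n_estimators=50, random_state=0),
    "boosting(tree)": AdaBoostClassifier(base, n_estimators=50, random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=10).mean())
```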
25

Ivanov, Atanas. "Decision Trees for Evaluation of Mathematical Competencies in the Higher Education: A Case Study." Mathematics 8, no. 5 (May 8, 2020): 748. http://dx.doi.org/10.3390/math8050748.

Abstract:
The assessment of knowledge and skills acquired by the student at each academic stage is crucial for every educational process. This paper proposes and tests an approach based on a structured assessment test for mathematical competencies in higher education and methods for statistical evaluation of the test. A case study is presented for the assessment of knowledge and skills for solving linear algebra and analytic geometry problems by first-year university students. The test includes three main parts—a multiple-choice test with four selectable answers, a solution of two problems with and without the use of specialized mathematical software, and a survey with four questions for each problem. The processing of data is performed mainly by the classification and regression tree (CART) method. Comparative analysis, cross-tables, and reliability statistics were also used. Regression tree models are built to assess the achievements of students and classification tree models for competency assessment on a three-categorical scale. The influence of 31 variables and groups of them on the assessment of achievements and grading of competencies is determined. Regression models show over 94% fit with data and classification ones—up to 92% correct classifications. The models can be used to predict students’ grades and assess their mathematical competency.
26

Grubb, Teryl G., and Rudy M. King. "Assessing Human Disturbance of Breeding Bald Eagles with Classification Tree Models." Journal of Wildlife Management 55, no. 3 (July 1991): 500. http://dx.doi.org/10.2307/3808982.

27

Rizzo, Giuseppe, Claudia d’Amato, Nicola Fanizzi, and Floriana Esposito. "Tree-based models for inductive classification on the Web Of Data." Journal of Web Semantics 45 (August 2017): 1–22. http://dx.doi.org/10.1016/j.websem.2017.05.001.

28

Fakir, Y., M. Azalmad, and R. Elaychi. "Study of The ID3 and C4.5 Learning Algorithms." Journal of Medical Informatics and Decision Making 1, no. 2 (April 23, 2020): 29–43. http://dx.doi.org/10.14302/issn.2641-5526.jmid-20-3302.

Abstract:
Data mining is a process of exploring large data sets to find patterns for decision-making. One of the techniques in decision-making is classification. Data classification is a form of data analysis used to extract models describing important data classes. There are many classification algorithms, each of which classifies objects into predefined classes. The decision tree is one such important technique, which builds a tree structure by incrementally breaking down the dataset into smaller subsets. Decision trees can be implemented using popular algorithms such as ID3, C4.5, and CART. The present study considers the ID3 and C4.5 algorithms to build a decision tree by using the “entropy” and “information gain” measures, which are the basic components behind the construction of a classifier model.
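Illustrative aside (not from the paper): a small sketch of the entropy, information gain, and gain ratio computations that underlie ID3 and C4.5, applied to a toy attribute.

```python
# Sketch of the core ID3/C4.5 quantities: entropy, information gain, and
# C4.5's gain ratio, computed for one candidate split attribute.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Gain of splitting `rows` on the attribute at `attr_index`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attr_index):
    """C4.5 normalises gain by the split information of the attribute."""
    values = [row[attr_index] for row in rows]
    split_info = entropy(values)
    gain = information_gain(rows, labels, attr_index)
    return gain / split_info if split_info > 0 else 0.0

# Toy weather-style data: [outlook, windy] -> play
rows = [["sunny", "no"], ["sunny", "yes"], ["rain", "no"], ["rain", "yes"], ["overcast", "no"]]
labels = ["no", "no", "yes", "no", "yes"]
print("gain(outlook):", information_gain(rows, labels, 0))
print("gain ratio(outlook):", gain_ratio(rows, labels, 0))
```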
29

Krasteva, Vessela, Irena Jekova, Remo Leber, Ramun Schmid, and Roger Abächerli. "Superiority of Classification Tree versus Cluster, Fuzzy and Discriminant Models in a Heartbeat Classification System." PLOS ONE 10, no. 10 (October 13, 2015): e0140123. http://dx.doi.org/10.1371/journal.pone.0140123.

30

Currim, Imran S., Robert J. Meyer, and Nhan T. Le. "Disaggregate Tree-Structured Modeling of Consumer Choice Data." Journal of Marketing Research 25, no. 3 (August 1988): 253–65. http://dx.doi.org/10.1177/002224378802500303.

Abstract:
A new approach to inferring hierarchical models of consumer choice is described. A classification algorithm is used to estimate decision trees at an individual level without requiring prior assumptions about tree form. Derived models are analyzed within a modeling system that summarizes the diversity of decision rules in a sample as well as their implications for aggregate market shares. An application to the analysis of panel data and a comparison with disaggregate logit analysis are reported.
31

Abdollahnejad, Azadeh, and Dimitrios Panagiotidis. "Tree Species Classification and Health Status Assessment for a Mixed Broadleaf-Conifer Forest with UAS Multispectral Imaging." Remote Sensing 12, no. 22 (November 12, 2020): 3722. http://dx.doi.org/10.3390/rs12223722.

Abstract:
Automatic discrimination of tree species and identification of physiological stress imposed on forest trees by biotic factors from unmanned aerial systems (UAS) offers substantial advantages in forest management practices. In this study, we aimed to develop a novel workflow for facilitating tree species classification and the detection of healthy, unhealthy, and dead trees caused by bark beetle infestation using ultra-high resolution 5-band UAS bi-temporal aerial imagery in the Czech Republic. The study is divided into two steps. We initially classified the tree type, either as broadleaf or conifer, and we then classified trees according to the tree type and health status, and subgroups were created to further classify trees (detailed classification). Photogrammetrically processed datasets were obtained using the structure-from-motion (SfM) imaging technique, and the resulting digital terrain models (DTMs), digital surface models (DSMs), and orthophotos with a resolution of 0.05 m were utilized as input for canopy spectral analysis as well as texture analysis (TA). For the spectral analysis, nine vegetation indices (VIs) were applied to evaluate the amount of vegetation cover change of the canopy surface between the two seasons, spring and summer of 2019. Moreover, 13 TA variables, including Mean, Variance, Entropy, Contrast, Heterogeneity, Homogeneity, Angular Second Moment, Correlation, Gray-level Difference Vector (GLDV) Angular Second Moment, GLDV Entropy, GLDV Mean, GLDV Contrast, and Inverse Difference, were estimated for the extraction of canopy surface texture. Further, we used the support vector machine (SVM) algorithm to conduct a detailed classification of tree species and health status. Our results highlighted the efficiency of the proposed method for tree species classification with an overall accuracy (OA) of 81.18% (Kappa: 0.70) and health status assessment with an OA of 84.71% (Kappa: 0.66). While SVM proved to be a good classifier, the results also showed that a combination of VI and TA layers increased the OA by 4.24%, providing a new dimension of information derived from UAS platforms. These methods could be used to quickly evaluate large areas that have been impacted by biological disturbance agents for mapping and detection, tree inventory, and evaluating habitat conditions at relatively low costs.
32

Zurada, Jozef, Waldemar Karwowski, and William Marras. "Classification of jobs with risk of low back disorders by applying data mining techniques." Occupational Ergonomics 4, no. 4 (May 17, 2005): 291–305. http://dx.doi.org/10.3233/oer-2004-4406.

Abstract:
Work-related low back disorders (LBDs) continue to pose a significant occupational health problem that affects the quality of life of the industrial population. The main objective of this study was to explore the application of various data mining techniques, including neural networks, logistic regression, decision trees, memory-based reasoning, and the ensemble model, for classification of industrial jobs with respect to the risk of work-related LBDs. The results from extensive computer simulations using a 10-fold cross validation showed that memory-based reasoning and ensemble models were the best in the overall classification accuracy. The decision tree and memory-based reasoning models were the most accurate in classifying jobs with a high risk of LBDs, whereas neural networks and logistic regression were the best in classifying jobs with a low risk of LBDs. The decision tree model delivered the most stable results across 10 generations of different data sets randomly chosen for training, validation, and testing. The classification results generated by the decision tree were the easiest to interpret because they were given in the form of simple 'if-then' rules. The results produced by the decision tree method showed that the peak moment had the highest predictive power of LBDs.
33

Staňková, Michaela, and David Hampel. "Bankruptcy Prediction of Engineering Companies in the EU Using Classification Methods." Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis 66, no. 5 (2018): 1347–56. http://dx.doi.org/10.11118/actaun201866051347.

Abstract:
This article focuses on the problem of binary classification of 902 small- and medium‑sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. For classification purposes, the basic statistical method of logistic regression has been selected, together with representatives of machine learning (the support vector machine and classification tree methods), to construct models for bankruptcy prediction. Different settings have been tested for each method. Furthermore, the models were estimated based on complete data and also using identified artificial factors. To evaluate the quality of prediction, we observe not only the total accuracy with the type I and II errors but also the area under the ROC curve criterion. The results clearly show that increasing distance to bankruptcy decreases the predictive ability of all models. The classification tree method leads us to rather simple models. The best classification results were achieved through logistic regression based on artificial factors. Moreover, this procedure provides good and stable results regardless of other settings. Artificial factors also seem to be a suitable variable for support vector machine models, but classification trees achieved better results using original data.
34

Xie, Yunxin, Chenyang Zhu, Yue Lu, and Zhengwei Zhu. "Towards Optimization of Boosting Models for Formation Lithology Identification." Mathematical Problems in Engineering 2019 (August 14, 2019): 1–13. http://dx.doi.org/10.1155/2019/5309852.

Abstract:
Lithology identification is an indispensable part of geological research and petroleum engineering. In recent years, several mathematical approaches have been used to improve the accuracy of lithology classification. Based on our earlier work that assessed machine learning models on formation lithology classification, we optimize the boosting approaches to improve the classification ability of our boosting models with the data collected from the Daniudi gas field and Hangjinqi gas field. Three boosting models, namely, AdaBoost, Gradient Tree Boosting, and eXtreme Gradient Boosting, are evaluated with 5-fold cross validation. Regularization is applied to the Gradient Tree Boosting and eXtreme Gradient Boosting to avoid overfitting. After applying the hyperparameter tuning approach to each boosting model to optimize the parameter set, we use stacking to combine the three optimized models to improve the classification accuracy. Results suggest that the optimized stacked boosting model has better performance in terms of evaluation metrics such as precision, recall, and F1 score compared with the single optimized boosting model. The confusion matrix also shows that the stacked model has better performance in distinguishing sandstone classes.
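Illustrative aside (not from the paper): a sketch of stacking several boosting models with 5-fold cross-validation; eXtreme Gradient Boosting is replaced here by scikit-learn's HistGradientBoostingClassifier so the example needs no external dependency, and the data are synthetic.

```python
# Sketch: stacking three boosting models behind a logistic-regression meta-learner.
# XGBoost is swapped for HistGradientBoostingClassifier; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              HistGradientBoostingClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=12, n_classes=3,
                           n_informative=8, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(random_state=0)),
        ("gbt", GradientBoostingClassifier(random_state=0)),
        ("hgbt", HistGradientBoostingClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
print("5-fold CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```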
35

Carrizosa, Emilio, Cristina Molero-Río, and Dolores Romero Morales. "Mathematical optimization in classification and regression trees." TOP 29, no. 1 (March 17, 2021): 5–33. http://dx.doi.org/10.1007/s11750-021-00594-1.

Abstract:
Classification and regression trees, as well as their variants, are off-the-shelf methods in Machine Learning. In this paper, we review recent contributions within the Continuous Optimization and the Mixed-Integer Linear Optimization paradigms to develop novel formulations in this research area. We compare those in terms of the nature of the decision variables and the constraints required, as well as the optimization algorithms proposed. We illustrate how these powerful formulations enhance the flexibility of tree models, being better suited to incorporate desirable properties such as cost-sensitivity, explainability, and fairness, and to deal with complex data, such as functional data.
36

Zeng, Xiangxiang, Sisi Yuan, You Li, and Quan Zou. "Decision Tree Classification Model for Popularity Forecast of Chinese Colleges." Journal of Applied Mathematics 2014 (2014): 1–7. http://dx.doi.org/10.1155/2014/675806.

Abstract:
Prospective students generally select their preferred college on the basis of popularity. Thus, this study uses survey data to build decision tree models for forecasting the popularity of a number of Chinese colleges in each district. We first extract a feature called “popularity change ratio” from existing data and then use a simplified but efficient algorithm based on “gain ratio” for decision tree construction. The final model is evaluated using common evaluation methods. This research is the first of its type in the educational field and represents a novel use of decision tree models with time series attributes for forecasting the popularity of Chinese colleges. Experimental analyses demonstrated encouraging results, proving the practical viability of the approach.
37

Uddameri, Venkatesh, Ana Silva, Sreeram Singaraju, Ghazal Mohammadi, and E. Hernandez. "Tree-Based Modeling Methods to Predict Nitrate Exceedances in the Ogallala Aquifer in Texas." Water 12, no. 4 (April 3, 2020): 1023. http://dx.doi.org/10.3390/w12041023.

Abstract:
The performance of four tree-based classification techniques, namely classification and regression trees (CART), multivariate adaptive regression splines (MARS), random forests (RF), and gradient boosting trees (GBT), was compared against the commonly used logistic regression (LR) analysis to assess aquifer vulnerability in the Ogallala Aquifer of Texas. The results indicate that the tree-based models performed better than the logistic regression model, as they were able to locally refine nitrate exceedance probabilities. RF exhibited the best generalizable capabilities. The CART model did better in predicting non-exceedances. Nitrate exceedances were sensitive to well depths, an indicator of aquifer redox conditions, which, in turn, was controlled by alkalinity increases brought forth by the dissolution of calcium carbonate. The clay content of soils and soil organic matter, which serve as indicators of agriculture activities, were also noted to have significant influences on nitrate exceedances. Likely nitrogen releases from confined animal feedlot operations in the northeast portions of the study area also appeared to be locally important. Integrated soil, hydrogeological and geochemical datasets, in conjunction with tree-based methods, help elucidate processes controlling nitrate exceedances. Overall, tree-based models offer flexible, transparent approaches for mapping nitrate exceedances, identifying underlying mechanisms and prioritizing monitoring activities.
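Illustrative aside (not from the paper): a sketch comparing exceedance probabilities from logistic regression and a random forest; the predictor names echo the abstract (well depth, clay content, soil organic matter), but the data and the exceedance rule are synthetic assumptions.

```python
# Sketch under assumptions: synthetic nitrate-exceedance data with predictors
# named after those highlighted in the abstract; models compared by ROC AUC.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "well_depth_m": rng.uniform(10, 150, n),
    "clay_pct": rng.uniform(5, 45, n),
    "soil_organic_matter": rng.uniform(0.2, 4.0, n),
})
# Hypothetical rule: shallow wells with low clay content exceed more often
p = 1 / (1 + np.exp(0.03 * X["well_depth_m"] + 0.05 * X["clay_pct"] - 4))
y = rng.binomial(1, p)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name}: AUC={roc_auc_score(y_te, prob):.3f}")
```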
38

Kumar, Sunil, Saroj Ratnoo, and Jyoti Vashishtha. "HYPER HEURISTIC EVOLUTIONARY APPROACH FOR CONSTRUCTING DECISION TREE CLASSIFIERS." Journal of Information and Communication Technology 20, Number 2 (February 21, 2021): 249–76. http://dx.doi.org/10.32890/jict2021.20.2.5.

Full text
Abstract:
Decision tree models have earned a special status in predictive modeling since they are considered comprehensible for human analysis and insight. The Classification and Regression Tree (CART) algorithm is one of the renowned decision tree induction algorithms for addressing classification as well as regression problems. Finding optimal values for the hyper parameters of a decision tree construction algorithm is a challenging issue. While making an effective decision tree classifier with high accuracy and comprehensibility, we need to address the question of setting optimal values for its hyper parameters like the maximum size of the tree, the minimum number of instances required in a node for inducing a split, the node splitting criterion and the amount of pruning. The hyper parameter setting influences the performance of the decision tree model. As researchers, we know that no single setting of hyper parameters works equally well for different datasets. A particular setting that gives an optimal decision tree for one dataset may produce a sub-optimal decision tree model for another dataset. In this paper, we present a hyper heuristic approach for tuning the hyper parameters of Recursive and Partition Trees (rpart), a typical implementation of CART in the statistical and data analytics package R. We employ an evolutionary algorithm as the hyper heuristic for tuning the hyper parameters of the decision tree classifier. The approach is named Hyper heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. It is statistically shown that HEARpart performs significantly better than WEKA’s J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper heuristic algorithm constructs significantly more comprehensible models compared to WEKA’s J48, CART and other similar decision tree construction strategies. The results show that the accuracy achieved by the hyper heuristic approach is slightly lower than that of the other comparative approaches.
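The tuning loop can be illustrated with a simple evolutionary-style search over decision-tree hyperparameters. Below is a minimal Python sketch that uses scikit-learn's CART implementation rather than R's rpart, and a basic mutate-and-select scheme rather than the authors' HEARpart operators.

# Sketch of evolutionary-style hyperparameter tuning for a decision tree
# (an analogue of the HEARpart idea, not the authors' R/rpart implementation).
import random
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = random.Random(0)

def random_config():
    return {"max_depth": rng.randint(2, 12),
            "min_samples_split": rng.randint(2, 40),
            "criterion": rng.choice(["gini", "entropy"]),
            "ccp_alpha": rng.uniform(0.0, 0.02)}

def mutate(cfg):
    child = dict(cfg)
    key = rng.choice(list(child))
    child[key] = random_config()[key]          # re-sample one hyperparameter
    return child

def fitness(cfg):
    tree = DecisionTreeClassifier(random_state=0, **cfg)
    return cross_val_score(tree, X, y, cv=5).mean()

population = [random_config() for _ in range(10)]
for generation in range(15):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]                       # keep the fittest configurations
    population = parents + [mutate(rng.choice(parents)) for _ in range(6)]

best = max(population, key=fitness)
print("best configuration:", best, "cv accuracy:", round(fitness(best), 3))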
APA, Harvard, Vancouver, ISO, and other styles
39

Saraee, M., B. Theodoulidis, J. A. Keane, and C. Tjortjis. "Using T3, an Improved Decision Tree Classifier, for Mining Stroke-related Medical Data." Methods of Information in Medicine 46, no. 05 (2007): 523–29. http://dx.doi.org/10.1160/me0317.

Full text
Abstract:
Objectives: Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered by using data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well-established data mining methods and identify strengths and weaknesses. Method: We propose T3, a decision tree classifier which builds predictive models based on known classes, by allowing for a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets, in order to identify strengths and weaknesses. We also compare performance with a very successful and well-established decision tree classifier. Results: T3 demonstrated impressive performance when predicting unseen cases of stroke, resulting in as little as 0.4% classification error, while the state-of-the-art decision tree classifier resulted in 33.6% classification error. Conclusions: This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three, and results in high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 based on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produces significantly more accurate and readable classifiers.
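T3 is not available in standard libraries, but the core idea of a shallow tree that tolerates some training error can be approximated. Below is a minimal sketch that caps the depth at three and applies cost-complexity pruning in scikit-learn as a rough stand-in for T3's behaviour; it is not the T3 algorithm itself.

# Rough analogue of a depth-limited classifier in the spirit of T3:
# cap the depth at three and allow some training error via pruning.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

shallow_tree = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.005, random_state=0)
shallow_tree.fit(X_tr, y_tr)

print("training error:", 1 - shallow_tree.score(X_tr, y_tr))
print("test error:", 1 - shallow_tree.score(X_te, y_te))
print(export_text(shallow_tree, max_depth=3))   # small, readable rule set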
APA, Harvard, Vancouver, ISO, and other styles
40

Egelberg, Jacob, Nina Pena, Rachel Rivera, and Christina Andruk. "Assessing the geographic specificity of pH prediction by classification and regression trees." PLOS ONE 16, no. 8 (August 11, 2021): e0255119. http://dx.doi.org/10.1371/journal.pone.0255119.

Full text
Abstract:
Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The current study collected soil pH, climatic, and topographic data from 100 locations across New York’s Temperate Deciduous Forests (in the United States of America) to investigate the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest (RF) models. Results showed that the previously developed CART underperformed in terms of predictive accuracy (RRMSE = 14.52%) when compared to a novel tree (RRMSE = 9.33%), and that a novel random forest outperformed both models (RRMSE = 8.88%), though its predictions did not differ significantly from the novel tree (p = 0.26). The most important predictors for model construction were climatic factors. These findings confirm existing reports that CART models are constrained by the spatial autocorrelation of geographic data and encourage the restricted application of relevant machine learning models to regions from which training data was collected. They also contradict previous literature implying that random forests should meaningfully boost the predictive accuracy of CARTs in the context of soil pH.
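The RRMSE figures quoted above follow from a one-line formula. Below is a minimal sketch assuming the common definition of RRMSE as the RMSE divided by the mean observed value, expressed as a percentage (the paper may normalise slightly differently); the pH values are illustrative, not the study's data.

# RRMSE as RMSE normalised by the mean observation, expressed in percent.
import numpy as np

def rrmse(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((observed - predicted) ** 2))
    return 100.0 * rmse / observed.mean()

# Illustrative soil-pH values (not the study's data)
observed_ph = [4.8, 5.2, 5.9, 6.4, 5.5]
predicted_ph = [5.0, 5.1, 6.2, 6.0, 5.4]
print(f"RRMSE = {rrmse(observed_ph, predicted_ph):.2f}%")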
APA, Harvard, Vancouver, ISO, and other styles
41

Quan, Zhiyu, and Emiliano A. Valdez. "Predictive analytics of insurance claims using multivariate decision trees." Dependence Modeling 6, no. 1 (December 1, 2018): 377–407. http://dx.doi.org/10.1515/demo-2018-0022.

Full text
Abstract:
Because of its many advantages, the use of decision trees has become an increasingly popular alternative predictive tool for building classification and regression models. Its origins date back about five decades, where the algorithm can be broadly described by repeatedly partitioning the regions of the explanatory variables and thereby creating a tree-based model for predicting the response. Innovations to the original methods, such as random forests and gradient boosting, have further improved the capabilities of using decision trees as a predictive model. In addition, the extension of decision trees to multivariate response variables has started to develop, and it is the purpose of this paper to apply multivariate tree models to insurance claims data with correlated responses. This extension to multivariate response variables inherits several advantages of the univariate decision tree models, such as the distribution-free feature, the ability to rank essential explanatory variables, and high predictive accuracy, to name a few. To illustrate the approach, we analyze a dataset drawn from the Wisconsin Local Government Property Insurance Fund (LGPIF), which offers multi-line insurance coverage of property, motor vehicle, and contractors’ equipment. With multivariate tree models, we are able to capture the inherent relationship among the response variables, and we find that the marginal predictive model based on multivariate trees is an improvement in prediction accuracy over that based on simply the univariate trees.
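The joint-versus-separate modelling idea can be tried directly, since scikit-learn's regression trees accept multi-output targets. Below is a minimal sketch on synthetic correlated responses; it is not the LGPIF claims data or the authors' specific multivariate tree method.

# Sketch: a single tree fit jointly to two correlated responses versus two
# separate univariate trees (synthetic data, not the LGPIF claims data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
signal = X[:, 0] - 2 * X[:, 1]
Y = np.column_stack([signal + rng.normal(scale=0.5, size=1000),
                     0.8 * signal + rng.normal(scale=0.5, size=1000)])  # correlated responses
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

joint_tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, Y_tr)
separate = [DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, Y_tr[:, j]) for j in range(2)]

print("joint tree R^2:", joint_tree.score(X_te, Y_te))
print("separate trees R^2:", [t.score(X_te, Y_te[:, j]) for j, t in enumerate(separate)])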
APA, Harvard, Vancouver, ISO, and other styles
42

Behr, Andreas, and Jurij Weinblat. "Default prediction using balance-sheet data: a comparison of models." Journal of Risk Finance 18, no. 5 (November 20, 2017): 523–40. http://dx.doi.org/10.1108/jrf-01-2017-0003.

Full text
Abstract:
Purpose: The purpose of this paper is to do a performance comparison of three different data mining techniques. Design/methodology/approach: Logit model, decision tree and random forest are applied in this study on British, French, German, Italian, Portuguese and Spanish balance sheet data from 2006 to 2012, which covers 446,464 firms. Because of the strong imbalance with regard to the solvency status, classification trees and random forests are modified to adapt to this imbalance. All three model specifications are optimized extensively using resampling techniques, relying on the training sample only. Model performance is assessed, strictly, based on out-of-sample predictions. Findings: Random forest is found to strongly outperform the classification tree and the logit model in almost all considered years and countries, according to the quality measure in this study. Originality/value: Obtaining reliable estimates of default propensity scores is of immense importance for potential credit grantors, portfolio managers and regulatory authorities. As the overwhelming majority of firms are not listed on stock exchanges, annual balance sheets still provide the most important source of information. The obtained ranking of the three models according to their predictive performance is relatively robust, due to the consideration of several countries and a relatively long time period.
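The imbalance adaptation can be illustrated with class weighting, one standard way to modify trees and forests for rare defaults. Below is a minimal sketch on synthetic imbalanced data; it assumes class weights rather than the authors' exact resampling-based modification, and does not use the balance-sheet dataset.

# Sketch: handling a strongly imbalanced default/non-default target with class weights.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20000, n_features=12, weights=[0.98, 0.02], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

models = {
    "logit": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "classification tree": DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=1),
    "random forest": RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "out-of-sample AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))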
APA, Harvard, Vancouver, ISO, and other styles
43

Pearse, Grant D., Michael S. Watt, Julia Soewarto, and Alan Y. S. Tan. "Deep Learning and Phenology Enhance Large-Scale Tree Species Classification in Aerial Imagery during a Biosecurity Response." Remote Sensing 13, no. 9 (May 4, 2021): 1789. http://dx.doi.org/10.3390/rs13091789.

Full text
Abstract:
The ability of deep convolutional neural networks (deep learning) to learn complex visual characteristics offers a new method to classify tree species using lower-cost data such as regional aerial RGB imagery. In this study, we use 10 cm resolution imagery and 4600 trees to develop a deep learning model to identify Metrosideros excelsa (pōhutukawa)—a culturally important New Zealand tree that displays distinctive red flowers during summer and is under threat from the invasive pathogen Austropuccinia psidii (myrtle rust). Our objectives were to compare the accuracy of deep learning models that could learn the distinctive visual characteristics of the canopies with tree-based models (XGBoost) that used spectral and textural metrics. We tested whether the phenology of pōhutukawa could be used to enhance classification by using multitemporal aerial imagery that showed the same trees with and without widespread flowering. The XGBoost model achieved an accuracy of 86.7% on the dataset with strong phenology (flowering). Without phenology, the accuracy fell to 79.4% and the model relied on the blueish hue and texture of the canopies. The deep learning model achieved 97.4% accuracy with 96.5% sensitivity and 98.3% specificity when leveraging phenology—even though the intensity of flowering varied substantially. Without strong phenology, the accuracy of the deep learning model remained high at 92.7% with sensitivity of 91.2% and specificity of 94.3% despite significant variation in the appearance of non-flowering pōhutukawa. Pooling time-series imagery did not enhance either approach. The accuracies of the XGBoost and deep learning models were, respectively, 83.2% and 95.2%, which were intermediate between those of the separate models.
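The effect of a phenology-driven feature on a tree-based classifier can be sketched in a few lines. Below is a minimal example using scikit-learn's gradient boosting as a stand-in for XGBoost, with synthetic "crown colour" features in place of the study's spectral and textural metrics; none of the values come from the pōhutukawa dataset.

# Sketch: a gradient-boosted baseline on crown spectral/texture metrics, with and
# without an additional flowering-season feature (synthetic data for illustration).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
is_target_species = rng.integers(0, 2, size=n)

# Non-flowering imagery: weakly informative hue/texture metrics
base_features = rng.normal(size=(n, 4)) + 0.4 * is_target_species[:, None]
# Flowering imagery: a strongly informative "redness" metric appears
redness = 1.5 * is_target_species + rng.normal(scale=0.8, size=n)
phenology_features = np.column_stack([base_features, redness])

clf = GradientBoostingClassifier(random_state=0)
print("accuracy without phenology:", cross_val_score(clf, base_features, is_target_species, cv=5).mean())
print("accuracy with phenology:   ", cross_val_score(clf, phenology_features, is_target_species, cv=5).mean())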
APA, Harvard, Vancouver, ISO, and other styles
44

Lafond, Daniel, Benoît R. Vallières, François Vachon, Marie-Ève St-Louis, and Sébastien Tremblay. "Capturing Non-linear Judgment Policies Using Decision Tree Models of Classification Behavior." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 59, no. 1 (September 2015): 831–35. http://dx.doi.org/10.1177/1541931215591251.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Mokarram, Reza, and Mehdi Emadi. "Classification in Non-linear Survival Models Using Cox Regression and Decision Tree." Annals of Data Science 4, no. 3 (July 12, 2017): 329–40. http://dx.doi.org/10.1007/s40745-017-0105-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Yarak, Kanitta, Apichon Witayangkurn, Kunnaree Kritiyutanont, Chomchanok Arunplod, and Ryosuke Shibasaki. "Oil Palm Tree Detection and Health Classification on High-Resolution Imagery Using Deep Learning." Agriculture 11, no. 2 (February 23, 2021): 183. http://dx.doi.org/10.3390/agriculture11020183.

Full text
Abstract:
Combining modern technology and agriculture is an important consideration for the effective management of oil palm trees. In this study, an alternative method for oil palm tree management is proposed by applying high-resolution imagery, combined with Faster-RCNN, for automatic detection and health classification of oil palm trees. This study used a total of 4172 bounding boxes of healthy and unhealthy palm trees, constructed from 2000 pixel × 2000 pixel images. Of the total dataset, 90% was used for training and 10% was prepared for testing using Resnet-50 and VGG-16. Three techniques were used to assess the models’ performance: model training evaluation, evaluation using visual interpretation, and ground sampling inspections. The study identified three characteristics needed for detection and health classification: crown size, color, and density. The optimal altitude to capture images for detection and classification was determined to be 100 m, although the model showed satisfactory performance up to 140 m. For oil palm tree detection, healthy tree identification, and unhealthy tree identification, Resnet-50 obtained F1-scores of 95.09%, 92.07%, and 86.96%, respectively, with respect to visual interpretation ground truth and 97.67%, 95.30%, and 57.14%, respectively, with respect to ground sampling inspection ground truth. Resnet-50 yielded better F1-scores than VGG-16 in both evaluations. Therefore, the proposed method is well suited for the effective management of crops.
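The F1-score, sensitivity and specificity reported above are all derived from the same confusion matrix. Below is a minimal sketch of that derivation, using arbitrary true/false positive and negative counts rather than the study's detection results.

# Deriving F1-score, sensitivity and specificity from a binary confusion matrix
# (arbitrary counts for illustration, not the study's detection results).
def detection_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

print(detection_metrics(tp=180, fp=12, fn=15, tn=200))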
APA, Harvard, Vancouver, ISO, and other styles
47

Hernández, Víctor Adrián Sosa, Raúl Monroy, Miguel Angel Medina-Pérez, Octavio Loyola-González, and Francisco Herrera. "A Practical Tutorial for Decision Tree Induction." ACM Computing Surveys 54, no. 1 (April 2021): 1–38. http://dx.doi.org/10.1145/3429739.

Full text
Abstract:
Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that have been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10×10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.
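The component being varied, the split-evaluation measure, corresponds to the criterion argument in common CART implementations, which gives a quick way to see how swapping the measure changes results. Below is a minimal sketch with scikit-learn (CART-style greedy induction rather than C4.5, and only two of the many measures the article studies).

# Swapping the split-evaluation measure: the same data, two different criteria.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    print(criterion, "mean cv accuracy:", cross_val_score(tree, X, y, cv=10).mean())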
APA, Harvard, Vancouver, ISO, and other styles
48

Hülsmann, Lisa, Harald Bugmann, and Peter Brang. "How to predict tree death from inventory data — lessons from a systematic assessment of European tree mortality models." Canadian Journal of Forest Research 47, no. 7 (July 2017): 890–900. http://dx.doi.org/10.1139/cjfr-2016-0224.

Full text
Abstract:
The future development of forest ecosystems depends critically on tree mortality. However, the suitability of empirical mortality algorithms for extrapolation in space or time remains untested. We systematically analyzed the performance of 46 inventory-based mortality models available from the literature using nearly 80 000 independent records from 54 strict forest reserves in Germany and Switzerland covering 11 species. Mortality rates were predicted with higher accuracy if covariates for tree growth and (or) competition at the individual level were included and if models were applied within the same ecological zone. In contrast, classification of dead vs. living trees was only improved by growth variables. Management intensity in the calibration stands, as well as the census interval and size of the calibration datasets, did not influence model performance. Consequently, future approaches should make use of tree growth and competition at the level of individual trees. Mortality algorithms for applications over a restricted spatial extent and under current climate should be calibrated based on datasets from the same region, even if they are small. To obtain models with wide applicability and enhanced climatic sensitivity, the spatial variability of mortality should be addressed explicitly by considering environmental influences using data of high temporal resolution covering large ecological gradients. Finally, such models need to be validated and documented thoroughly.
APA, Harvard, Vancouver, ISO, and other styles
49

Lima, Nilsa Duarte da Silva, Irenilza de Alencar Nääs, João Gilberto Mendes dos Reis, and Raquel Baracat Tosi Rodrigues da Silva. "Classifying the Level of Energy-Environmental Efficiency Rating of Brazilian Ethanol." Energies 13, no. 8 (April 21, 2020): 2067. http://dx.doi.org/10.3390/en13082067.

Full text
Abstract:
The present study aimed to assess and classify energy-environmental efficiency levels to reduce greenhouse gas emissions in the production, commercialization, and use of biofuels certified by the Brazilian National Biofuel Policy (RenovaBio). The parameters of the level of energy-environmental efficiency were standardized and categorized according to the Energy-Environmental Efficiency Rating (E-EER). The rating scale varied between lower efficiency (D) and high efficiency+ (highest efficiency, A+). The J48 decision tree and naive Bayes algorithms were used to build the predictive classification models. Classification of the E-EER scores using the J48 decision tree and the naive Bayes classifier produced models efficient at estimating the efficiency level of Brazilian ethanol producers and importers certified by RenovaBio. The rules generated by the models can assess the level classes (efficiency scores) according to the scale discretized into high efficiency (Classification A), average efficiency (Classification B), and standard efficiency (Classification C). These results could be used to generate an ethanol energy-environmental efficiency label for end consumers and resellers of the product, to assist in making purchase decisions concerning its performance. The best classification model was naive Bayes, compared to the J48 decision tree. The classification of the Energy Efficiency Note levels using the naive Bayes algorithm produced a model capable of estimating the efficiency level of Brazilian ethanol for creating labels.
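The J48-versus-naive-Bayes comparison can be outlined with scikit-learn. Below is a minimal sketch on synthetic three-class data standing in for the discretized efficiency labels; an entropy-criterion CART tree is only a rough stand-in for WEKA's J48 (C4.5), and no RenovaBio data are used.

# Sketch: decision tree (entropy criterion, a rough stand-in for J48/C4.5) versus
# naive Bayes on a three-class efficiency label (synthetic features).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=5, random_state=0)
nb = GaussianNB()
print("decision tree accuracy:", cross_val_score(tree, X, y, cv=10).mean())
print("naive Bayes accuracy:  ", cross_val_score(nb, X, y, cv=10).mean())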
APA, Harvard, Vancouver, ISO, and other styles
50

Suhaimi Sulaiman, Mohd, and Zuraidi Saad. "Classification of healthy and white root disease infected rubber trees based on relative permittivity and capacitance input properties using LM and SCG artificial neural network." Indonesian Journal of Electrical Engineering and Computer Science 19, no. 1 (July 1, 2020): 222. http://dx.doi.org/10.11591/ijeecs.v19.i1.pp222-228.

Full text
Abstract:
White root disease is one of the most serious diseases in rubber plantations in Malaysia; it initially infects the root surface of the rubber tree, so prevention is more important than treatment. The classification system proposed in the research can detect the disease by distinguishing between healthy rubber trees and rubber trees infected with white root disease. A total of 600 latex samples from healthy and white-root-disease-infected rubber trees were taken from the RRIM station in Kota Tinggi, Johor, and measured for relative permittivity and capacitance. All measurement inputs from the experiment were tested using statistical analysis and then passed through an ANN classification process to generate optimized models using the LM and SCG algorithms. Four optimized models were selected from the classification process, each with an accuracy greater than 70%. The selected models were then used to classify healthy trees and white-root-infected trees based on single input categories.
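The two-input neural network classifier can be approximated with a small multilayer perceptron. Below is a minimal sketch on synthetic permittivity and capacitance values (not the RRIM latex measurements); note that scikit-learn's MLP does not offer the LM or SCG trainers used in the paper, so this is only an analogue.

# Analogue sketch: a small neural network classifying healthy vs infected trees from
# two dielectric features (synthetic data; different optimisers than LM/SCG).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 600
infected = rng.integers(0, 2, size=n)
relative_permittivity = 20 + 5 * infected + rng.normal(scale=2.0, size=n)   # synthetic
capacitance_pf = 50 + 8 * infected + rng.normal(scale=4.0, size=n)          # synthetic
X = np.column_stack([relative_permittivity, capacitance_pf])

X_tr, X_te, y_tr, y_te = train_test_split(X, infected, stratify=infected, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0))
model.fit(X_tr, y_tr)
print("classification accuracy:", model.score(X_te, y_te))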
APA, Harvard, Vancouver, ISO, and other styles
