Doctoral dissertations on the topic "Decision tree"

Follow this link to see other types of publications on this topic: Decision tree.

Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles.

Consult the top 50 doctoral dissertations on the topic "Decision tree".

Next to every entry in the bibliography there is an "Add to bibliography" button. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a ".pdf" file and read its abstract online, whenever such details are available in the metadata.

Browse doctoral dissertations from many different disciplines and compile appropriate bibliographies.

1

Shi, Haijian. "Best-first Decision Tree Learning". The University of Waikato, 2007. http://hdl.handle.net/10289/2317.

Full text
Abstract:
In best-first top-down induction of decision trees, the best split is added at each step (e.g. the split that maximally reduces the Gini index), in contrast to the fixed depth-first expansion order used in standard top-down induction. The fully-grown tree is the same; only the order in which it is built differs. The objective of this project is to investigate whether an appropriate tree size can be determined on practical datasets by combining best-first decision tree growth with cross-validation-based selection of the number of expansions performed. Pre-pruning, post-pruning and CART pruning can all be performed in this way and compared.
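The growth strategy the abstract describes can be sketched in a few lines. This is a minimal illustration, not code from the thesis: leaves awaiting expansion sit in a priority queue keyed by the Gini reduction of their best split, and each step expands whichever leaf currently offers the largest reduction. The data layout (lists of `(features, label)` pairs) and function names are mine.

```python
import heapq

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows):
    """(reduction, feature, threshold) of the split that maximally
    reduces Gini impurity; rows is a list of (features, label) pairs."""
    base = gini([y for _, y in rows])
    best = (0.0, None, None)
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            left = [y for xv, y in rows if xv[f] <= t]
            right = [y for xv, y in rows if xv[f] > t]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if base - weighted > best[0]:
                best = (base - weighted, f, t)
    return best

def best_first_tree(rows, max_expansions):
    """Grow a tree best-first: always expand the leaf whose best split
    yields the largest Gini reduction, stopping after max_expansions."""
    root = {"rows": rows, "split": None}
    heap, tiebreak = [], 0
    red, f, t = best_split(rows)
    if f is not None:
        heapq.heappush(heap, (-red, tiebreak, root, f, t))
        tiebreak += 1
    for _ in range(max_expansions):
        if not heap:
            break  # no remaining leaf can be improved
        _, _, node, f, t = heapq.heappop(heap)
        node["split"] = (f, t)
        node["left"] = {"rows": [(x, y) for x, y in node["rows"] if x[f] <= t], "split": None}
        node["right"] = {"rows": [(x, y) for x, y in node["rows"] if x[f] > t], "split": None}
        for child in (node["left"], node["right"]):
            red, cf, ct = best_split(child["rows"])
            if cf is not None:
                heapq.heappush(heap, (-red, tiebreak, child, cf, ct))
                tiebreak += 1
    return root
```

Capping `max_expansions` is exactly the knob the abstract proposes to tune by cross-validation: growth stops after a fixed number of best-first expansions rather than at a fixed depth.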
2

Vella, Alan. "Hyper-heuristic decision tree induction". Thesis, Heriot-Watt University, 2012. http://hdl.handle.net/10399/2540.

Full text
Abstract:
A hyper-heuristic is any algorithm that searches or operates in the space of heuristics, as opposed to the space of solutions. Hyper-heuristics are increasingly used in function and combinatorial optimization. Rather than attempting to solve a problem using a fixed heuristic, a hyper-heuristic approach attempts to find a combination of heuristics that solves the problem (and may in turn be directly suitable for a whole class of problem instances). Hyper-heuristics have been little explored in data mining. This work presents novel hyper-heuristic approaches to data mining that search a space of attribute selection criteria for a decision tree building algorithm. The search is conducted by a genetic algorithm. The result of the hyper-heuristic search in this case is a strategy for selecting attributes while building decision trees. Most hyper-heuristics work by trying to adapt the heuristic to the state of the problem being solved, and ours is no different: it employs a strategy for adapting the heuristic used to build decision tree nodes according to a set of features of the training set it is working on. We introduce, explore and evaluate five different ways in which this problem state can be represented for a hyper-heuristic that operates within a decision tree building algorithm. In each case, the hyper-heuristic is guided by a rule set that tries to map features of the data set to be split by the decision tree building algorithm to a heuristic to be used for splitting that data set. We also explore and evaluate three different sets of low-level heuristics that could be employed by such a hyper-heuristic. This work also makes a distinction between specialist hyper-heuristics and generalist hyper-heuristics; the main difference between the two is the number of training sets used by the hyper-heuristic genetic algorithm.
Specialist hyper-heuristics are created using a single data set from a particular domain for evolving the hyper-heuristic rule set. Such algorithms are expected to outperform standard algorithms on the kind of data set used by the hyper-heuristic genetic algorithm. Generalist hyper-heuristics are trained on multiple data sets from different domains and are expected to deliver robust, competitive performance across these data sets when compared to standard algorithms. We evaluate both approaches for each kind of hyper-heuristic presented in this thesis, using both real and synthetic data sets. Our results suggest that none of the hyper-heuristics presented in this work is suited for specialization: in most cases, the hyper-heuristic's performance on the data set it was specialized for was not significantly better than that of the best performing standard algorithm. On the other hand, the generalist hyper-heuristics delivered results that were very competitive with the best standard methods, and in some cases achieved a significantly better overall performance than all of the standard methods.
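The core idea of adapting the node-splitting heuristic to the state of the training set can be sketched as follows. The state features, rule thresholds and heuristic names here are invented stand-ins; in the thesis such a rule set is evolved by a genetic algorithm rather than hand-written.

```python
def dataset_features(labels):
    """A simple problem-state description: node size, class count, skew."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return {
        "n": len(labels),
        "n_classes": len(counts),
        "majority_frac": max(counts.values()) / len(labels),
    }

def choose_heuristic(state):
    """Rule set mapping problem state to a low-level splitting heuristic.
    These rules are hand-written stand-ins for an evolved rule set."""
    if state["majority_frac"] > 0.8:
        return "gain_ratio"   # heavily skewed node: prefer a normalised measure
    if state["n"] < 30:
        return "gini"         # small node: use a cheap measure
    return "info_gain"
```

At each node, a hyper-heuristic tree builder would call `choose_heuristic(dataset_features(node_labels))` and split with the returned criterion, so different nodes of one tree may be built with different heuristics.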
3

Bogdan, Vukobratović. "Hardware Acceleration of Nonincremental Algorithms for the Induction of Decision Trees and Decision Tree Ensembles". PhD thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2017. https://www.cris.uns.ac.rs/record.jsf?recordId=102520&source=NDLTD&language=en.

Full text
Abstract:
The thesis proposes novel full decision tree and decision tree ensemble induction algorithms, EFTI and EEFTI, and explores various possibilities for their implementation. The experiments show that the proposed EFTI algorithm is able to infer much smaller DTs on average, without significant loss in accuracy, when compared to top-down incremental DT inducers. On the other hand, when compared to other full tree induction algorithms, it was able to produce more accurate DTs, with similar sizes, in shorter times. Also, hardware architectures for the acceleration of these algorithms (EFTIP and EEFTIP) are proposed, and experiments show that they can offer substantial speedups.
4

Qureshi, Taimur. "Contributions to decision tree based learning". Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO20051/document.

Full text
Abstract:
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data learning techniques which aim at producing high-level information, or models, from data. A typical knowledge discovery process consists of data selection, data preparation, data transformation, data mining, and interpretation/validation of the results. We develop automatic learning techniques which contribute to the data preparation, transformation and mining tasks of knowledge discovery, and in doing so try to improve the prediction accuracy of the overall learning process. Our work focuses on decision tree based learning, and we introduce various preprocessing and transformation techniques such as discretization, fuzzy partitioning and dimensionality reduction to improve this type of learning. These techniques can, however, also be used in other learning methods; discretization, for example, can be applied to naive Bayes classifiers. The data preparation step represents almost 80 percent of the problem and is both time consuming and critical for the quality of modeling. Discretization of continuous features is an important problem that affects the accuracy, complexity, variance and understandability of the induced models. In this thesis, we propose and develop resampling-based aggregation techniques that improve the quality of discretization, and validate them by comparison with other discretization techniques and with an optimal partitioning method on 10 benchmark data sets. The second part of the thesis concerns automatic fuzzy partitioning for soft decision tree induction. A soft, or fuzzy, decision tree is an extension of classical crisp tree induction in which fuzzy logic is embedded into the induction process, yielding more accurate models with reduced variance that remain interpretable and autonomous. We modify the above resampling-based partitioning method to generate fuzzy partitions.
In addition, we propose, develop and validate another fuzzy partitioning method that improves the accuracy of the decision tree. Finally, we adopt a topological learning scheme and perform non-linear dimensionality reduction: we modify an existing manifold-learning-based technique and examine whether it can enhance the predictive power and interpretability of classification.
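As a point of reference for the discretization step, here is a plain equal-frequency discretizer, a common baseline; the thesis's contribution is resampling-based aggregation on top of such partitioning, which this sketch does not attempt.

```python
def equal_frequency_cuts(values, n_bins):
    """Cut points putting roughly the same number of values in each bin."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // n_bins] for k in range(1, n_bins)]

def discretize(x, cuts):
    """Bin index of a continuous value given sorted cut points."""
    for i, c in enumerate(cuts):
        if x < c:
            return i
    return len(cuts)
```

Once cut points are chosen, every continuous attribute value is replaced by its bin index, turning the feature into a categorical one usable by any discrete learner.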
5

Ardeshir, G. "Decision tree simplification for classifier ensembles". Thesis, University of Surrey, 2002. http://epubs.surrey.ac.uk/843022/.

Full text
Abstract:
Design of ensemble classifiers involves three factors: 1) a learning algorithm to produce a classifier (the base classifier), 2) an ensemble method to generate diverse classifiers, and 3) a combining method to merge the decisions made by the base classifiers. With regard to the first factor, a good choice for constructing a classifier is a decision tree learning algorithm. However, a possible problem with this learning algorithm is its complexity, which has previously been addressed only in the context of pruning methods for individual trees. Furthermore, the ensemble method may require the learning algorithm to produce a complex classifier. Considering that the performance of simplification methods, as well as of ensemble methods, changes from one domain to another, our main contribution is to address a simplification method (post-pruning) in the context of ensemble methods including Bagging, Boosting and Error-Correcting Output Codes (ECOC). Using a statistical test, the performance of ensembles made by Bagging, Boosting and ECOC, as well as of five pruning methods in the context of ensembles, is compared. In addition to the implementation, a supporting theory based on margins is discussed, and the relationship of pruning to bias and variance is explained. For ECOC, the effect of parameters such as code length and training set size on the performance of pruning methods is also studied. Decomposition methods such as ECOC are considered as a solution to reduce the complexity of multi-class problems in many real applications such as face recognition. Focusing on decomposition methods, AdaBoost.OC, a combination of Boosting and ECOC, is compared with the pseudo-loss based version of Boosting, AdaBoost.M2. In addition, the influence of pruning on the performance of ensembles is studied. Motivated by the result that both pruned and unpruned ensembles made by AdaBoost.OC have similar accuracy, pruned ensembles are compared with ensembles of single-node decision trees.
This results in the hypothesis that ensembles of simple classifiers may give better performance, as shown for AdaBoost.OC on the identification problem in face recognition. The implication is that in some problems, achieving the best accuracy of an ensemble requires selecting the base classifier complexity.
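The final comparison, boosted ensembles of single-node decision trees (stumps), can be illustrated with a compact AdaBoost.M1 sketch over stumps. This is a generic textbook formulation, not the thesis's implementation, and the toy data layout is mine.

```python
import math

def stump_predict(stump, x):
    """A stump is a single-node tree: (feature, threshold, left_label, right_label)."""
    feat, thresh, left, right = stump
    return left if x[feat] <= thresh else right

def train_stump(X, y, w):
    """Weighted-error-minimising decision stump over all feature/threshold pairs."""
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        for thresh in sorted({xi[feat] for xi in X}):
            for left, right in ((1, -1), (-1, 1)):
                stump = (feat, thresh, left, right)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def adaboost_stumps(X, y, rounds):
    """AdaBoost.M1 over decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def ensemble_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

The stump here is the "single-node decision tree" base classifier of the comparison; swapping in a deeper tree learner recovers the pruned/unpruned ensembles the abstract contrasts it with.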
6

Ahmad, Amir. "Data Transformation for Decision Tree Ensembles". Thesis, University of Manchester, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.508528.

Full text
7

Cai, Jingfeng. "Decision Tree Pruning Using Expert Knowledge". University of Akron / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=akron1158279616.

Full text
8

Wu, Shuning. "Optimal instance selection for improved decision tree". [Ames, Iowa : Iowa State University], 2007.

Find full text
9

Sinnamon, Roslyn M. "Binary decision diagrams for fault tree analysis". Thesis, Loughborough University, 1996. https://dspace.lboro.ac.uk/2134/7424.

Full text
Abstract:
This thesis develops a new approach to fault tree analysis, namely the Binary Decision Diagram (BDD) method. Conventional qualitative fault tree analysis techniques such as the "top-down" or "bottom-up" approaches are now so well developed that further refinement is unlikely to result in vast improvements in terms of their computational capability. The BDD method has exhibited potential gains to be made in terms of speed and efficiency in determining the minimal cut sets. Further, the nature of the binary decision diagram is such that it is more suited to Boolean manipulation. The BDD method has been programmed and successfully applied to a number of benchmark fault trees. The analysis capabilities of the technique have been extended such that all quantitative fault tree top event parameters, which can be determined by conventional Kinetic Tree Theory, can now be derived directly from the BDD. Parameters such as the top event probability, frequency of occurrence and expected number of occurrences can be calculated exactly using this method, removing the need for the approximations previously required. Thus the BDD method is proven to have advantages in terms of both accuracy and efficiency. Initiator/enabler event analysis and importance measures have been incorporated to extend this method into a full analysis procedure.
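The exact quantification the BDD enables can be illustrated by Shannon decomposition over the basic events, which is precisely the expansion a BDD encodes along its variable ordering. The small fault tree and failure probabilities below are invented toy values, not a benchmark from the thesis.

```python
# Fault tree as nested tuples: ('AND', l, r), ('OR', l, r), or a basic-event name.
# Toy example: the top event occurs if A fails together with B or C.
TOP = ('AND', 'A', ('OR', 'B', 'C'))
EVENTS = ('A', 'B', 'C')                  # fixed variable ordering
PROB = {'A': 0.1, 'B': 0.2, 'C': 0.3}     # basic-event failure probabilities

def evaluate(node, assignment):
    """Truth value of the tree under a full assignment of basic events."""
    if isinstance(node, str):
        return assignment[node]
    op, left, right = node
    l, r = evaluate(left, assignment), evaluate(right, assignment)
    return (l and r) if op == 'AND' else (l or r)

def top_event_probability(node, events, prob, assignment=None):
    """Exact top-event probability by Shannon decomposition over the ordered
    events -- the expansion a BDD encodes, so no rare-event approximation."""
    assignment = assignment or {}
    if len(assignment) == len(events):
        return 1.0 if evaluate(node, assignment) else 0.0
    x = events[len(assignment)]
    return (prob[x] * top_event_probability(node, events, prob, {**assignment, x: True})
            + (1 - prob[x]) * top_event_probability(node, events, prob, {**assignment, x: False}))
```

A real BDD shares isomorphic sub-diagrams instead of enumerating every assignment, which is where the efficiency gains the abstract reports come from; the probability computed is the same.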
10

Ho, Colin Kok Meng. "Discretization and defragmentation for decision tree learning". Thesis, University of Essex, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.299072.

Full text
11

Kassim, M. E. "Elliptical cost-sensitive decision tree algorithm (ECSDT)". Thesis, University of Salford, 2018. http://usir.salford.ac.uk/47191/.

Full text
Abstract:
Cost-sensitive multiclass classification, in which the impact of the costs associated with different misclassification errors must be assessed, continues to be one of the major challenges for data mining and machine learning. Literature reviews of this area show that most of the cost-sensitive algorithms developed during the last decade were designed to solve binary classification problems, where an example from the dataset is classified into only one of two available classes. Much of the research on cost-sensitive learning has focused on inducing decision trees, which are one of the most common and widely used classification methods, due to the simplicity of constructing them, their transparency and their comprehensibility. A review of the literature shows that inducing non-linear multiclass cost-sensitive decision trees is still in its early stages, and further research could yield improvements over the current state of the art. Hence, this research addresses the following question: 'How can non-linear regions be identified for multiclass problems and utilized to construct decision trees so as to maximize the accuracy of classification and minimize misclassification costs?' It does so by developing a new algorithm, the Elliptical Cost-Sensitive Decision Tree algorithm (ECSDT), which induces cost-sensitive non-linear (elliptical) decision trees for multiclass classification problems using evolutionary optimization methods such as particle swarm optimization (PSO) and genetic algorithms (GAs). Ellipses are used as non-linear separators because of their simplicity and flexibility in drawing non-linear boundaries: their size, location and rotation can be adjusted towards achieving optimal results. The new algorithm was developed, tested, and evaluated in three different settings, each with a different objective function.
The first considered maximizing classification accuracy only; the second focused on minimizing misclassification costs only; the third considered accuracy and misclassification cost together. ECSDT was applied to fourteen binary-class and multiclass data sets, and the results were compared with those obtained by applying common algorithms from Weka, such as J48, NBTree, MetaCost and the CostSensitiveClassifier, to the same datasets. The primary contribution of this research is the development of a new algorithm that demonstrates the benefits of utilizing elliptical boundaries for cost-sensitive decision tree learning. The new algorithm handles multiclass problems, and an empirical evaluation shows good results. More specifically, when considering accuracy only, ECSDT performs better at maximizing accuracy on 10 of the 14 datasets; when considering misclassification costs only, it performs better on 10 of the 14 datasets; and when considering both accuracy and misclassification costs, it obtains higher accuracy on 10 of the 14 datasets and lower misclassification costs on 5 of the 14. ECSDT also produces smaller trees than J48, LADTree and ADTree.
12

Yedida, Venkata Rama Kumar Swamy. "Protein Function Prediction Using Decision Tree Technique". University of Akron / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=akron1216313412.

Full text
13

Badulescu, Laviniu Aurelian. "ATTRIBUTE SELECTION MEASURE IN DECISION TREE GROWING". Universitaria Publishing House, 2007. http://hdl.handle.net/10150/105610.

Full text
Abstract:
One of the major tasks in data mining is classification. Growing a decision tree from data is a very efficient technique for learning classifiers. The selection of the attribute used to split the data set at each decision tree node is fundamental to classifying objects properly; a good selection improves the accuracy of the classification. In this paper, we study the behavior of decision trees induced with 14 attribute selection measures on three data sets taken from the UCI Machine Learning Repository.
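Two of the classical attribute selection measures such a study compares, information gain (entropy-based) and Gini reduction, can be sketched generically. The data layout (attribute dictionaries) is mine, and the paper's full set of 14 measures is not reproduced here.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_score(rows, attr, measure):
    """Impurity reduction from splitting rows [(attributes, label)] on attr."""
    labels = [y for _, y in rows]
    groups = {}
    for x, y in rows:
        groups.setdefault(x[attr], []).append(y)
    weighted = sum(len(g) / len(rows) * measure(g) for g in groups.values())
    return measure(labels) - weighted
```

Different measures can rank the same candidate attributes differently, which is exactly why a comparison across 14 of them on real data sets is informative.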
14

Barros, Rodrigo Coelho. "On the automatic design of decision-tree induction algorithms". Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-21032014-144814/.

Full text
Abstract:
Decision-tree induction is one of the most widely employed methods to extract knowledge from data. There are several distinct strategies for inducing decision trees from data, each one presenting advantages and disadvantages according to its corresponding inductive bias. These strategies have been continuously improved by researchers over the last 40 years. This thesis, following recent breakthroughs in the automatic design of machine learning algorithms, proposes to automatically generate decision-tree induction algorithms. Our proposed approach, named HEAD-DT, is based on the evolutionary algorithms paradigm, which improves solutions based on metaphors of biological processes. HEAD-DT works over several manually-designed decision-tree components and combines the most suitable components for the task at hand. It can operate according to two different frameworks: i) evolving algorithms tailored to one single data set (specific framework); and ii) evolving algorithms from multiple data sets (general framework). The specific framework aims at generating one decision-tree algorithm per data set, so the resulting algorithm does not need to generalise beyond its target data set. The general framework has a more ambitious goal: to generate a single decision-tree algorithm capable of being effectively applied to several data sets. The specific framework is tested over 20 UCI data sets, and results show that HEAD-DT's specific algorithms outperform algorithms like CART and C4.5 with statistical significance. The general framework, in turn, is executed under two different scenarios: i) designing a domain-specific algorithm; and ii) designing a robust domain-free algorithm. The first scenario is tested over 35 microarray gene expression data sets, and results show that HEAD-DT's algorithms consistently outperform C4.5 and CART in different experimental configurations.
The second scenario is tested over 67 UCI data sets, and HEAD-DT's algorithms were shown to be competitive with C4.5 and CART. Nevertheless, we show that HEAD-DT is prone to a special case of overfitting when executed under the second scenario of the general framework, and we point to possible alternatives for solving this problem. Finally, we perform an extensive experiment to evaluate the best single-objective fitness function for HEAD-DT, combining 5 classification performance measures with three aggregation schemes. We evaluate the 15 fitness functions on 67 UCI data sets, and the best of them are employed to generate algorithms tailored to balanced and imbalanced data. Results show that the automatically-designed algorithms outperform CART and C4.5 with statistical significance, indicating that HEAD-DT is also capable of generating custom algorithms for data with a particular kind of statistical profile.
15

Tsang, Pui-kwan Smith, and 曾沛坤. "Efficient decision tree building algorithms for uncertain data". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2008. http://hub.hku.hk/bib/B41290719.

Full text
16

Reay, Karen A. "Efficient fault tree analysis using binary decision diagrams". Thesis, Loughborough University, 2002. https://dspace.lboro.ac.uk/2134/7579.

Full text
Abstract:
The Binary Decision Diagram (BDD) method has emerged as an alternative to conventional techniques for performing both qualitative and quantitative analysis of fault trees. BDDs are already proving to be of considerable use in reliability analysis, providing a more efficient means of analysing a system, without the need for the approximations previously used in the traditional approach of Kinetic Tree Theory. In order to implement this technique, a BDD must be constructed from the fault tree, according to some ordering of the fault tree variables. The selected variable ordering has a crucial effect on the resulting BDD size and the number of calculations required for its construction; a bad choice of ordering can lead to excessive calculations and a BDD many orders of magnitude larger than one obtained using an ordering more suited to the tree. Within this thesis a comparison is made of the effectiveness of several ordering schemes, some of which have not previously been investigated. Techniques are then developed for the efficient construction of BDDs from fault trees. The method of Faunet reduction is applied to a set of fault trees and is shown to significantly reduce the size of the resulting BDDs. The technique is then extended to incorporate an additional stage that results in further improvements in BDD size. A fault tree analysis strategy is proposed that increases the likelihood of obtaining a BDD for any given fault tree. This method implements simplification techniques, which are applied to the fault tree to obtain a set of concise and independent subtrees, equivalent to the original fault tree structure. BDDs are constructed for each subtree and the quantitative analysis is developed for the set of BDDs to obtain the top event parameters and the event criticality functions.
17

Федоров, Д. П. "Comparison of classifiers based on the decision tree". Thesis, ХНУРЕ, 2021. https://openarchive.nure.ua/handle/document/16430.

Full text
Abstract:
The main purpose of this work is to compare classifiers. Random Forest and XGBoost are two popular machine learning algorithms. In this paper, we look at how they work, compare their features, and evaluate the accuracy of their results.
18

Igboamalu, Frank Nonso. "Decision tree classifiers for incident call data sets". Master's thesis, University of Cape Town, 2017. http://hdl.handle.net/11427/27076.

Full text
Abstract:
Information technology (IT) has become one of the key technologies for economic and social development in any organization. The management of IT incidents, and in particular resolving problems quickly, is therefore a concern for IT managers. Delays can result when incorrect subjects are assigned to IT incident calls, because the person sent to remedy the problem has the wrong expertise or has not brought the software or hardware needed to help the user. In the case study used for this work, there are no management checks in place to verify the assignment of incident description subjects. This research aims to develop a method that will tackle the problem of wrongly assigned subjects for incident descriptions. In particular, this study explores the IT incident call database of an oil and gas company as a case study. The approach was to explore the IT incident descriptions and their assigned subjects; the correctly assigned records were then used to train decision tree classification algorithms using the Waikato Environment for Knowledge Analysis (WEKA) software. Finally, the records incorrectly assigned a subject by human operators were used for testing. The J48 algorithm gave the best performance and accuracy, and was able to correctly assign subjects to 81% of the records wrongly classified by human operators.
APA, Harvard, Vancouver, ISO, and other styles
19

Yenco, Aileen C. "Decision Tree for Ground Improvement in Transportation Applications". University of Akron / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=akron1384435786.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
20

Tsang, Pui-kwan Smith. "Efficient decision tree building algorithms for uncertain data". Click to view the E-thesis via HKUTO, 2008. http://sunzi.lib.hku.hk/hkuto/record/B41290719.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
21

Shah, Hamzei G. Hossein. "Decision tree learning for intelligent mobile robot navigation". Thesis, Loughborough University, 1998. https://dspace.lboro.ac.uk/2134/6968.

Full text source
Abstract:
The replication of human intelligence, learning and reasoning by means of computer algorithms is termed Artificial Intelligence (AI) and the interaction of such algorithms with the physical world can be achieved using robotics. The work described in this thesis investigates the applications of concept learning (an approach which takes its inspiration from biological motivations and from survival instincts in particular) to robot control and path planning. The methodology of concept learning has been applied using learning decision trees (DTs) which induce domain knowledge from a finite set of training vectors which in turn describe systematically a physical entity and are used to train a robot to learn new concepts and to adapt its behaviour. To achieve behaviour learning, this work introduces the novel approach of hierarchical learning and knowledge decomposition to the frame of the reactive robot architecture. Following the analogy with survival instincts, the robot is first taught how to survive in very simple and homogeneous environments, namely a world without any disturbances or any kind of "hostility". Once this simple behaviour, named a primitive, has been established, the robot is trained to adapt new knowledge to cope with increasingly complex environments by adding further worlds to its existing knowledge. The repertoire of the robot behaviours in the form of symbolic knowledge is retained in a hierarchy of clustered decision trees (DTs) accommodating a number of primitives. To classify robot perceptions, control rules are synthesised using symbolic knowledge derived from searching the hierarchy of DTs. A second novel concept is introduced, namely that of multi-dimensional fuzzy associative memories (MDFAMs). These are clustered fuzzy decision trees (FDTs) which are trained locally and accommodate specific perceptual knowledge. Fuzzy logic is incorporated to deal with inherent noise in sensory data and to merge conflicting behaviours of the DTs.
In this thesis, the feasibility of the developed techniques is illustrated in the robot applications, their benefits and drawbacks are discussed.
APA, Harvard, Vancouver, ISO, and other styles
22

Wickramarachchi, Darshana Chitraka. "Oblique decision trees in transformed spaces". Thesis, University of Canterbury. Mathematics and Statistics, 2015. http://hdl.handle.net/10092/11051.

Full text source
Abstract:
Decision trees (DTs) play a vital role in statistical modelling. Simplicity and interpretability of the solution structure have made the method popular in a wide range of disciplines. In data classification problems, DTs recursively partition the feature space into disjoint sub-regions until each sub-region becomes homogeneous with respect to a particular class. Axis parallel splits, the simplest form of splits, partition the feature space parallel to feature axes. However, for some problem domains DTs with axis parallel splits can produce complicated boundary structures. As an alternative, oblique splits are used to partition the feature space, potentially simplifying the boundary structure. Various approaches have been explored to find optimal oblique splits. One approach is based on optimisation techniques. This is considered the benchmark approach; however, its major limitation is that the tree induction algorithm is computationally expensive. On the other hand, split finding approaches based on heuristic arguments have gained popularity and have made improvements on benchmark methods. This thesis proposes a methodology to induce oblique decision trees in transformed spaces based on a heuristic argument. As the first goal of the thesis, a new oblique decision tree algorithm, called HHCART (HouseHolder Classification and Regression Tree), is proposed. The proposed algorithm utilises a series of Householder matrices to reflect the training data at each non-terminal node during the tree construction. Householder matrices are constructed using the eigenvectors from each class's covariance matrix. Axis parallel splits in the reflected (or transformed) spaces provide an efficient way of finding oblique splits in the original space. Experimental results show that the accuracy and size of the HHCART trees are comparable with some benchmark methods in the literature.
The appealing features of HHCART are that it can handle both qualitative and quantitative features in the same oblique split, and that it is conceptually simple and computationally efficient. Data mining applications often come with massive example sets, and inducing oblique DTs for such example sets often consumes considerable time. HHCART is a serial computing, memory resident algorithm which may be ineffective when handling massive example sets. As the second goal of the thesis, parallel computing and disk resident versions of the HHCART algorithm are presented so that HHCART can be used irrespective of the size of the problem. HHCART is a flexible algorithm and the eigenvectors defining Householder matrices can be replaced by other vectors deemed effective in oblique split finding. The third endeavour of this thesis explores this aspect of HHCART. HHCART can be used with other vectors in order to improve classification results. For example, a normal vector of the angular bisector, introduced in the Geometric Decision Tree (GDT) algorithm, is used to construct the Householder reflection matrix. The proposed method produces better results than GDT for some problem domains. In the second case, Class Representative Vectors are introduced and used to construct Householder reflection matrices. The results of this experiment show that these oblique trees produce classification results competitive with those achieved with some benchmark decision trees. DTs are constructed using two approaches, namely: top-down and bottom-up. HHCART is a top-down tree, which is the most common approach. As the fourth idea of the thesis, the concept of HHCART is used to induce a new DT, HHBUT, using the bottom-up approach. The bottom-up approach performs cluster analysis prior to the tree building to identify the terminal nodes.
The use of the Bayesian Information Criterion (BIC) to determine the number of clusters leads to accurate and compact trees when compared with Cross Validation (CV) based bottom-up trees. We suggest that HHBUT is a good alternative to the existing bottom-up tree especially when the number of examples is much higher than the number of features.
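The Householder reflection at the core of HHCART can be sketched as follows: a unit eigenvector d of a class covariance matrix defines a matrix H that maps d onto the first coordinate axis, so axis-parallel splits on the reflected data correspond to oblique splits in the original space. This is a minimal illustration under that convention; the data and function name are hypothetical, not the thesis's code:

```python
import numpy as np

def householder_to_axis(d):
    """Householder matrix H with H @ d = e1, for a unit vector d.

    HHCART builds such reflections from eigenvectors of a class covariance
    matrix; axis-parallel splits in the reflected space are oblique splits
    in the original space.
    """
    d = d / np.linalg.norm(d)
    e1 = np.zeros_like(d)
    e1[0] = 1.0
    v = d - e1
    if np.allclose(v, 0.0):          # d already lies on the first axis
        return np.eye(len(d))
    v = v / np.linalg.norm(v)
    return np.eye(len(d)) - 2.0 * np.outer(v, v)

# Toy two-feature class sample (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 1.0]])
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
d = eigvecs[:, -1]                   # dominant eigenvector of the class
H = householder_to_axis(d)
X_reflected = X @ H.T                # axis-parallel split search happens here
```

Because H is orthogonal and symmetric, a split found in the reflected space maps back to the original space exactly.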
APA, Harvard, Vancouver, ISO, and other styles
23

Zhou, Guoqing. "Co-Location Decision Tree for Enhancing Decision-Making of Pavement Maintenance and Rehabilitation". Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/26059.

Full text source
Abstract:
A pavement management system (PMS) is a valuable tool and one of the critical elements of the highway transportation infrastructure. Since a vast amount of pavement data is frequently and continuously being collected, updated, and exchanged due to rapidly deteriorating road conditions, increased traffic loads, and shrinking funds, resulting in the rapid accumulation of a large pavement database, knowledge-based expert systems (KBESs) have therefore been developed to solve various transportation problems. This dissertation presents the development of theory and algorithm for a new decision tree induction method, called co-location-based decision tree (CL-DT). This method will enhance the decision-making abilities of pavement maintenance personnel and their rehabilitation strategies. This idea stems from shortcomings in traditional decision tree induction algorithms when applied to pavement treatment strategies. The proposed algorithm utilizes the co-location (co-occurrence) characteristics of spatial attribute data in the pavement database. With the proposed algorithm, one distinct event occurrence can associate with two or multiple attribute values that occur simultaneously in spatial and temporal domains. This dissertation describes the details of the proposed CL-DT algorithm and the steps for realizing it. First, the dissertation research describes the detailed co-location mining algorithm, including spatial attribute data selection in pavement databases, the determination of candidate co-locations, the determination of table instances of candidate co-locations, pruning the non-prevalent co-locations, and induction of co-location rules. In this step, a hybrid constraint, i.e., a spatial geometric distance constraint condition and a distinct event-type constraint condition, is developed.
The spatial geometric distance constraint condition is a neighborhood relationship-based spatial join of table instances for many prevalent co-locations with one prevalent co-location; and the distinct event-type constraint condition is a Euclidean distance between a set of attributes and its corresponding cluster's center of attributes. The dissertation research also developed the spatial feature pruning method using the multi-resolution pruning criterion. The cross-correlation criterion of spatial features is used to remove the non-prevalent co-locations from the candidate prevalent co-location set under a given threshold. The dissertation research focused on the development of the co-location decision tree (CL-DT) algorithm, which includes the non-spatial attribute data selection in the pavement management database, co-location algorithm modeling, node merging criteria, and co-location decision tree induction. In this step, co-location mining rules are used to guide the decision tree generation and induce decision rules. For each step, this dissertation gives detailed flowcharts, such as the flowchart of co-location decision tree induction, the co-location/co-occurrence decision tree algorithm, the algorithm of the co-location/co-occurrence decision tree (CL-DT), and an outline of the steps of the SFS (Sequential Feature Selection) algorithm. Finally, this research used a pavement database covering four counties, provided by NCDOT (North Carolina Department of Transportation), to verify and test the proposed method. Comparison analyses of different rehabilitation treatments proposed by NCDOT, by the traditional DT induction algorithm, and by the proposed new method are conducted.
Findings and conclusions include: (1) traditional DT technology can make a consistent decision for road maintenance and rehabilitation strategy under the same road conditions, i.e., with less interference from human factors; (2) traditional DT technology can increase the speed of decision-making because the technology automatically generates a decision tree and rules if the expert knowledge is given, which saves time and expenses for the PMS; (3) integration of the DT and GIS can provide the PMS with the capabilities of graphically displaying treatment decisions, visualizing the attribute and non-attribute data, and linking data and information to the geographical coordinates. However, traditional DT induction methods are not quite as intelligent as one's expectations. Thus, post-processing and refinement is necessary. Moreover, traditional DT induction methods for pavement M&R strategies only used the non-spatial attribute data. It has been demonstrated from this dissertation research that spatial data is very useful for the improvement of decision-making processes for pavement treatment strategies. In addition, the decision trees are based on the knowledge acquired from pavement management engineers for strategy selection. Thus, different decision trees can be built if the requirement changes.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
24

Chang, Namsik. "Knowledge discovery in databases with joint decision outcomes: A decision-tree induction approach". Diss., The University of Arizona, 1995. http://hdl.handle.net/10150/187227.

Full text source
Abstract:
Inductive symbolic learning algorithms have been used successfully over the years to build knowledge-based systems. One of these, a decision-tree induction algorithm, has formed the central component in several commercial packages because of its particular efficiency, simplicity, and popularity. However, the decision-tree induction algorithms developed thus far are limited to domains where each decision instance's outcome belongs to only a single decision outcome class. Their goal is merely to specify the properties necessary to distinguish instances pertaining to different decision outcome classes. These algorithms are not readily applicable to many challenging new types of applications in which decision instances have outcomes belonging to more than one decision outcome class (i.e., joint decision outcomes). Furthermore, when applied to domains with a single decision outcome, these algorithms become less efficient as the number of the pre-defined outcome classes increases. The objective of this dissertation is to modify previous decision-tree induction techniques in order to apply them to applications with joint decision outcomes. We propose a new decision-tree induction approach called the Multi-Decision-Tree Induction (MDTI) approach. Data was collected for a patient image retrieval application where more than one prior radiological examination would be retrieved based on characteristics of the current examination and patient status. We present empirical comparisons of the MDTI approach with the Backpropagation network algorithm and the traditional knowledge-engineer-driven knowledge acquisition approach, using the same set of cases. These comparisons are made in terms of recall rate, precision rate, average number of prior examinations suggested, and understandability of the acquired knowledge. 
The results show that the MDTI approach outperforms the Backpropagation network algorithm and is comparable to the traditional approach in all performance measures considered, while requiring much less learning time than either approach. To gain analytical and empirical insights into MDTI, we have compared this approach with the two best-known symbolic learning algorithms (i.e., ID3 and AQ) using data domains with a single decision outcome. It has been found analytically that rules generated by the MDTI approach are more general and supported by more instances in the training set. Four empirical experiments have supported the findings.
APA, Harvard, Vancouver, ISO, and other styles
25

Hari, Vijaya. "Empirical Investigation of CART and Decision Tree Extraction from Neural Networks". Ohio University / OhioLINK, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1235676338.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
26

Flöter, André. "Analyzing biological expression data based on decision tree induction". [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=978444728.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
27

Rangwala, Maimuna H. "Empirical investigation of decision tree extraction from neural networks". Ohio : Ohio University, 2006. http://www.ohiolink.edu/etd/view.cgi?ohiou1151608193.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
28

Flöter, André. "Analyzing biological expression data based on decision tree induction". Phd thesis, Universität Potsdam, 2005. http://opus.kobv.de/ubp/volltexte/2006/641/.

Full text source
Abstract:

Modern biological analysis techniques supply scientists with various forms of data. One category of such data is the so-called "expression data". These data indicate the quantities of biochemical compounds present in tissue samples.

Expression data can now be generated at high speed. This leads in turn to amounts of data no longer analysable by classical statistical techniques. Systems biology is the new field that focuses on the modelling of this information.

At present, various methods are used for this purpose. One superordinate class of these methods is machine learning. Methods of this kind had, until recently, predominantly been used for classification and prediction tasks. This neglected a powerful secondary benefit: the ability to induce interpretable models.

Obtaining such models from data has become a key issue within systems biology. Numerous approaches have been proposed and intensively discussed. This thesis focuses on the examination and exploitation of one basic technique: decision trees.

The concept of comparing sets of decision trees is developed. This method offers the possibility of identifying significant thresholds in continuous or discrete valued attributes through their corresponding sets of decision trees. Finding significant thresholds in attributes is a means of identifying states in living organisms. Knowing about states is an invaluable clue to the understanding of dynamic processes in organisms. Applied to metabolite concentration data, the proposed method was able to identify states which were not found with conventional techniques for threshold extraction.

A second approach exploits the structure of sets of decision trees for the discovery of combinatorial dependencies between attributes. Previous work on this issue has focused either on expensive computational methods or on the interpretation of single decision trees, a very limited exploitation of the data. This has led to incomplete or unstable results. That is why a new method is developed that uses sets of decision trees to overcome these limitations.

Both of the introduced methods are available as software tools. They can be applied consecutively or separately. In that way they make up a package of analytical tools that usefully supplements existing methods.

By means of these tools, the newly introduced methods were able to confirm existing knowledge and to suggest interesting new relationships between metabolites.


APA, Harvard, Vancouver, ISO, and other styles
29

Gao, Ying. "using decision tree to analyze the turnover of employees". Thesis, Uppsala universitet, Institutionen för informatik och media, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325113.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
30

Sjunnebo, Joakim. "Application of the Boosted Decision Tree Algorithmto Waveform Discrimination". Thesis, KTH, Fysik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-129408.

Full text source
Abstract:
The Polarised Gamma-ray Observer (PoGOLite) is a balloon-borne experiment aimed at measuring the polarisation of hard X-rays from astronomical sources. In the planned flight environment the neutron background is high. A smaller version of PoGOLite, named PoGOLino, was constructed with the goal of measuring the neutron background rates and was launched in March 2013. The signals produced in the detectors of both these instruments give rise to waveforms of different shapes depending on the type of detector the interaction occurred in. A method to distinguish between signal and background waveforms based on their shape has been developed. This was done using a machine learning algorithm called boosted decision trees, implemented in the software package Toolkit for Multivariate Data Analysis (TMVA). By constructing new discriminating variables the classification efficiency was improved. The developed classification will be applied to the measurements taken during the 2013 flight of PoGOLino and the method can also be used for the data analysis of future PoGOLite measurements.
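Boosted decision trees of the kind TMVA provides combine many weak trees trained on reweighted events. The mechanism can be sketched with a tiny from-scratch AdaBoost over one-level stumps (not the TMVA implementation; the two waveform features, peak height and decay time, and all data are hypothetical):

```python
import math

def stump_predict(x, feat, thr, sign):
    """One-level decision tree: returns sign if x[feat] > thr, else -sign."""
    return sign if x[feat] > thr else -sign

def train_stump(X, y, w):
    """Pick the stump minimising the weighted classification error."""
    best = None
    for feat in range(len(X[0])):
        for thr in sorted({x[feat] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, feat, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feat, thr, sign)
    return best

def adaboost(X, y, rounds=3):
    """AdaBoost: reweight events each round to focus on past mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feat, thr, sign = train_stump(X, y, w)
        err = max(err, 1e-12)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, thr, sign))
        w = [wi * math.exp(-alpha * yi * stump_predict(x, feat, thr, sign))
             for x, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return 1 if score > 0 else -1

# Hypothetical waveform features (peak height, decay time);
# +1 = signal, -1 = neutron background.
X = [(0.9, 2.0), (0.8, 2.5), (0.7, 2.2), (0.5, 6.0), (0.4, 7.0), (0.6, 6.5)]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y, rounds=3)
```

Constructing extra discriminating variables, as the abstract describes, amounts to enlarging the feature tuples the stumps can cut on.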
APA, Harvard, Vancouver, ISO, and other styles
31

SOBRAL, ANA PAULA BARBOSA. "HOURLY LOAD FORECASTING A NEW APPROACH THROUGH DECISION TREE". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2003. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=3710@1.

Full text source
Abstract:
The importance of load forecasting for the short term (up to one week ahead) has been steadily growing in the last years. Load forecasts are the basis for the forecasting of energy prices, and the privatisation and the introduction of competitiveness in the Brazilian electricity sector have turned price forecasting into an extremely important task. As a consequence of structural changes in the electricity sector, the variability and the non-stationarity of the electrical loads have tended to increase, because of the dynamics of the energy prices. As a consequence of these structural changes, new forecasting methods are needed to meet the new scenarios. The tools that are available for load forecasting in the international market require a large amount of online information, especially information about weather data. Since this information is not yet readily available in Brazil, this thesis proposes a short-term load forecaster that takes into consideration the restrictions in the acquisition of temperature data. A short-term (one-day ahead) forecaster of hourly loads is proposed that combines load data and weather data (temperature), by means of decision tree models. Decision trees were chosen because those models, despite being easy to interpret, have been very rarely used for load forecasting.
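The decision tree models behind such a load forecaster grow by variance-reduction splits on predictors like hour of day and temperature. A minimal sketch of the split search (pure Python; the records and field names are hypothetical, not the thesis's data):

```python
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def best_split(rows, target):
    """Greedy variance-reduction split, as in CART-style regression trees."""
    best = None
    feats = [k for k in rows[0] if k != target]
    for f in feats:
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[f] <= thr]
            right = [r[target] for r in rows if r[f] > thr]
            score = (len(left) * variance(left)
                     + len(right) * variance(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

# Hypothetical hourly records: hour of day, temperature, and the load to predict.
rows = [
    {"hour": 3,  "temp": 18.0, "load": 40.0},
    {"hour": 4,  "temp": 17.5, "load": 42.0},
    {"hour": 14, "temp": 31.0, "load": 95.0},
    {"hour": 15, "temp": 32.0, "load": 98.0},
    {"hour": 19, "temp": 24.0, "load": 70.0},
    {"hour": 20, "temp": 23.0, "load": 68.0},
]
score, feat, thr = best_split(rows, "load")
```

A full regression tree applies this split recursively; the leaf means are the hourly load forecasts.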
APA, Harvard, Vancouver, ISO, and other styles
32

MARQUES, DANIEL DOS SANTOS. "A DECISION TREE LEARNER FOR COST-SENSITIVE BINARY CLASSIFICATION". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28239@1.

Full text source
Abstract:
Classification problems have been widely studied in the machine learning literature, generating applications in several areas. However, in a number of scenarios, misclassification costs can vary substantially, which motivates the study of Cost-Sensitive Learning techniques. In the present work, we discuss the use of decision trees on the more general Example-Dependent Cost-Sensitive Problem (EDCSP), where misclassification costs vary with each example. One of the main advantages of decision trees is that they are easy to interpret, which is a highly desirable property in a number of applications. We propose a new attribute selection method for constructing decision trees for the EDCSP and discuss how it can be efficiently implemented. Finally, we compare our new method with two other decision tree algorithms recently proposed in the literature, in 3 publicly available datasets.
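In the example-dependent setting, each training example carries its own misclassification costs, and a split can be scored by the total cost it saves relative to labelling the node with the single cheapest action. The sketch below illustrates that idea in the spirit of the EDCSP; it is not the thesis's exact attribute-selection method, and the loan-style examples are hypothetical:

```python
def node_cost(examples):
    """Cheapest total cost of labelling the whole node with one action."""
    return min(sum(e["cost"][a] for e in examples) for a in ("reject", "approve"))

def split_saving(examples, feat, thr):
    """Cost saved by splitting on feat <= thr versus keeping the node a leaf."""
    left = [e for e in examples if e["x"][feat] <= thr]
    right = [e for e in examples if e["x"][feat] > thr]
    if not left or not right:
        return 0.0
    return node_cost(examples) - (node_cost(left) + node_cost(right))

# Hypothetical loan applications: each example carries its OWN cost per action
# (example-dependent costs), e.g. approving a large bad loan is costlier.
applications = [
    {"x": {"score": 700}, "cost": {"reject": 10.0, "approve": 0.0}},    # good, small loan
    {"x": {"score": 650}, "cost": {"reject": 5.0,  "approve": 0.0}},    # good
    {"x": {"score": 550}, "cost": {"reject": 0.0,  "approve": 20.0}},   # bad
    {"x": {"score": 500}, "cost": {"reject": 0.0,  "approve": 100.0}},  # bad, large loan
]
best_thr = max((a["x"]["score"] for a in applications),
               key=lambda thr: split_saving(applications, "score", thr))
```

Note that class frequencies alone would not distinguish the two bad loans; the per-example costs are what make the 500-score application dominate the split choice.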
APA, Harvard, Vancouver, ISO, and other styles
33

Булах, В. А., Л. О. Кіріченко and Т. А. Радівілова. "Classification of Multifractal Time Series by Decision Tree Methods". Thesis, КНУ, 2018. http://openarchive.nure.ua/handle/document/5840.

Full text source
Abstract:
The article considers the task of classifying model fractal time series with machine learning methods. To classify the series, it is proposed to use meta-algorithms based on decision trees. To model the fractal time series, binomial stochastic cascade processes are used. Classification of the time series by ensembles of decision tree models is carried out. The analysis indicates that the best results are obtained by the bagging and random forest methods that use regression trees.
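The binomial stochastic cascades used to generate the model fractal series can be reproduced in a few lines: at each level, every interval's mass is multiplied by p on one half and 1 - p on the other. This is a minimal generator only; the multiplier p = 0.7 and the depth are arbitrary choices, and the tree-ensemble classification step is omitted:

```python
def binomial_cascade(p, levels):
    """Binomial multiplicative cascade: repeatedly split each interval's mass
    into fractions p and 1 - p, yielding a multifractal measure."""
    weights = [1.0]
    for _ in range(levels):
        weights = [w * f for w in weights for f in (p, 1.0 - p)]
    return weights

# 2**10 = 1024 increments of a model multifractal series.
series = binomial_cascade(0.7, 10)
```

Series generated with different p have different multifractal spectra, and features extracted from them are what the decision tree ensembles learn to separate.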
APA, Harvard, Vancouver, ISO, and other styles
34

Assareh, Amin. "OPTIMIZING DECISION TREE ENSEMBLES FOR GENE-GENE INTERACTION DETECTION". Kent State University / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=kent1353971575.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
35

Azad, Mohammad. "Decision and Inhibitory Trees for Decision Tables with Many-Valued Decisions". Diss., 2018. http://hdl.handle.net/10754/628023.

Full text source
Abstract:
Decision trees are one of the most commonly used tools in decision analysis, knowledge representation, machine learning, etc., because of their simplicity and interpretability. We consider an extension of the dynamic programming approach to process the whole set of decision trees for a given decision table, which was previously only attainable by brute-force algorithms. We study decision tables with many-valued decisions (each row may contain multiple decisions) because they are more reasonable models of data in many cases. To address this problem in a broad sense, we consider not only decision trees but also inhibitory trees, where terminal nodes are labeled with “≠ decision”. Inhibitory trees can sometimes describe more knowledge from datasets than decision trees. As for cost functions, we consider depth or average depth to minimize the time complexity of trees, and the number of nodes, or the number of terminal or nonterminal nodes, to minimize the space complexity of trees. We investigate the multi-stage optimization of trees relative to some cost functions, and also the possibility to describe the whole set of strictly optimal trees. Furthermore, we study the bi-criteria optimization cost vs. cost and cost vs. uncertainty for decision trees, and cost vs. cost and cost vs. completeness for inhibitory trees. The most interesting application of the developed technique is the creation of multi-pruning and restricted multi-pruning approaches, which are useful for knowledge representation and prediction. The experimental results show that decision trees constructed by these approaches can often outperform the decision trees constructed by the CART algorithm. Another application includes the comparison of 12 greedy heuristics for single- and bi-criteria optimization (cost vs. cost) of trees.
We also study the three approaches (decision tables with many-valued decisions, decision tables with most common decisions, and decision tables with generalized decisions) to handle inconsistency of decision tables. We also analyze the time complexity of decision and inhibitory trees over arbitrary sets of attributes represented by information systems in the frameworks of local (when we can use in trees only attributes from problem description) and global (when we can use in trees arbitrary attributes from the information system) approaches.
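The dynamic programming over the whole set of trees works by recursing over subtables of the decision table and memoising the results. A sketch for one cost function, depth, and single-valued decisions only (the thesis's machinery additionally covers many-valued decisions, inhibitory trees, and other cost functions; all names here are illustrative):

```python
from functools import lru_cache

def min_depth(rows):
    """Minimum depth of a decision tree that determines the decision of
    every row. rows: list of (attribute_tuple, decision) pairs.
    Dynamic programming over subtables, memoised on the subtable itself."""
    @lru_cache(maxsize=None)
    def solve(table):
        decisions = {d for _, d in table}
        if len(decisions) <= 1:          # node can be made terminal
            return 0
        best = None
        for a in range(len(table[0][0])):
            values = {x[a] for x, _ in table}
            if len(values) == 1:
                continue                  # attribute cannot split this table
            depth = 1 + max(
                solve(tuple(r for r in table if r[0][a] == v)) for v in values)
            if best is None or depth < best:
                best = depth
        return best
    return solve(tuple(rows))
```

For example, a table whose decision is the XOR of two binary attributes needs depth 2, which no greedy single-split criterion can certify without the recursion.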
APA, Harvard, Vancouver, ISO, and other styles
36

Boz, Olcay. "Converting a trained neural network to a decision tree dectext-decision tree extractor /". Diss., 2000. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:9982861.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
37

YU, CHIH-FENG, i 余致鋒. "Application of Decision Tree C5.0 to Fund Decision". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/y98nsm.

Full text of the source
Abstract:
Master's thesis
National Chiayi University
Department of Business Administration
106
In recent years, the financial literacy of citizens has been improving, and financial investment channels have likewise multiplied. Most investment tools require considerable financial know-how to yield steady profits. Compared to other financial instruments, mutual funds have relatively low risk and low barriers to entry. The number and variety of mutual funds have been increasing yearly, and among the many funds available, picking the right fund and operating strategy is what investors focus on. Every mutual fund has a set of performance benchmarks. This study analyzes the local mutual fund market using benchmark data from 2012 to 2017 for Taiwan's domestic equity and globally invested equity mutual funds. Using data mining, the research analyzes these benchmark data and looks for selection and trading strategies that can be applied to mutual funds. Through decision tree analysis, the study categorizes mutual funds into three classes: buy, hold, and sell. The research uses maximum return to explore the problem of mutual fund investment strategy. The results help investors understand mutual fund strategy and the meaning of each index, in order to minimize losses in the mutual fund market.
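As a rough illustration of the kind of split a C5.0-style tree makes when sorting funds into buy/hold/sell classes, the sketch below finds the best threshold on a single hypothetical "annualized return" feature. Note that C5.0 itself uses an information-based gain ratio; Gini impurity is substituted here for brevity, and the data and labels are invented.

```python
# Hypothetical single-feature split in the spirit of decision-tree fund
# classification; Gini impurity stands in for C5.0's gain ratio.

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    # try splitting between consecutive sorted feature values and pick
    # the threshold with the lowest weighted Gini impurity
    pairs = sorted(zip(values, labels))
    best_score, best_t = float("inf"), None
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

# invented annualized returns and analyst labels
returns = [-0.05, -0.01, 0.02, 0.06, 0.09]
labels = ["sell", "sell", "hold", "buy", "buy"]
```

On this toy data the best split falls between the two "sell" funds and the rest, i.e. at a return of 0.005.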
APA, Harvard, Vancouver, ISO, and other styles
38

Huang, Xiao-Juan, i 黃小娟. "Decision-Tree Based Image Clustering". Thesis, 2002. http://ndltd.ncl.edu.tw/handle/42912242158073405104.

Full text of the source
Abstract:
Master's thesis
Nanhua University
Master's Program, Department of Information Management
90
In this thesis, we propose an image clustering method based on CLTree for image segmentation. CLTree is a clustering algorithm that uses decision-tree techniques. It is quite different from existing clustering methods in that it finds clusters without making any prior assumptions and without any input parameters. Whether a clustering is good or bad depends on the user's subjective judgment, so we offer three image segmentation results. The experimental results reveal that all of them perform well.
APA, Harvard, Vancouver, ISO, and other styles
39

Wu, Chia-Chi, i 吳家齊. "Resource-Constrained Decision Tree Induction". Thesis, 2010. http://ndltd.ncl.edu.tw/handle/57990131846994037048.

Full text of the source
Abstract:
Doctoral dissertation
National Central University
Graduate Institute of Information Management
98
Classification is one of the most important research domains in data mining. Among existing classifiers, decision trees are probably the most popular and commonly used classification models. Most decision tree algorithms aim to maximize classification accuracy and minimize classification error. However, in many real-world applications, various types of cost or resource consumption are involved both in the induction of a decision tree and in the classification of future instances. Furthermore, the problem we face may require us to complete a classification task with limited resources. Therefore, how to build an optimal decision tree under resource constraints becomes an important issue. In this study, we first propose two algorithms that are improved versions of traditional TDIDT (Top-Down Induction of Decision Trees) algorithms. Then, we adopt a new approach to deal with multiple resource constraints: it first extracts associative classification rules from the training dataset and then builds a decision tree from the extracted rules. Empirical evaluations were carried out on real datasets, and the results indicate that the proposed methods achieve satisfactory results in handling data under different resource constraints.
APA, Harvard, Vancouver, ISO, and other styles
40

Jeng, Yung Mo, i 鄭永模. "The Fuzzy Decision Tree Induction". Thesis, 1993. http://ndltd.ncl.edu.tw/handle/11456447856313611299.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
41

Shi-Feng, Hsi. "The Defuzzification for Fuzzy Decision Tree". 2001. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0009-0112200611304405.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
42

Hsi, Shi-Feng, i 奚世峰. "The Defuzzification for Fuzzy Decision Tree". Thesis, 2001. http://ndltd.ncl.edu.tw/handle/48869673305420105003.

Full text of the source
Abstract:
Master's thesis
Yuan Ze University
Graduate Institute of Information Management
89
In recent years, fuzzy decision trees have been widely used to extract classification knowledge from sets of feature-based data, and many researchers have worked on more efficient and optimal algorithms for constructing them. However, very few papers discuss the defuzzification process in fuzzy decision trees. Therefore, we propose a new method that focuses on defuzzification. The tree built by our method is called a weighted fuzzy decision tree: it uses the concept of weighted fuzzy production rules (WFPR) in the defuzzification process and the fuzzy Bayesian inference (FBI) method to find the parameters needed in the inference process of WFPR. To verify the classification accuracy of our method, standard benchmark datasets are used. When the tree is built as a non-perfect decision tree, our method achieves higher classification accuracy than other defuzzification methods; when the tree is a perfect decision tree, our method is also acceptable.
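A minimal sketch of weighted-rule defuzzification in the spirit of the abstract: each rule fires with a membership degree, class scores are weight-times-firing-strength sums, and the highest score wins. The membership functions and weights below are hypothetical, and the thesis's fuzzy Bayesian inference step for learning the weights is not reproduced.

```python
# Sketch of weighted-rule defuzzification: each (membership, weight, class)
# rule contributes weight * firing strength to its class score.

def classify(rules, sample):
    # rules: list of (membership_fn, weight, class_label) triples
    scores = {}
    for mu, w, label in rules:
        scores[label] = scores.get(label, 0.0) + w * mu(sample)
    return max(scores, key=scores.get)

# hypothetical triangular-ish memberships over a single feature x in [0, 1]
low = lambda x: max(0.0, min(1.0, (0.5 - x) / 0.5))
high = lambda x: max(0.0, min(1.0, (x - 0.5) / 0.5))

# hypothetical learned rule weights
rules = [(low, 0.9, "A"), (high, 0.7, "B")]
```

A sample near 0 fires the "low" rule strongly and is assigned class A; a sample near 1 fires "high" and is assigned class B.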
APA, Harvard, Vancouver, ISO, and other styles
43

Randall, William D. "Software reusability: a decision tree model". Thesis, 1988. http://hdl.handle.net/10945/23120.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
44

Lin, Cheng-ying, i 林政頴. "Privacy Preserving for Distributed Decision Tree". Thesis, 2008. http://ndltd.ncl.edu.tw/handle/01048697791498275075.

Full text of the source
Abstract:
Master's thesis
National University of Tainan
Master's Program, Department of Information and Learning Technology
96
With the recent development of computer science, the data volume of enterprise databases has increased rapidly. To extract useful information from huge databases, many efficient data mining technologies have been applied. In recent years, data mining tools have become more and more powerful, and the risk of privacy leaks has become an urgent problem. Privacy-preserving data mining is a relatively new research area in data mining and knowledge discovery. In a common situation, databases are distributed among several organizations that would like to cooperate in mining to extract global knowledge, but each party needs to protect its privacy by not directly sharing its data. Therefore, this study presents an algorithm for privacy-preserving distributed decision tree induction based on C4.5. While this has been done for horizontally partitioned data, this study presents an algorithm for vertically partitioned attributes. Each site computes on its portion of the data, and the sites then exchange the results with each other. The goal is to obtain correct data mining results while preserving the privacy of each site.
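The aggregation idea behind such schemes can be sketched as follows: a site evaluates a split on one of its own attributes and shares only per-branch class counts, from which any party can compute C4.5-style information gain without seeing raw rows. The site and the counts below are hypothetical, and the sketch omits the secure-exchange protocol itself.

```python
# Sketch: computing C4.5-style information gain from class-count summaries
# only, so no raw rows need to cross a site boundary.
import math

def entropy(counts):
    # Shannon entropy of a class distribution given as counts
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def info_gain(parent_counts, child_count_lists):
    # gain = parent entropy minus the weighted entropy of the branches
    n = sum(parent_counts)
    remainder = sum(sum(cc) / n * entropy(cc) for cc in child_count_lists)
    return entropy(parent_counts) - remainder

# hypothetical aggregates reported by one site for a candidate split:
parent = [9, 5]                       # class distribution over all rows
split_from_site_a = [[6, 2], [3, 3]]  # per-branch class counts
```

Only `parent` and `split_from_site_a` would be exchanged; the rows that produced them stay at the owning site.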
APA, Harvard, Vancouver, ISO, and other styles
45

Chen, Yih-Ming, i 陳奕名. "Borderline SMOTE adaptive boosted decision tree". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/02768976104039544520.

Full text of the source
Abstract:
Master's thesis
National Chiao Tung University
Institute of Statistics
104
The problem of learning from imbalanced data has been receiving growing attention. Since imbalanced data may decrease the effectiveness of a classifier, many researchers have worked on this problem and proposed solutions such as combining SMOTE (Synthetic Minority Over-sampling Technique) with decision trees. In this study, we review existing methods including SMOTE, Borderline SMOTE, Adaptive Boosting, and SMOTE Boosting. To improve on these methods, we propose an approach called Borderline SMOTE Boosting and compare it with the existing methods on three real data examples. The results show that the proposed method leads to better results.
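The SMOTE interpolation step that Borderline SMOTE and SMOTE Boosting build on can be sketched in a few lines: a synthetic minority sample is placed at a uniformly random point on the segment between a minority seed and one of its minority nearest neighbours (Borderline SMOTE additionally restricts seeds to borderline samples). This is a generic sketch, not the thesis's implementation.

```python
# Sketch of SMOTE's interpolation step: a synthetic minority sample lies
# on the segment between a seed sample and a minority neighbour.
import random

def smote_sample(x, neighbor, rng=random):
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]
```

For a seed at the origin and a neighbour at (1, 2), every synthetic point lies on the line y = 2x between them; a full implementation would first select minority neighbours via k-NN.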
APA, Harvard, Vancouver, ISO, and other styles
46

Ngan, Dang Thi Kim, i 鄧氏金銀. "HTTP Botnet detection using decision tree". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/78666177227649974399.

Full text of the source
Abstract:
Master's thesis
Chinese Culture University
Department of Information Management
102
Botnets are the most dangerous and widespread threat among the diverse forms of malware-based Internet attacks today. A botnet is a group of compromised computers connected via the Internet that are remotely accessed and controlled by hackers to launch various network attacks. Malicious activities include DDoS attacks, spam, click fraud, identity theft, and phishing. The most basic characteristic of botnets is the use of command-and-control channels to communicate with the bots, through which the botnet can be updated and commanded. Botnets have become a common and effective tool used by botmasters in many cyber-attacks. Recently, malicious botnets have evolved from typical IRC botnets to HTTP botnets, the latest generation, which use the standard HTTP protocol to contact their bots. By using normal HTTP traffic, the bots appear as normal users of the network, and current network security systems cannot detect them. To solve this problem, a method based on network behavior analysis is developed that modifies and adds new features to current methods of detecting HTTP-based botnets and their bots.
APA, Harvard, Vancouver, ISO, and other styles
47

Lai, Jian-Cheng, i 賴建丞. "Fast Quad-Tree Depth Decision Algorithm for HEVC Coding Tree Block". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/39ucm4.

Full text of the source
Abstract:
Master's thesis
National Formosa University
Graduate Institute of Computer Science and Information Engineering
102
High Efficiency Video Coding (HEVC) is a recently developed compression technique for ultra-high-definition video that provides a higher compression ratio and throughput than the previous video compression standard, H.264/AVC. This technique is therefore widely used for transmission over limited-bandwidth networks and for confined storage space. To obtain a higher compression ratio while maintaining video quality, the HEVC encoder provides variable block partitioning and mode prediction. If every block is evaluated during the mode decision process, a lot of encoding time is consumed, which limits the real-time applicability of HEVC. Hence, many fast algorithms have been proposed to eliminate block partitions or mode predictions. In natural videos, neighboring blocks are highly correlated with the current block, so reference-block methods have been studied to terminate or eliminate block or mode prediction early; they use the lower computation of mode reduction to obtain good compression ratios and time savings, and are therefore widely proposed as HEVC fast algorithms. On the other hand, non-reference methods have been proposed that extract features of the video frames and predict the termination condition without referring to neighboring blocks. This thesis proposes two quad-tree depth decision methods: a reference method based on depth correlation and a non-reference method based on edge strength detection. In the reference-block method, we find up to 90% correlation with the co-located coding tree block (CTB) in the previous frame. Therefore, we use the co-located CTB's depth information to limit the depth partitioning of the current CTB; unlike previously proposed methods, ours extends the partition depth by one level. However, prediction is poor for fast-moving objects and scene changes, where the correlation between frames is lower.
Because of this disadvantage, the edge strength detection method is proposed to detect the structural variation of a CTB and predict the encoded depth. Since this method does not require reference to neighboring blocks, it predicts well on high-variation video sequences. However, it predicts poorly for videos without obvious edges; for example, in dark videos the edges are not obvious, and the algorithm predicts the depth level poorly. Finally, the proposed fast methods are implemented in the HM 10.1 model to demonstrate their efficiency. The edge strength detection method obtains 23.1% time savings with a BD-bitrate increase close to 0.28% on average, and the depth-correlation method provides about 21.1% time savings with a BD-bitrate increase of 0.17% on average.
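The depth-correlation rule described above can be sketched as a one-line cap on the quad-tree search: the current CTB is allowed one level beyond the co-located CTB's depth, clamped to HEVC's maximum CTB partition depth. The constant and function name are illustrative only.

```python
# Sketch of the depth-correlation idea: cap the quad-tree search depth of
# the current CTB at the co-located CTB's depth plus one level.

MAX_DEPTH = 3  # HEVC CTB quad-tree depths range over 0..3

def depth_search_limit(colocated_depth):
    # allow one extra partition level beyond the co-located depth, clamped
    return min(colocated_depth + 1, MAX_DEPTH)
```

A CTB whose co-located block was never split is searched only to depth 1 instead of the full depth 3, which is where the encoding-time savings come from.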
APA, Harvard, Vancouver, ISO, and other styles
48

Chao-YenChien i 簡兆彥. "Building Balanced Search Tree based on Layered Decision Tree for Packet Classification". Thesis, 2012. http://ndltd.ncl.edu.tw/handle/45153250848607847087.

Full text of the source
Abstract:
Master's thesis
National Cheng Kung University
Department of Computer Science and Information Engineering (Master's/PhD Program)
100
Packet classification is an important building block of Internet routers for many network applications, such as Quality of Service (QoS), security, monitoring, analysis, and network intrusion detection (NIDS). In this thesis, we propose a scheme called Layer-based Search Tree (LST) to solve the multi-field packet classification problem. LST improves on traditional decision-tree-based schemes (e.g., HyperCuts and EffiCuts) by reconstructing the leaf nodes of the decision tree as an approximately balanced search tree. Since the address subspaces covered by the nodes of LST are disjoint, the buckets of the leaf and internal nodes in LST are never empty; thus, only the rules in one bucket can match the header values of an incoming packet, and a search on LST completes immediately once the packet matches a rule in some internal node. In a software environment, the experimental results show that LST requires less memory even though it categorizes rules by two fields to reduce rule duplication, rather than five fields as in EffiCuts. Besides, LST needs fewer memory accesses than HyperCuts and EffiCuts. In addition, we design a hardware search engine for LST with a pipelined and parallel architecture in a Xilinx Virtex-5 FPGA environment. Because LST's memory usage is very efficient, our search engine can support ACL, FW, and IPC tables of 50k rules. With dual-ported memory, the LST search engine can sustain a throughput of over 120 Gbps for minimum-size packets (40 bytes).
APA, Harvard, Vancouver, ISO, and other styles
49

Yang, Tsan-Hui, i 楊璨輝. "Behavior Cloning by RL-based Decision Tree". Thesis, 2006. http://ndltd.ncl.edu.tw/handle/32882692325935020525.

Full text of the source
Abstract:
Master's thesis
National Chung Cheng University
Department of Electrical Engineering
95
It is hard to define a state space or a proper reward function in reinforcement learning that makes a robot act as expected. In this work, we demonstrate the expected behavior for a robot and then use an RL-based decision tree approach, which decides splits according to long-term evaluations instead of a top-down greedy strategy, to find the relationship between input and output in the demonstration data. We use this method to teach a robot the target-seeking task. To further improve performance on the target-seeking problem, we add Q-learning on top of the state space induced by the RL-based decision tree. The experimental results show that Q-learning improves performance quickly. For demonstration, we built a mobile robot powered by an embedded board. The robot can detect a ball in any direction within range using an omni-directional vision system. With this embedded computing capability and an efficient machine vision system, the robot can inherit behavior learned in a simulator and continue learning with Q-learning to improve its target-seeking performance.
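The tabular Q-learning refinement mentioned above can be sketched with the standard update rule. The states, actions, and learning parameters here are hypothetical placeholders, not the robot's actual sensor-derived states.

```python
# Sketch of the tabular Q-learning update used to refine behavior on top
# of a decision-tree-induced state space. States are hypothetical indices.

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
    return Q[s][a]

# hypothetical two-state table with two steering actions
Q = {0: {"left": 0.0, "right": 0.0}, 1: {"left": 1.0, "right": 0.0}}
```

One update from state 0 with reward 0.5, landing in state 1, moves Q(0, "right") from 0.0 to 0.1 * (0.5 + 0.9 * 1.0) = 0.14.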
APA, Harvard, Vancouver, ISO, and other styles
50

Shao, Fu-Hsiang, i 邵福祥. "The kalman filter embedded fuzzy decision tree". Thesis, 1997. http://ndltd.ncl.edu.tw/handle/84437871359722247166.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles