Dissertations / Theses: 'DECISION TREE TECHNIQUE'

1

Yedida, Venkata Rama Kumar Swamy. "Protein Function Prediction Using Decision Tree Technique." University of Akron / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=akron1216313412.

2

Li, Yunjie. "Applying Data Mining Techniques on Continuous Sensed Data : For daily living activity recognition." Thesis, Mittuniversitetet, Avdelningen för informations- och kommunikationssystem, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-23424.

Abstract:

Nowadays, with the rapid development of the Internet of Things, the application field of wearable sensors has continuously expanded, especially in areas such as remote electronic medical treatment and smart homes. Recognizing human daily activities from the sensed data is one of the challenges in this field. With a variety of data mining techniques, activities can be recognized automatically. However, because of the diversity and complexity of sensor data, not every data mining technique can be applied easily without systematic analysis and improvement. In this thesis, several data mining techniques were applied to the analysis of a continuous sensing dataset in order to recognize human daily activities. This work studied several data mining techniques and focused on three of them, Decision Tree, Naive Bayes and neural network, which were analyzed and compared according to their classification results. The thesis also proposes some improvements to the data mining techniques tailored to the specific dataset. The comparison of the three classification results showed that each classifier has its own limitations and advantages. The proposed idea of combining the Decision Tree model with the neural network model significantly increased the classification accuracy in this experiment.
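
The abstract does not state which split criterion the thesis's decision tree uses. As an illustration only, the sketch below shows the entropy-based information-gain criterion of ID3-style trees, assuming the sensor readings have already been discretized into categorical features; the feature names and activity labels are hypothetical, not taken from the thesis dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels, features):
    """ID3 criterion: choose the feature with the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical, already-discretized sensor readings (illustration only).
rows = [
    {"motion": "high", "location": "kitchen"},
    {"motion": "high", "location": "bedroom"},
    {"motion": "low", "location": "kitchen"},
    {"motion": "low", "location": "bedroom"},
]
labels = ["cooking", "walking", "cooking", "sleeping"]
print(best_feature(rows, labels, ["motion", "location"]))  # prints "location"
```

Here "location" wins because splitting on it yields a pure "kitchen" branch (gain 1.0 bit versus 0.5 bit for "motion"); a full tree would apply the same choice recursively to each branch.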

3

Thomas, Clifford S. "From 'tree' based Bayesian networks to mutual information classifiers : deriving a singly connected network classifier using an information theory based technique." Thesis, University of Stirling, 2005. http://hdl.handle.net/1893/2623.

Abstract:

For reasoning under uncertainty, the Bayesian network has become the representation of choice. However, except where models are considered 'simple', the tasks of construction and inference are provably NP-hard. For modelling larger 'real'-world problems, this computational complexity has been addressed by methods that approximate the model. The Naive Bayes classifier, which makes strong assumptions of independence among features, is a common approach, whilst the class of trees is another, less extreme example. In this thesis we propose the use of an information-theory-based technique as a mechanism for inference in Singly Connected Networks. We call this a Mutual Information Measure classifier, as it corresponds to the restricted class of trees built from mutual information. We show that the new approach provides both an efficient and a localised method of classification, with accuracies comparable to those of the less restricted general Bayesian networks. To improve the performance of the classifier, we additionally investigate the possibility of expanding the class Markov blanket by use of a Wrapper approach, and further show that performance can be improved by focusing on the class Markov blanket without increasing complexity. Finally, the two methods are applied to the 'real'-world medical domain of diagnosing Acute Abdominal Pain. This domain is known to be difficult and challenging to classify, and the objective was to investigate the optimality claims that some researchers have made for the Naive Bayes classifier in this domain. Despite some loss of representational capability, we show that the Mutual Information Measure classifier can be applied effectively to the domain and also provides a recognisable qualitative structure without violating 'real'-world assertions.
In respect of its 'selective' variant, we further show that the improvement achieves a predictive accuracy comparable to that of the Naive Bayes classifier, and that the Naive Bayes classifier's 'overall' performance is largely due to the contribution of the majority group, Non-Specific Abdominal Pain, a group of exclusion.
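
"Trees built from mutual information" most plausibly refers to the classical Chow-Liu construction: a maximum-weight spanning tree over pairwise mutual information between discrete variables. The sketch below shows only that structure-learning step, as an assumption about what the abstract alludes to; it is not the thesis's Mutual Information Measure classifier, and the column names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def chow_liu_edges(columns):
    """Maximum-weight spanning tree over pairwise MI (Kruskal with union-find)."""
    names = list(columns)
    parent = {v: v for v in names}

    def find(v):
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    weighted = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,  # heaviest (most informative) pairs first
    )
    edges = []
    for _, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            edges.append((a, b))
    return edges


# Hypothetical binary variables: B copies A, C is independent of both.
columns = {
    "A": [0, 0, 1, 1, 0, 1, 0, 1],
    "B": [0, 0, 1, 1, 0, 1, 0, 1],
    "C": [0, 1, 0, 1, 0, 1, 1, 0],
}
print(chow_liu_edges(columns))
```

The result is singly connected by construction (n - 1 edges, no cycles), which is what makes exact inference on such a tree cheap compared with a general Bayesian network.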

4

Dalkiran, Evrim. "Discrete and Continuous Nonconvex Optimization: Decision Trees, Valid Inequalities, and Reduced Basis Techniques." Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/77366.

Abstract:

This dissertation addresses the modeling and analysis of a strategic risk management problem via a novel decision tree optimization approach, as well as development of enhanced Reformulation-Linearization Technique (RLT)-based linear programming (LP) relaxations for solving nonconvex polynomial programming problems, through the generation of valid inequalities and reduced representations, along with the design and implementation of efficient algorithms. We first conduct a quantitative analysis for a strategic risk management problem that involves allocating certain available failure-mitigating and consequence-alleviating resources to reduce the failure probabilities of system safety components and subsequent losses, respectively, together with selecting optimal strategic decision alternatives, in order to minimize the risk or expected loss in the event of a hazardous occurrence. Using a novel decision tree optimization approach to represent the cascading sequences of probabilistic events as controlled by key decisions and investment alternatives, the problem is modeled as a nonconvex mixed-integer 0-1 factorable program. We develop a specialized branch-and-bound algorithm in which lower bounds are computed via tight linear relaxations of the original problem that are constructed by utilizing a polyhedral outer-approximation mechanism in concert with two alternative linearization schemes having different levels of tightness and complexity. We also suggest three alternative branching schemes, each of which is proven to guarantee convergence to a global optimum for the underlying problem. Extensive computational results and sensitivity analyses are presented to provide insights and to demonstrate the efficacy of the proposed algorithm. In particular, our methodology outperformed the commercial software BARON (Version 8.1.5), yielding a more robust performance along with an 89.9% savings in effort on average. 
Next, we enhance RLT-based LP relaxations for polynomial programming problems by developing two classes of valid inequalities: v-semidefinite cuts and bound-grid-factor constraints. The first of these uses concepts derived from semidefinite programming. Given an RLT relaxation, we impose positive semidefiniteness on suitable dyadic variable-product matrices, and correspondingly derive implied semidefinite cuts. In the case of polynomial programs, there are several possible variants for selecting such dyadic variable-product matrices for imposing positive semidefiniteness restrictions in order to derive implied valid inequalities, which leads to a new class of cutting planes that we call v-semidefinite cuts. We explore various strategies for generating such cuts within the context of an RLT-based branch-and-cut scheme, and exhibit their relative effectiveness towards tightening the RLT relaxations and solving the underlying polynomial programming problems, using a test-bed of randomly generated instances as well as standard problems from the literature. Our results demonstrate that these cutting planes achieve a significant tightening of the lower bound in contrast with using RLT as a stand-alone approach, thereby enabling an appreciable reduction in the overall computational effort, even in comparison with the commercial software BARON. Empirically, our proposed cut-enhanced algorithm reduced the computational effort required by the latter two approaches by 44% and 77%, respectively, over a test-bed of 60 polynomial programming problems. As a second cutting plane strategy, we introduce a new class of bound-grid-factor constraints that can be judiciously used to augment the basic RLT relaxations in order to improve the quality of lower bounds and enhance the performance of global branch-and-bound algorithms. 
Certain theoretical properties are established that shed light on the effect of these valid inequalities in driving the discrepancies between RLT variables and their associated nonlinear products to zero. To preserve computational expediency while promoting efficiency, we propose certain concurrent and sequential cut generation routines and various grid-factor selection rules. The results indicate a significant tightening of lower bounds, which yields an overall reduction in computational effort of 21% for solving a test-bed of 15 challenging polynomial programming problems to global optimality in comparison with the basic RLT procedure, and over a 100-fold speed-up in comparison with the commercial software BARON. Finally, we explore equivalent, reduced-size RLT-based formulations for polynomial programming problems. Utilizing a basis partitioning scheme for an embedded linear equality subsystem, we show that a strict subset of RLT defining equalities implies the remaining ones. Applying this result, we derive significantly reduced RLT representations and develop certain coherent associated branching rules that assure convergence to a global optimum, along with static as well as dynamic basis selection strategies to implement the proposed procedure. In addition, we enhance the RLT relaxations with v-semidefinite cuts, which are empirically shown to further improve the relative performance of the reduced RLT method over the usual RLT approach. Computational results presented using a test-bed of 10 challenging polynomial programs to evaluate the different reduction strategies demonstrate that our best proposed approach achieved more than a four-fold improvement in computational effort in comparison with both the commercial software BARON and a recently developed open-source code, Couenne, for solving nonconvex mixed-integer nonlinear programming problems.
Moreover, our approach robustly solved all the test cases to global optimality, whereas BARON and Couenne were jointly able to solve only a single instance to optimality within the set computational time limit, having an unresolved average optimality gap of 260% and 437%, respectively, for the other nine instances. This dissertation makes several broader contributions to the field of nonconvex optimization, including factorable, nonlinear mixed-integer programming problems. The proposed decision tree optimization framework can serve as a versatile management tool in the arenas of homeland security and health care. Furthermore, we have advanced the frontier for tackling formidable nonconvex polynomial programming problems that arise in emerging fields such as signal processing, biomedical engineering, materials science, and risk management. An open-source software package using the proposed reduced RLT representations, semidefinite cuts, bound-grid-factor constraints, and range reduction strategies is currently under preparation. In addition, the different classes of challenging polynomial programming test problems that are utilized in the computational studies conducted in this dissertation have been made available for other researchers via the web page http://filebox.vt.edu/users/dalkiran/website/. It is our hope and belief that the modeling and methodological contributions made in this dissertation will serve society in a broader context through the myriad of widespread applications they support.
Ph. D.
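
The risk-management model above represents cascading probabilistic events controlled by decisions as a decision tree and minimizes expected loss. The dissertation embeds this in a nonconvex mixed-integer program with resource allocation; the sketch below covers only the elementary building block, rolling back a fixed decision tree (probability-weighted average at chance nodes, minimum at decision nodes). The node classes, alternatives and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Outcome:
    loss: float  # terminal loss, e.g. consequence damage plus investment cost


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]  # (probability, child); probabilities sum to 1


@dataclass
class Decision:
    options: List[Tuple[str, "Node"]]  # (alternative label, child)


Node = Union[Outcome, Chance, Decision]


def expected_loss(node: Node) -> float:
    """Roll back the tree: average at chance nodes, take the minimum at decision nodes."""
    if isinstance(node, Outcome):
        return node.loss
    if isinstance(node, Chance):
        return sum(p * expected_loss(child) for p, child in node.branches)
    return min(expected_loss(child) for _, child in node.options)


def best_option(node: Decision) -> str:
    """Label of the alternative with the lowest expected loss."""
    return min(node.options, key=lambda opt: expected_loss(opt[1]))[0]


# Hypothetical numbers: a safety component fails with probability 0.10 (0.02 with a
# barrier costing 30); given failure, a severe consequence (loss 2000) occurs with p = 0.5.
root = Decision([
    ("no mitigation", Chance([(0.10, Chance([(0.5, Outcome(2000.0)), (0.5, Outcome(0.0))])),
                              (0.90, Outcome(0.0))])),
    ("install barrier", Chance([(0.02, Chance([(0.5, Outcome(2030.0)), (0.5, Outcome(30.0))])),
                                (0.98, Outcome(30.0))])),
])
print(best_option(root), expected_loss(root))
```

With these numbers the barrier lowers the expected loss from 100 to 50, so it is selected. In the dissertation the failure probabilities themselves depend on continuous resource allocations, which is what turns this simple rollback into a nonconvex optimization problem.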

5

Twala, Bhekisipho. "Effective techniques for handling incomplete data using decision trees." Thesis, Open University, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418465.

6

Millerand, Gaëtan. "Enhancing decision tree accuracy and compactness with improved categorical split and sampling techniques." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279454.

Abstract:

The decision tree is one of the most popular algorithms in the domain of explainable AI. From its structure, it is simple to induce a set of decision rules that are fully understandable to a human. That is why there is current research on improving decision trees or mapping other models into a tree. Decision trees generated by C4.5 or ID3 suffer from two main issues. The first is that they often perform worse, in terms of accuracy for classification tasks or mean squared error for regression tasks, than state-of-the-art models such as XGBoost or deep neural networks. On almost every task, there is a significant gap between top models such as XGBoost and decision trees. This thesis addresses this problem by providing a new method based on data augmentation using state-of-the-art models, which outperforms the older methods on the evaluation metrics. The second problem is the compactness of the decision tree: as the depth increases, the set of rules becomes exponentially large, especially when the split attribute is categorical. Standard solutions for handling categorical values are to turn them into dummy variables or to split on each value, producing complex models. This thesis presents a comparative study of current methods for splitting categorical values in classification problems. A new method is also studied for the case of regression.
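
One classical alternative to dummy variables or one-branch-per-value splits, and not necessarily one of the methods compared in the thesis, is the CART ordering trick: for a binary target, sorting the categories by their fraction of positive labels and scanning only the k - 1 splits along that order finds the best binary partition under the Gini index, instead of enumerating all 2^(k-1) - 1 partitions. A minimal sketch, with hypothetical category values:

```python
from collections import defaultdict


def gini(pos, total):
    """Gini impurity of a node containing `pos` positives out of `total` samples."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2 * p * (1 - p)


def best_categorical_split(values, labels):
    """Best binary partition of categories for 0/1 labels via the CART ordering."""
    pos = defaultdict(int)
    cnt = defaultdict(int)
    for v, y in zip(values, labels):
        cnt[v] += 1
        pos[v] += y
    # Order categories by their fraction of positive labels.
    order = sorted(cnt, key=lambda c: pos[c] / cnt[c])
    n, n_pos = len(labels), sum(labels)
    best = (float("inf"), None)
    left_pos = left_n = 0
    # Only the k - 1 "prefix" partitions of the ordering need to be checked.
    for i in range(len(order) - 1):
        left_pos += pos[order[i]]
        left_n += cnt[order[i]]
        right_pos, right_n = n_pos - left_pos, n - left_n
        impurity = (left_n * gini(left_pos, left_n) + right_n * gini(right_pos, right_n)) / n
        if impurity < best[0]:
            best = (impurity, set(order[: i + 1]))
    return best


colours = ["red", "red", "blue", "blue", "green", "green", "yellow", "yellow"]
labels = [1, 1, 0, 0, 1, 0, 0, 0]
print(best_categorical_split(colours, labels))
```

The returned tree has a single binary node ("blue" or "yellow" versus the rest) rather than four branches, which is the kind of compactness gain the abstract is concerned with.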

7

Valente, Lorenzo. "Reconstruction of non-prompt charmed baryon Λc with boosted decision trees technique." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/21033/.

Abstract:

The ALICE experiment studies the physics of the strong interaction at extreme energy densities through heavy-ion collisions. Under such conditions, a state of matter called the quark-gluon plasma can form. Because of the short lifetime of this state, studying it is very complex and can only be done indirectly, on the basis of how it cools and of the particles released in the process. One of the main methods of investigation is the study of hadrons containing heavy quarks (charm and beauty) and of how these particles, produced in the early stages of the collision, interact with this state of matter. The aim of the thesis is the reconstruction of the charmed baryon Λc and the separation of the non-prompt signal from the prompt one using the Boosted Decision Trees technique. The analysis was carried out with a multivariate analysis approach, in which the properties of several events can be considered simultaneously, extracting as much information as possible through machine learning techniques.
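
To make the "Boosted Decision Trees" idea concrete, the sketch below implements discrete AdaBoost with one-level trees (threshold stumps) in plain Python. It is a generic illustration, not the thesis's analysis chain or software; the two input features, the data, and the convention that +1 means non-prompt and -1 means prompt are all hypothetical.

```python
import math


def stump_predict(row, j, thr, sign):
    """One-level decision tree: +sign above the threshold on feature j, -sign otherwise."""
    return sign if row[j] > thr else -sign


def fit_stump(X, y, w):
    """Exhaustively find the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite on separable data
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Re-weight: misclassified candidates gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, sign))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, row):
    """Boosted response: positive leans +1 (toy 'non-prompt'), negative leans -1."""
    return sum(alpha * stump_predict(row, j, thr, sign) for alpha, j, thr, sign in model)


# Hypothetical candidates described by two made-up features.
X = [[0.1, 1.0], [0.2, 0.8], [0.9, 0.3], [0.8, 0.4], [0.15, 0.2], [0.85, 0.9]]
y = [-1, -1, 1, 1, -1, 1]
model = adaboost(X, y, rounds=5)
print(score(model, [0.95, 0.5]) > 0)  # prints "True"
```

In a real analysis the continuous score would be cut at a working point chosen from signal efficiency and background rejection, rather than simply at zero.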

8

Townsend, Whitney Jeanne. "Discrete function representations utilizing decision diagrams and spectral techniques." Thesis, Mississippi State : Mississippi State University, 2002. http://library.msstate.edu/etd/show.asp?etd=etd-07012002-160303.

9

Ravula, Ravindar Reddy. "Classification of Malware using Reverse Engineering and Data Mining Techniques." University of Akron / OhioLINK, 2011. http://rave.ohiolink.edu/etdc/view?acc_num=akron1311042709.

10

Jia, Xiuping. "Classification techniques for hyperspectral remote sensing image data." Thesis, School of Electrical Engineering, University of New South Wales - Australian Defence Force Academy, 1996. http://handle.unsw.edu.au/1959.4/38713.

Abstract:

Hyperspectral remote sensing image data, such as that recorded by AVIRIS with 224 spectral bands, provides rich information on ground cover types. However, it presents new problems in machine-assisted interpretation, mainly long processing times and the difficulty of class training due to the low ratio of the number of training samples to the number of bands. This thesis investigates feasible and efficient feature reduction and image classification techniques that are appropriate for hyperspectral image data. The study is reported in three parts. The first concerns a deterministic approach to hyperspectral data interpretation. Multigroup and multiple-threshold spectral coding procedures, and associated techniques for spectral matching and classification, are proposed and tested. By coding on subgroups of bands using one or three thresholds, spectral searching and matching become simple, fast and free of the need for radiometric correction. Modifications of existing statistical techniques are proposed in the second part of the investigation. A block-based maximum likelihood classification technique is developed. Several subgroups are formed from the complete set of spectral bands in the data, based on the properties of global correlation among the bands. Subgroups that are poorly correlated with each other are treated independently using conventional maximum likelihood classification. Experimental results demonstrate that, when using appropriate subgroup sizes, the new method provides a compromise among classification accuracy, processing time and available training pixels. Furthermore, a segmented, and possibly multi-layer, principal components transformation is proposed as a possible feature reduction technique prior to classification, and for effective colour display. The transformation is performed efficiently on each of the highly correlated subgroups of bands independently.
Selected features from each transformed subgroup can then be transformed again to achieve a satisfactory data reduction ratio and to generate the three most significant components for colour display. Classification accuracy is improved and high-quality colour image display is achieved in experiments using two AVIRIS data sets.
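
A minimal sketch of one-threshold multigroup spectral coding with Hamming-distance matching. Two details are assumptions, since the abstract does not specify them: the threshold for each band subgroup is taken as the mean of that subgroup's values in the pixel, and the three-threshold variant is omitted. The band grouping and library spectra are hypothetical.

```python
def code_spectrum(spectrum, groups):
    """One-threshold binary code: for each band subgroup, bit = 1 if the band
    value exceeds that subgroup's mean (the threshold choice is an assumption)."""
    bits = []
    for start, stop in groups:
        band_values = spectrum[start:stop]
        threshold = sum(band_values) / len(band_values)
        bits.extend(1 if v > threshold else 0 for v in band_values)
    return bits


def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))


def match(pixel, library, groups):
    """Return the library class whose code is nearest in Hamming distance."""
    code = code_spectrum(pixel, groups)
    return min(library, key=lambda name: hamming(code, code_spectrum(library[name], groups)))


# Hypothetical 8-band spectra split into two subgroups of 4 bands.
groups = [(0, 4), (4, 8)]
library = {
    "vegetation": [0.05, 0.08, 0.06, 0.04, 0.45, 0.50, 0.48, 0.47],
    "soil": [0.10, 0.15, 0.20, 0.25, 0.30, 0.32, 0.35, 0.38],
}
# The vegetation spectrum under a different gain and offset (no radiometric correction).
pixel = [0.11, 0.17, 0.13, 0.09, 0.91, 1.01, 0.97, 0.95]
print(match(pixel, library, groups))
```

Because each bit only records whether a band lies above its own subgroup's mean, any positive gain and additive offset applied to the whole spectrum leaves the code unchanged. This is one way to see why coding can make matching independent of radiometric correction, and why comparing short bit strings is fast.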

11

Buontempo, Frances Vivien. "Rapid toxicity prediction of organic chemicals using data mining techniques and SAR based on genetic programming for decision tree generation." Thesis, University of Leeds, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.416813.

12

Peng, Tian. "Structural system identification by dynamic observability technique." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/672173.

Abstract:

Structural system identification (SSI) can be classified as static or dynamic depending on the type of excitation. SSI by the Observability Method (OM) using static tests has been proposed and analyzed to address the observability of the estimated parameters. This mathematical approach has been used in other fields such as hydraulic, electrical and power networks, and transportation. The structural behavior of engineering structures can usually be identified from dynamic characteristics such as mode shapes, natural frequencies, and damping ratios. However, analysis of SSI by an Observability Method using dynamic information has been lacking. This Ph.D. thesis develops a dynamic Observability Method that uses masses, modal frequencies, and modal deflections, building on the static OM, to obtain the geometrical and mechanical parameters of the structure. The thesis contains three main parts. First, chapter 3 proposes, for the first time, constrained observability techniques (COM) for the parametric estimation of structures using dynamic information such as frequencies and mode shapes. New algorithms are introduced based on the dynamic eigenvalue equation, and two step-by-step examples illustrate how they work. Parametric expressions for the observed variables are obtained successfully, which allows the sensitivity of each variable in the problem and the error distribution to be studied; this is an advantage over non-parametric SSI techniques. A large structure is used to validate this new application: its structural properties can be obtained satisfactorily in either a global or a local analysis, and the results show that the required measurement set is smaller than that required for a static analysis. Chapters 4 and 5 apply COM to address shortcomings of current research, such as the optimal SHM+SSI strategy and uncertainty quantification.
Second, chapter 4 discusses the role of the SHM strategy and of SSI analysis based on the Constrained Observability Method (COM), with the aim of reducing the estimation error. A machine learning decision tool is proposed to help build the combined SHM and SSI strategy that yields the most accurate estimates of the structural properties; it combines COM with a decision tree algorithm for the first time. Decision trees are first used to investigate the influence of the variables involved in the SHM+SSI process (bridge layout, span length, measurement set, and the weight factor in the COM objective function) on the estimation error in a general structure. Verification of the method on a real bridge with different levels of damage shows that it is robust even at high damage levels, identifying the SHM+SSI strategy that yields the most accurate estimate. Finally, an uncertainty quantification (UQ) analysis is needed to assess the effect of uncertainties on the estimated parameters and to provide a way to evaluate them; this is carried out in chapter 5. There are a large number of UQ approaches in science and engineering, and the proposed dynamic COM is found to make up for some of the shortcomings of existing methods. The COM is then used to analyze a real bridge. The results are compared with those of a method based on a Bayesian approach, demonstrating the COM's applicability and correct performance through the analysis of a reinforced concrete beam. In addition, the results show that the best set of experimental measurement points depends on the epistemic uncertainty incorporated in the model. Since epistemic uncertainty can be removed as knowledge of the structure increases, optimal sensor placement should consider not only the precision of the sensors but also the vibration modes of the structure.
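
The thesis uses decision trees to study how SHM+SSI variables influence the estimation error. As a generic illustration rather than the thesis's model, the sketch below shows the CART regression criterion such a tree relies on: for each candidate variable, find the threshold whose split removes the most squared error, then rank the variables by that reduction. The variable names and error values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, errors):
    """Best threshold on one numeric variable by squared-error reduction (CART regression)."""
    pairs = sorted(zip(xs, errors))
    total = sse(errors)
    best_gain, best_thr = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        left = [e for _, e in pairs[:i]]
        right = [e for _, e in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best_gain:
            best_gain = gain
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_gain, best_thr


def rank_variables(table, errors):
    """Rank variables by how much estimation-error variance their best split removes."""
    return sorted(((name, *best_split(col, errors)) for name, col in table.items()),
                  key=lambda t: t[1], reverse=True)


# Hypothetical SHM+SSI scenarios and the resulting estimation errors (%).
table = {
    "span_length_m": [20, 25, 30, 40, 50, 60],
    "n_sensors": [4, 8, 4, 8, 4, 8],
}
errors = [2.0, 2.1, 2.2, 6.0, 6.3, 5.9]
for name, gain, threshold in rank_variables(table, errors):
    print(name, round(gain, 3), threshold)
```

A full tree applies this choice recursively. The root split, here span length at 35 m, identifies the variable with the strongest influence on the error, which is how a tree can serve as a decision aid for choosing a measurement strategy.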

13

Chida, Anjum A. "Protein Tertiary Model Assessment Using Granular Machine Learning Techniques." Digital Archive @ GSU, 2012. http://digitalarchive.gsu.edu/cs_diss/65.

Abstract:

The automatic prediction of protein three-dimensional structures from their amino acid sequences has become one of the most important and most researched fields in bioinformatics. Because models are not experimental structures determined with known accuracy but predictions, it is vital to estimate model quality. We attempt to solve this problem using machine learning techniques and information from both the sequence and the structure of the protein. The goal is to build a machine that learns from structures in the PDB and, when given a new model, predicts whether it belongs to the same class as the PDB structures (correct or incorrect protein models). Different subsets of the PDB (Protein Data Bank) are considered for evaluating the prediction potential of the machine learning methods. Here we show two such machines, one using SVM (support vector machines) and another using fuzzy decision trees (FDT). Using a preliminary encoding style, the SVM reached around 70% accuracy in protein model quality assessment, and an improved Fuzzy Decision Tree (IFDT) reached above 80% accuracy. To reduce computational overhead, a multiprocessor environment and a basic feature selection method are used in the SVM-based machine learning algorithm. Next, an enhanced scheme is introduced using a new encoding style. In the new style, information such as the amino acid substitution matrix, polarity, secondary structure information, and the relative distance between alpha carbon atoms is collected through spatial traversal of the 3D structure to form training vectors. This ensures that the properties of alpha carbon atoms that are close together in 3D space, and thus interacting, are used in vector formation. With the fuzzy decision tree, we obtained a training accuracy of around 90%, a significant improvement over the previous encoding technique in both prediction accuracy and execution time.
This outcome motivates further exploration of effective machine learning algorithms for accurate protein model quality assessment. Finally, these machines are tested using CASP8 and CASP9 templates and compared with other CASP competitors, with promising results. We further discuss the importance of model quality assessment and other information from proteins that could be considered for the same purpose.

14

Izad, Shenas Seyed Abdolmotalleb. "Predicting High-cost Patients in General Population Using Data Mining Techniques." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23461.

Abstract:

In this research, we apply data mining techniques to nationally representative expenditure data from the US to predict very high-cost patients, those in the top 5 cost percentiles, among the general population. Samples are derived from the Medical Expenditure Panel Survey (MEPS) Household Component data for 2006-2008, comprising 98,175 records. After pre-processing, partitioning, and balancing the data, the final MEPS dataset of 31,704 records is modeled with Decision Trees (including C5.0 and CHAID) and Neural Networks. Multiple predictive models are built, and their performance is analyzed using various measures including accuracy, G-mean, and Area under the ROC Curve (AUC). We conclude that the CHAID tree returns the best G-mean and AUC values among the top-performing predictive models, ranging from 76% to 85% and from 0.812 to 0.942, respectively. Among a primary set of 66 attributes, the best predictors of membership in the top 5% high-cost population include an individual's overall health perception, history of blood cholesterol checks, history of physical/sensory/mental limitations, age, and history of colonic prevention measures. Notably, we do not consider the number of visits to care providers as a predictor, since it is highly correlated with expenditure and offers no new insight into the data (i.e. it is a trivial predictor). We predict high-cost patients without knowing how many times the patient visited doctors or was hospitalized. Consequently, the results of this study can be used by policy makers, health planners, and insurers to plan and improve the delivery of health services.
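
Because high-cost patients are only the top 5% of the population, plain accuracy can look good even for a model that never flags anyone, which is why the study reports G-mean and AUC. The sketch below computes both from their standard definitions (G-mean as the geometric mean of sensitivity and specificity; AUC as the probability that a random positive is scored above a random negative, with ties counted as one half). It assumes 0/1 labels with both classes present; the labels and scores are hypothetical.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))


def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high-cost), hard predictions, and model scores.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.1, 0.8]
print(round(g_mean(y_true, y_pred), 3), auc(y_true, scores))
```

The pairwise AUC loop is O(P·N), which is fine for illustration; for datasets of the size used in the study, a rank-based (Mann-Whitney) formulation after sorting the scores is the usual choice.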

15

Park, Samuel M. "A Comparison of Machine Learning Techniques to Predict University Rates." University of Toledo / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1564790014887692.

16

Amein, Hussein Aly Abbass. "Computational intelligence techniques for decision making : with applications to the dairy industry." Thesis, Queensland University of Technology, 2000. https://eprints.qut.edu.au/36867/1/36867_Digitised%20Thesis.pdf.

17

Irniger, Christophe-André. "Graph matching: filtering databases of graphs using machine learning techniques." Berlin: Aka, 2005. http://deposit.ddb.de/cgi-bin/dokserv?id=2677754&prov=M&dok_var=1&dok_ext=htm.

18

Mistry, Pritesh. "A Knowledge Based Approach of Toxicity Prediction for Drug Formulation. Modelling Drug Vehicle Relationships Using Soft Computing Techniques." Thesis, University of Bradford, 2015. http://hdl.handle.net/10454/14440.

Abstract:

This multidisciplinary thesis is concerned with the prediction of drug formulations for the reduction of drug toxicity. Both scientific and computational approaches are used to make original contributions to the field of predictive toxicology. The first part of this thesis provides a detailed scientific discussion of all aspects of drug formulation and toxicity. The discussion focuses on the principal mechanisms of drug toxicity and on how drug toxicity is studied and reported in the literature. Furthermore, a review of the technologies currently available for formulating drugs to reduce toxicity is provided, together with examples of studies in the literature that have used these technologies to reduce drug toxicity. The thesis also provides an overview of the computational approaches currently employed in in silico predictive toxicology, focusing on the machine learning approaches used to build predictive QSAR classification models, with examples from the literature. Two methodologies have been developed as the main work of this thesis. The first uses directed bipartite graphs and Venn diagrams to visualise and extract, from large un-curated datasets, drug-vehicle relationships that show changes in the patterns of toxicity. These relationships can be rapidly extracted and visualised using the methodology proposed in chapter 4. The second methodology involves mining large datasets to extract drug-vehicle toxicity data. It uses an area-under-the-curve principle to make pairwise comparisons of vehicles, which are classified according to the toxicity protection they offer; predictive classification models based on random forests and decision trees are then built from these comparisons. The results of this methodology are reported in chapter 6.

19

Phadke, Amit Ashok. "Predicting open-source software quality using statistical and machine learning techniques." Master's thesis, Mississippi State : Mississippi State University, 2004. http://library.msstate.edu/etd/show.asp?etd=etd-11092004-105801.

20

Afrasinei, Gabriela Mihaela. "Study of land degradation and desertification dynamics in North Africa areas using remote sensing techniques." Doctoral thesis, Università degli Studi di Cagliari, 2016. http://hdl.handle.net/11584/266730.

Abstract:

In arid and semi-arid lands with fragile ecosystems, climatic variations, water scarcity and human pressure accelerate the ongoing degradation of natural resources. To implement sustainable management, the ecological state of the land must be known, and diachronic studies to monitor and assess desertification processes are indispensable in this respect. The present study was developed within the framework of WADIS-MAR (www.wadismar.eu), one of the five Demonstration Projects implemented within the Regional Programme "Sustainable Water Integrated Management (SWIM)" (www.swim-sm.eu), which is funded by the European Commission and aims to contribute to the effective implementation and extensive dissemination of sustainable water management policies and practices in the Southern Mediterranean Region. The WADIS-MAR Project concerns the implementation of integrated water harvesting and artificial aquifer recharge techniques in two watersheds in the Maghreb Region: Oued Biskra in Algeria and wadi Oum Zessar in Tunisia. The WADIS-MAR Project is coordinated by the Desertification Research Center of the University of Sassari in partnership with the University of Barcelona (Spain), the Institut des Régions Arides (Tunisia), the Agence Nationale des Ressources Hydrauliques (Algeria) and the international organization Observatoire du Sahara et du Sahel. The project is coordinated by Prof. Giorgio Ghiglieri and aims to promote integrated, sustainable water harvesting and agricultural management in the two watersheds in Tunisia and Algeria. As agriculture and animal husbandry are the two main economic activities in these areas, demand and pressure on natural resources increase to meet the needs of a growing population. In the arid and semi-arid study areas of Algeria and Tunisia, sustainable agricultural development and resource management require an understanding of these dynamics, which in turn depends on monitoring desertification processes.
Vegetation is the first indicator of decline in ecosystem functions, as it is sensitive to any disturbance; the same holds for soil characteristics and dynamics, which are edaphically related to vegetation. Satellite remote sensing of land affected by sand encroachment and salinity is a useful decision-support tool for detecting and evaluating features that indicate desertification. Land cover, land use, soil salinization and sand encroachment are examples of such indicators; integrated in a diachronic assessment, they can provide quantitative and qualitative information on the ecological state of the land, particularly its degradation tendencies. In recent literature, detecting and mapping features in saline and sandy environments has been reported as successful with both multispectral and hyperspectral imagery, yet the limitations of both image types mean there is still "no agreed-on best approach to this technology for monitoring and mapping soil salinity and sand encroachment". Several researchers have reported problems with the image classification of features in these particular areas, using either statistical or neural/connectionist algorithms and both fuzzy and hard classification methods. In this research, salt and sand features were assessed through both visual interpretation and automated classification approaches, employing historical and present Landsat imagery (from 1984 to 2015). Decision tree analysis was chosen because of its high flexibility with respect to input data range and type and the ease of class extraction through non-parametric, multi-stage classification; unlike traditional statistical classifiers, it makes no a priori assumption about class distribution. The visual interpretation mapping of land cover and land use was carried out according to acknowledged standard nomenclatures and methodologies, such as CORINE Land Cover, AFRICOVER 2000 and Global Land Cover 2000.
The automated approach involves a decision tree (DT) classifier and an unsupervised classification applied to the principal components (PC) extracted from a Knepper ratios composite, in order to assess their validity for change detection analysis. In the Tunisian study area, it was possible to conduct a thorough ground truth survey, resulting in a record of 400 ground truth points with several information layers (ground survey sheet information on various land components, photographs, reports in various file formats) stored within a shareable standalone geodatabase. Spectral data were also acquired in situ using the handheld ASD FieldSpec 3 Jr. Full Range (350-2500 nm) spectroradiometer, and samples were taken for X-ray diffraction analysis. The sampling sites were chosen on the basis of a geomorphological analysis, ancillary data and the previously interpreted land cover/land use map, generated specifically for this study from Landsat 7 and 8 imagery. The spectral campaign acquired spectral reflectance measurements at 34 points: 14 points on saline surfaces (9 samples), 10 points in sand encroachment areas (10 samples), 3 points on typical vegetation (halophytes and psammophytes) and 7 points on mixed surfaces. Five of the eleven indices employed in the decision tree construction were developed in the current study, including a proposed salinity index (SMI) for extracting highly saline areas. Their application resulted in an accuracy of more than 80%. For the error estimation phase, the interpreted land cover/use map (both areas) and ground truth data (Oum Zessar area only) supported the results of the 1984 to 2014 diachronic analysis of salt-affected areas obtained through both automatic methods.
Although IsoDATA classification applied to the principal components of the Knepper ratios showed good potential as a fast, automated, user-independent classifier, accuracy assessment showed that the decision tree outperformed it substantially. The decision tree classifier proved more flexible and better suited to extracting highly and moderately saline areas and major land cover types, as it allows multi-source information and greater user control, with an accuracy of more than 80%. By integrating the results with ancillary spatial data, we could infer the driving forces (anthropic vs. natural) and source areas, and understand and estimate the metrics of desertification processes. In the Biskra area (Algeria), results indicate that the expansion of irrigated farmland over the past three decades contributes to ongoing secondary salinization of soils, with an increase of over 75%. In the Oum Zessar area (Tunisia), there was substantial change in several landscape components over recent decades, related to increased anthropic pressure and settlement, agricultural policies and national development strategies. One of the most concerning aspects is the expansion of sand-encroached areas by around 27% over the last three decades.

21

Pabarškaitė, Židrina. "Enhancements of pre-processing, analysis and presentation techniques in web log mining." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20090713_142203-05841.

Abstract:

As the Internet becomes an important part of our lives, more attention is paid to the quality of information and how it is displayed to the user. The research area of this work is web data analysis and methods for processing such data. This knowledge can be extracted by gathering web server data, namely log files, in which all users' navigational browsing patterns are recorded. The research object of the dissertation is the web log data mining process. Related topics are web log data preparation methods, data mining algorithms for prediction and classification tasks, and web text mining. The key goal of the thesis is to develop methods that improve the knowledge discovery steps of web log mining and reveal new opportunities to the data analyst. While performing web log analysis, it was found that insufficient attention has been paid to the web log data cleaning process. Reducing the number of redundant records makes the data mining process much more effective and faster. Therefore, a new cleaning framework was introduced that retains only the records corresponding to real user clicks, so that the extracted knowledge reflects users' actual navigation paths. People tend to understand technical information better when it resembles human language; it is therefore advantageous to use decision trees for mining web log data, as they express web usage patterns as rules that humans can understand. However, users' browsing histories were found to differ in length, so after specific data preparation, namely forming fixed-length vectors, it is worthwhile to apply decision tree algorithms not previously used in practice...
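
The abstract notes that browsing histories differ in length and are turned into fixed-length vectors before decision tree algorithms are applied. One common way to do this, shown here as an assumption rather than the thesis's own encoding, is to keep the first k page views of each session and pad shorter sessions with a placeholder. The `PAD` token, the value of k and the sample sessions are hypothetical:

```python
from typing import List, Sequence

PAD = "<none>"  # hypothetical token meaning "no further page view"

def to_fixed_length(session: Sequence[str], k: int) -> List[str]:
    """Keep the first k page views of a session and pad shorter sessions with PAD."""
    clipped = list(session[:k])
    return clipped + [PAD] * (k - len(clipped))

# Hypothetical cleaned sessions (one list of requested pages per user visit)
sessions = [
    ["/home", "/products", "/cart", "/checkout", "/thanks"],
    ["/home", "/about"],
]
vectors = [to_fixed_length(s, 3) for s in sessions]
for v in vectors:
    print(v)
# ['/home', '/products', '/cart']
# ['/home', '/about', '<none>']
```

Each position (first click, second click, ...) then becomes one categorical attribute, which is the tabular shape decision tree learners expect. Keeping the first k clicks rather than the last k is a design choice; which end of a session matters more depends on the prediction target.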

22

Smith, Eugene Herbie. "An analytical framework for monitoring and optimizing bank branch network efficiency / E.H. Smith." Thesis, North-West University, 2009. http://hdl.handle.net/10394/5029.

Abstract:

Financial institutions use a variety of delivery channels to serve their customers. The primary channel for acquiring new customers and increasing market share is the retail branch network. The 1990s saw the Internet explosion and, with it, a threat to branches. The relatively low cost of virtual delivery channels made it inevitable for financial institutions to direct their focus towards these newer and more cost-efficient technologies. By the beginning of the 21st century, with increasing limitations identified in alternative virtual delivery channels, the financial industry returned to a more balanced view, which may be seen as a revival of branch networks. The main purpose of this study is to provide a roadmap for financial institutions in managing their branch networks. A three-step methodology, drawing on data mining and management science techniques, is used to explain relative branch efficiency. The methodology consists of clustering analysis (CA), data envelopment analysis (DEA) and decision tree induction (DTI). CA is applied to data internal to the financial institution to increase the discriminatory power of DEA. DEA is used to calculate the relative operating efficiencies of branches deemed homogeneous during CA. Finally, DTI is used to interpret the DEA results together with additional data describing the market environment in which each branch operates, and to inquire into the nature of the branch's relative efficiency.
Thesis (M.Com. (Computer Science))--North-West University, Potchefstroom Campus, 2010.

23

Kuratomi, Alejandro. "GNSS Position Error Estimated by Machine Learning Techniques with Environmental Information Input." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-262692.

Abstract:

In Intelligent Transport Systems (ITS), specifically in autonomous driving, accurate vehicle localization is essential for safe operation. Localization accuracy depends on both the position estimate and the positioning error estimate. Technologies that improve positioning error estimation are required and are currently being researched. This project investigated machine learning algorithms for positioning error estimation, using relevant information obtained from a GNSS receiver and adding environmental information from a camera mounted on a radio-controlled vehicle testing platform. The research was done in two stages. The first stage consists of training and testing the machine learning algorithms on existing GNSS data from Waysure's database, collected in tests run in 2016 that did not record the environment surrounding the GNSS receiver. The second stage consists of training and testing the algorithms on GNSS data from new test runs carried out in May 2019, which include the environment surrounding the GNSS receiver. The results of both stages are compared. The most relevant features, obtained from the decision tree algorithm, are presented. This report concludes that there is no statistical evidence that the tested environmental input from the camera improves positioning error estimation accuracy with the machine learning models built.

24

Peroutka, Lukáš. "Návrh a implementace Data Mining modelu v technologii MS SQL Server." [Design and implementation of a data mining model in MS SQL Server technology.] Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-199081.

Abstract:

This thesis focuses on the design and implementation of a data mining solution with real-world data. The task is analysed and processed, and its results are evaluated. The mined data set contains study records of students at the University of Economics, Prague (VŠE) over the past three years. The first part of the thesis covers the theory of data mining: the definition of the term and the history and development of the field. Current best practices and methodology are described, as well as methods for determining data quality and for pre-processing data ahead of the actual data mining task. The most common data mining techniques are introduced, including their basic concepts, advantages and disadvantages. This theoretical basis is then used to implement a concrete data mining solution with educational data. The source data set is described and analysed, and some of the data are chosen as input for the created models. The solution is based on the MS SQL Server data mining platform, and its goal is to find, describe and analyse potential associations and dependencies in the data. The results of the respective models are evaluated, including their potential added value. Possible extensions and suggestions for further development of the solution are also mentioned.

25

Girard, Nathalie. "Vers une approche hybride mêlant arbre de classification et treillis de Galois pour de l'indexation d'images." [Towards a hybrid approach combining classification trees and Galois lattices for image indexing.] Thesis, La Rochelle, 2013. http://www.theses.fr/2013LAROS402/document.

Abstract:

Image classification is generally based on two steps: extraction of the image signature, followed by analysis of the extracted data, which are generally numerical. Many classification models have been proposed in the literature; the choice of the most suitable one is often guided by classification performance and model readability. Decision trees and Galois lattices are two symbolic models known for their readability. In her thesis [Guillas 2007], Guillas efficiently used Galois lattices for image classification, and strong structural links between decision trees and Galois lattices were highlighted. Building on these results, we aim to design a hybrid of the two models that combines their advantages (the readability of both, the robustness of the lattice and the low memory footprint of the tree). To this end, we study the links between the two models to highlight their differences. The first difference is the type of discretization: decision trees generally use a local discretization, while Galois lattices, originally defined for binary data, use a global discretization. From a study of the properties of dichotomic lattices (lattices defined after discretization), we propose a local discretization for lattices that improves their classification performance and reduces their structural complexity. Second, the post-pruning implemented in most decision trees aims to reduce their complexity but also to improve their generalization performance, whereas lattice filtering is motivated solely by reducing structural complexity (which is exponential in the size of the data in the worst case). By combining these two processes, we propose a simplification of the lattice structure constructed after our local discretization. This simplification leads to a hybrid classification model that takes advantage of both decision trees and Galois lattices.
It is as readable as both models, while being less complex than the lattice and performing as well as it.
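
The local discretization performed by decision trees can be illustrated with the usual numeric split search: at a node, the candidate cut points are midpoints between consecutive sorted values of the samples that reach that node, and the cut with the lowest weighted impurity is kept. A minimal sketch using Gini impurity, a CART-style criterion assumed here for illustration (the thesis may use a different criterion); the feature values and class labels are hypothetical:

```python
from collections import Counter
from typing import List, Tuple

def gini(labels: List[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values: List[float], labels: List[str]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best binary cut 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (float("nan"), float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut is possible between equal values
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical numeric image-signature feature for the samples reaching one node
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(values, labels))  # (5.0, 0.0)
```

Because the search is rerun on the subset of samples reaching each node, the same attribute can be cut at different points in different branches; a global discretization, as used for lattices, would instead fix the cut points once for the whole dataset.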

26

Roussel, Mylène. "Analyse et interprétation d'images appliquées aux algues microscopiques." [Image analysis and interpretation applied to microscopic algae.] Compiègne, 1993. http://www.theses.fr/1993COMP560S.

Abstract:

The study of sequences of methods for classifying and identifying objects from raw images is uncommon. Here we focus on the microscopic algae found in fresh water. The raw images come from an optical microscope and have the following characteristics: low contrast dynamics, non-homogeneous backgrounds, and the presence of artefacts. The sequence of methods is organized in two parts, through a solution methodology that extends the scope of the processes to the more general case of biological objects. The first part covers low-level processing (segmentation, object localization, contour tracking and reconstruction) in order to obtain contours representative of the objects, from which features can be extracted. The second part deals with the extraction of relevant features and the selection of discriminant features based on principal component analysis, enabling a successful classification. The limited dataset led us to choose a classification method based on binary decision trees.

27

Ziani, Abdellatif. "Etude et réalisation d'un système d'analyse automatique du sommeil." [Study and implementation of an automatic sleep analysis system.] Rouen, 1989. http://www.theses.fr/1989ROUES028.

Abstract:

The Medilog 9000 recording and playback system makes it possible to study sleep at home. The aim of this work is to perform an automatic analysis of the signals output by the device. Signal processing is carried out in two steps: first, the most informative parameters are extracted using a pattern recognition technique, more specifically syntactic recognition; then, sleep stages are automatically scored using decision tree programming.

28

Teng, Sin Yong. "Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-433427.

Abstract:

As new technologies for energy-intensive industries continue to develop, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation are forcing these traditional facilities to cease operation and shut down. Process improvement and retrofit projects are essential to maintaining the operational performance of these facilities. Current approaches to process improvement are mainly process integration, process optimization and process intensification. In general, these areas rely on mathematical optimization, the practitioner's experience and operational heuristics. These approaches serve as the basis for process improvement; however, their performance can be further improved with modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to improve energy-intensive industrial processes. The work addresses this problem by simulating industrial systems and makes the following contributions: (i) application of machine learning techniques, including one-shot learning and neuro-evolution, for data-driven modelling and optimization of individual units; (ii) application of dimensionality reduction (e.g. principal component analysis, autoencoders) for multi-objective optimization of multi-unit processes; (iii) design of a new tool for analysing problematic parts of a system in order to eliminate them (bottleneck tree analysis, BOTA), together with a proposed extension that handles multidimensional problems using a data-driven approach; (iv) demonstration of the effectiveness of Monte Carlo simulation, neural networks and decision trees for decision-making when integrating a new process technology into existing processes; (v) comparison of the HTM (Hierarchical Temporal Memory) technique and dual optimization with several predictive tools to support real-time operations management;
(vi) implementation of an artificial neural network within the interface of the conventional process graph (P-graph); (vii) highlighting the future of artificial intelligence and process engineering in biosystems through a commercially based multi-omics paradigm.

29

Syu, Hong-Cheng, and 許宏誠. "A Study on Applying Decision Tree Technique to Motorcycle Accidents." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/41582493817344070813.

Abstract:

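Master's thesis, National Taiwan University, Graduate Institute of Industrial Engineering, academic year 105 (ROC calendar).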
With economic development and improving living standards, transportation has become more widespread and convenient. In recent years, the traffic accident rate has increased sharply. In Taipei, there are more than 13,000 traffic accidents every year, and motorcycle accidents account for one of the highest proportions among all vehicle types. Traffic accidents involve different factors, including time, environment and weather, and each accident may cause a different level of injury. According to the traffic data from the Department of Transportation in Taipei City, traffic records contain 35 categories, including environmental factors, transport facilities and personal information, and it is hard to analyze all of these factors simultaneously. To better understand these factors and help prevent traffic accidents, this study preprocesses the traffic accident data from 2008 to 2013 and uses a decision tree (CHAID) to analyze the data, identify the main factors in motorcycle accidents in Taipei, and provide valuable information to the Department of Transportation in Taipei City to support its decision-making and transportation planning and to reduce traffic accidents.
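
CHAID chooses splits using chi-square tests of independence between each candidate predictor and the target (here, injury level); the core computation is Pearson's chi-square statistic on a contingency table, which sums (observed - expected)^2 / expected over all cells, with expected = row total × column total / n. A minimal sketch of that statistic only: the category merging and Bonferroni-adjusted p-values that full CHAID also performs are omitted, and the counts below are invented for illustration, not data from the thesis:

```python
from typing import List

def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = weather (clear, rain), columns = injury level (minor, severe)
table = [[30, 10],
         [10, 30]]
print(chi_square(table))  # 20.0, with (2-1)*(2-1) = 1 degree of freedom
```

In a CHAID-style procedure, each predictor's statistic would be converted to a p-value using its degrees of freedom, and the node would be split on the most significant predictor.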

30

Hu, Wan-Neng, and 胡萬能. "Research on Selecting Cases of Business Tax by Applying Decision Tree Technique." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/22640939803024615444.

Abstract:

Master's thesis, Yuan Ze University, Department of Information Management, academic year 99 (ROC calendar).
Tax revenue is the government's major source of income, and business tax is a major source of total tax revenue. Consequently, curbing business tax evasion effectively is a top priority for the taxation agency. As business tax filings have been digitized and stored in a central database since 1999, it is desirable and feasible to apply information technologies to detect tax evasion. This study applied the data mining techniques C&R Tree, C5.0 and CHAID to find optimal patterns and models for selecting cases for further human intervention and investigation. Based on the empirical results, the study found C5.0 to be considerably more accurate than the other data mining techniques used. Introducing data mining techniques into tax evasion detection and investigation processes appears likely to improve the efficiency of the taxation agency's manpower.
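
C4.5, the predecessor of C5.0, ranks candidate splits by gain ratio: the information gain of an attribute divided by its split information, which penalizes attributes with many distinct values, and C5.0 is generally described as using the same criterion. A minimal sketch of gain ratio for one categorical attribute; C5.0's boosting, pruning and other refinements are not shown, and the attribute values and evasion labels are hypothetical:

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr: List[str], labels: List[str]) -> float:
    """Information gain of splitting on attr, divided by the split information of attr."""
    n = len(labels)
    groups: Dict[str, List[str]] = defaultdict(list)
    for a, y in zip(attr, labels):
        groups[a].append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical filings: an industry code, a unique filing ID, and the audit outcome
industry = ["retail", "retail", "retail", "import", "import", "import"]
filing_id = ["f1", "f2", "f3", "f4", "f5", "f6"]
evasion = ["no", "no", "no", "yes", "yes", "yes"]
print(gain_ratio(industry, evasion))            # 1.0
print(round(gain_ratio(filing_id, evasion), 3))  # 0.387
```

Both attributes separate the classes perfectly (information gain of 1 bit), but the unique filing ID has a split information of log2(6) bits, so gain ratio correctly prefers the attribute that generalizes.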

31

Lin, Yang-Tze, and 林楊澤. "Hiding Quasi-identifier on Decision Tree utilizing Swapping Technique for Preserving Privacy." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/40510420253466026628.

Abstract:

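Master's thesis
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
ROC academic year 97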
Classification is an important issue in data mining, and the decision tree is one of the most popular classification techniques. Some data sources contain private personal information that people are unwilling to reveal. Disclosing person-specific data could endanger thousands of people, so a dataset should be protected before it is released for mining. However, techniques for hiding private information usually modify the original dataset without considering the effect on the prediction accuracy of the resulting classification model. In this research, we propose an algorithm that protects personal privacy for decision-tree-based classification models. Our goal is to hide all person-specific information with minimal data perturbation while maintaining the prediction capability of the decision tree classifier. Experiments demonstrate that the proposed algorithm successfully hides private information with less disturbance to the classifier.

32

YADAV, MAYANK. "USE OF ENSEMBLE LEARNERS TO PREDICT NUMBER OF DEFECTS IN A SOFTWARE." Thesis, 2023. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19838.

Abstract:

Fault detection is currently crucial in industry, since early discovery of faults may help prevent subsequent abnormal events. Fault detection can be achieved in a variety of ways, and this research reviews the fundamental approaches. Methods that find flaws faster than conventional time limits allow are now needed. Detection methods include data- and signal-based approaches, process model-based methods, and knowledge-based methods; some of these require very precise models. Early fault discovery increases life expectancy, enhances safety, and lowers maintenance costs, and several factors must be considered when choosing a fault detection system. Principal Component Analysis can help find flaws in large-scale systems, while signal models are used when difficulties arise from process changes. This research includes a systematic literature review along with a selection of noteworthy applications, and examines real-world scenarios that employ different defect detection methodologies, covering both hardware and software concerns. In the first scenario, a decision tree technique is used to detect defective lines, classifying each one as faulty or non-faulty wherever possible. In the second scenario, ensemble learning is employed to discover faults in each dataset. During testing, software may exhibit multiple defects, some capable of causing immediate failures and thereby reducing the software's capability.

33

Lin, Ming-chieh, and 林明潔. "Application of Decision-tree Technique in Assessing Power Wheelchair for People with Disabilities." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/65107726207011353710.

Abstract:

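Master's thesis
National Taiwan University of Science and Technology
Department of Industrial Management
ROC academic year 100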
According to a December 2012 statistical report by the Department of Statistics, Ministry of the Interior, Taiwan, people with physical disabilities form the largest group among people with disabilities, at 35 percent. Some people with mobility impairments can move about and complete functional tasks using a power wheelchair. Because the power wheelchairs available on the market vary widely in structure and function, users greatly need professional assessment to avoid improper use, which can lead to harmful accidents, abandonment of the mobility device, or body deformity. Physical and occupational therapists do not receive assistive technology training until they enter the workforce, so they acquire tacit knowledge of assistive technology assessment and decision-making through experience and trial and error. This study transforms this tacit knowledge into explicit knowledge and establishes decision-making principles for power wheelchair assessment. The decision tree is a common classification method that is simple, clear, and easy to use. This study divides power wheelchairs into five categories, (1) power base, (2) human-machine interface, (3) tilt and recline system, (4) standing and elevating system, and (5) seating system, and uses nodes and leaves to classify and code them, developing five decision trees. The feasibility of the decision trees is then tested using power wheelchair data for high school and university students with physical impairments from the Ministry of Education. The five decision trees are intended to help therapists find the most suitable power wheelchair for people with physical disabilities.

34

LIU, YUN-SHOU, and 劉允守. "Harnessing the decision tree technique to the customer churn analysis for automobile repairs." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/m7ar85.

Abstract:

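Master's thesis
National Chung Cheng University
Department of Business Administration, In-service Master's Program
ROC academic year 106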
Taiwan's automotive after-sales service belongs to a technical, service-oriented industry that is also highly labor-intensive. Because of the industry's particular nature, its degree of automation is quite low. This study focuses on the authorized maintenance service operations of domestic-brand automobile manufacturers. As new car sales in Taiwan decline year by year and new car prices become fiercely competitive, after-sales service has become more difficult to operate. With the revenue and profit that after-sales service generates for the company declining, operations management is clearly becoming more important. Customer retention and churn deserve particular attention in business operations, and controlling the customer turnover rate is one of the most important management issues. The technical level of maintenance personnel is inseparable from customers' experience of after-sales maintenance service, and maintenance history records can also reveal potential churn factors related to service personnel and customer experience. Accordingly, this study applies the C5.0 decision tree algorithm to identify relevant factors and build decision tree models that address the issue of reducing customer churn.

35

Chen, Wei-Ting, and 陳韋廷. "Applying Decision Tree Data Mining Technique to Track the Concept Drift of Porn Web Filtering." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/90241852350617504678.

Abstract:

Master's thesis
National Dong Hwa University
Master's Degree Program in Information Management
ROC academic year 99
With the development of the Internet, the proliferation of online pornography seriously affects the physical and mental development of young people, so filtering porn web pages effectively has become an issue worth exploring. This study proposes a filtering mechanism that uses a decision-tree-based approach to track concept drift, together with a weighted sliding window that calculates a concept-drift weight score used in classifying porn web pages. The mechanism is divided into a training phase and an implementation phase. In the training phase, features of porn web pages are extracted and each rule is assigned a score. In the implementation phase, features of unknown web pages are extracted and the pages are scored. Keywords that appear more frequently during a particular time period are recorded as drift keywords in the keyword library and assigned higher weights. This approach allows the filter to adapt to concept drift in real-world web pages and improves the recognition accuracy for porn web pages; because the filter adapts to the dynamic web environment, it improves on traditional machine learning classification methods. In experiments, the method combining the porn keyword database with decision-tree-based concept drift tracking achieved a classification accuracy of 97.06%, which is no lower than that of other machine learning methods for recognizing porn web pages.

36

Chou, Yu-Lin, and 周佑霖. "A Technique for Speaker Independent Automatic Speech Recognition Based on Decision Tree State Tying with GCVHMM." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/42812214292865347038.

Abstract:

Master's thesis
National Chiao Tung University
Department of Electrical and Control Engineering
ROC academic year 90
This paper proposes a new technique for continuous speaker-independent recognition of spoken Mandarin digits. One popular tool for this problem is the HMM-based one-state algorithm, a connected-word pattern matching method. However, two problems with this conventional method prevent its practical use for our target problem. One is the lack of a proper mechanism for selecting robust acoustic models for speaker-independent recognition. The other is whether the acoustic model captures inter-syllable co-articulatory effects. First, a generalized common-vector (GCV) approach is developed, based on eigenanalysis of the covariance matrix, to extract features that are invariant across speakers as well as to acoustic environment effects and phase or temporal differences. The GCV scheme is then integrated into the conventional HMM to form a new GCV-based HMM, called GCVHMM, which is well suited to speaker-independent recognition. For the second problem, context-dependent modeling is used to account for the co-articulatory effects of neighboring phones. This is important because co-articulatory effects are significantly stronger in continuous speech than in isolated utterances. However, modeling the variations in sounds and pronunciations generates numerous context-dependent models, and if the parameters of those models are all distinct, the total number of model parameters becomes very large. To solve this, the decision tree state-tying technique is used to reduce the number of parameters and hence the computational complexity. In experiments on speaker-independent continuous speech sentences, the proposed scheme increased the average recognition rate of the conventional HMM-based one-state algorithm by over 26.039% without using any grammar or lexical information.

37

Chawda, Sonal. "Determination of distance relay characteristics using an inductive learning system." Thesis, 1993. https://figshare.com/articles/thesis/Determination_of_distance_relay_characteristics_using_an_inductive_learning_system/19326599.

Abstract:

In this research, an attempt has been made to design distance relays according to protection system requirements, using an inductive learning technique. The inductive learning algorithm, which belongs to the family of machine learning from examples, is used to convert a set of impedance values into a decision tree. The impedance values are obtained by conducting fault studies on the system to be protected. A number of tests have been carried out on various transmission line configurations, and the software required to generate the decision tree has been developed.

38

Yen, Ya-Ru, and 顏雅茹. "Automatic Analysis of Name Card Contents by Image Processing and Decision-Tree Classification Techniques." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/58750596392335149930.

Abstract:

Master's thesis
National Chiao Tung University
Department of Computer and Information Science
ROC academic year 91
A system for automatic analysis of the contents of name card images using decision-tree classification techniques is proposed. Five major phases of name card content analysis are identified: basic block extraction, logo extraction, card type classification, text line classification, and card image reconstruction. In the basic block extraction phase, edge detection and region-growing techniques are applied to extract basic blocks from name card images, and a moment-preserving thresholding technique is then used to reduce the colors in each basic block. In the logo extraction phase, several effective features are proposed to classify the extracted blocks into logo blocks and text blocks. In the card type classification phase, the width/height ratios of text blocks are used to classify cards as Chinese or English. In the text line classification phase for Chinese name cards, nine types of text lines are recognized: name, title, e-mail, web address, mobile phone number, fax number, phone number, government publications number, and address lines. For English name cards, the same text line types are recognized except the government publications number line. Adaptive decision-tree methods for classifying these text line types in both Chinese and English name cards are proposed. Finally, a suitable compression method is proposed to reduce the data volume of the recognized name card contents, saving storage space and display time. Good experimental results demonstrate the feasibility of the proposed methods.

39

Wang, Peiwen, and 王珮紋. "Using Data Mining Technique To Build Cash Prediction: An Application Of Decision Trees." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/87823544665254884322.

Abstract:

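Master's thesis
National Chung Cheng University
Graduate Institute of Accounting and Information Technology
ROC academic year 100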
Cash is a very important asset for enterprises, yet it receives less attention than other assets. Enterprises choose to hold some cash even though other assets earn higher returns when invested. Statistics from previous studies show that the high-tech electronics industry in particular consistently holds large amounts of cash. Because the industry incurs huge expenses, companies may face insufficient funds, so it is necessary to hold a certain amount of cash. This paper uses stepwise regression analysis to find suitable variables for the cash holdings of Taiwan's high-tech electronics industry. The selected ratios include cash dividend payout, rate of research costs, leverage, liability, operating cash flow, investment cash flow, financing cash flow, ratio of operating cash flow, ratio of cash flow, and company size. Decision tree methods (AD Tree, Decision stump, J48, NB Tree, LMT, Random Forest, Random Tree, REP Tree, Simple CART) are then used to classify the data, and their prediction accuracy is evaluated. The study comprises three experiments: (1) the predictive ability of the decision tree algorithms; (2) the decision tree algorithms combined with performance-improvement algorithms; and (3) a comparison of the best decision tree's prediction rate with that of a logistic regression model. Across the three experiments, Random Forest achieved the highest prediction rate, outperforming the logistic regression model.

40

Kuo, Min-Hsien, and 郭敏賢. "A Study of Applying Decision Trees Techniques to Undergraduate Major Selection." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/56117109244459743652.

Abstract:

Master's thesis
Chinese Culture University
Department of Information Management
ROC academic year 100
In recent years, a number of institutions in Taiwan have implemented policies of "choosing a major at the upper level of college" or "degree programs", and it has become a trend for universities to enroll students without assigning them to a department. However, because students lack motivation for career exploration and have a poor understanding of declaring a major in the sophomore or junior year, the major and department they finally choose often do not match their needs. The purpose of this study is therefore to start from students' learning motivation, analyze the gap between that motivation and their personality, and then suggest an appropriate major based on their motivation, which students can follow when choosing a major through self-exploration. In this way, the chance of choosing the wrong major or occupation can be greatly reduced, and students can learn with a clear direction for career development. The research subjects are students in the advanced study classes at Chinese Culture University. The analysis is expected to reduce students' uncertainty in selecting a major. With the help of the data mining model, students' learning motivation can be increased, and by reviewing its suggestions students can find the best learning direction and better understand their own personality. In summary, the results and implementation of this research are intended to help students make precise choices about their future and explore themselves, while giving academic staff and teachers a solid basis for guiding students' career planning.

41

Yu, Hao, and 余豪. "Fault diagnosis of an automotive starter motor using a decision tree and neural network techniques." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/96496501892465953541.

Abstract:

Master's thesis
National Changhua University of Education
Graduate Institute of Vehicle Technology
ROC academic year 99
This study proposes an automotive starter motor fault diagnosis system using component analysis and fault condition classification based on a decision tree and a neural network. Traditionally, fault diagnosis depends on the technician's experience, but some faults may be judged inaccurately because technicians make subjective decisions. The purpose of a vehicle's starting system is to rotate the crankshaft smoothly to start the engine. In the present study, a starter motor fault diagnosis system is proposed and developed to classify different fault conditions. The proposed system uses principal component analysis (PCA) and independent component analysis (ICA) for feature extraction to reduce the complexity of the feature vectors, together with classification using decision tree and neural network techniques. For output signal classification, three classifiers, classification and regression trees (CART), the C4.5 decision tree, and radial basis function networks (RBFN), are used to classify and compare synthetic fault types on an experimental automotive starter motor platform. The experimental results indicate that the proposed fault diagnosis system is effective and can be used for automotive starter motors under various fault operating conditions.

42

"Techniques in data mining: decision trees classification and constraint-based itemsets mining." 2001. http://library.cuhk.edu.hk/record=b5890757.

Abstract:

Cheung, Yin-ling.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 117-124).
Abstracts in English and Chinese.
Abstract
Acknowledgement
Chapter 1: Introduction
1.1 Data Mining Techniques
1.1.1 Classification
1.1.2 Association Rules Mining
1.1.3 Estimation
1.1.4 Prediction
1.1.5 Clustering
1.1.6 Description
1.2 Problem Definition
1.3 Thesis Organization
Part I: Decision Tree Classifiers
Chapter 2: Background
2.1 Introduction to Classification
2.2 Classification Using Decision Trees
2.2.1 Constructing a Decision Tree
2.2.2 Related Work
Chapter 3: Strategies to Enhance the Performance in Building Decision Trees
3.1 Introduction
3.1.1 Related Work
3.1.2 Post-evaluation vs. Pre-evaluation of Splitting Points
3.2 Schemes to Construct Decision Trees
3.2.1 One-to-many Hashing
3.2.2 Many-to-one and Horizontal Hashing
3.2.3 A Scheme using Paired Attribute Lists
3.2.4 A Scheme using Database Replication
3.3 Performance Analysis
3.4 Experimental Results
3.4.1 Performance
3.4.2 Test 1: Smaller Decision Tree
3.4.3 Test 2: Bigger Decision Tree
3.5 Conclusion
Part II: Mining Association Rules
Chapter 4: Background
4.1 Definition
4.2 Association Algorithms
4.2.1 Apriori-gen
4.2.2 Partition
4.2.3 DIC
4.2.4 FP-tree
4.2.5 Vertical Data Mining
4.3 Taxonomies of Association Rules
4.3.1 Multi-level Association Rules
4.3.2 Multi-dimensional Association Rules
4.3.3 Quantitative Association Rules
4.3.4 Random Sampling
4.3.5 Constraint-based Association Rules
Chapter 5: Mining Association Rules without Support Thresholds
5.1 Introduction
5.1.1 Itemset-Loop
5.2 New Approaches
5.2.1 A Build-Once and Mine-Once Approach, BOMO
5.2.2 A Loop-back Approach, LOOPBACK
5.2.3 A Build-Once and Loop-Back Approach, BOLB
5.2.4 Discussion
5.3 Generalization: Varying Thresholds Nk for k-itemsets
5.4 Performance Evaluation
5.4.1 Generalization: Varying Nk for k-itemsets
5.4.2 Non-optimal Thresholds
5.4.3 Different Decrease Factors, f
5.5 Conclusion
Chapter 6: Mining Interesting Itemsets with Item Constraints
6.1 Introduction
6.2 Proposed Algorithms
6.2.1 Single FP-tree Approach
6.2.2 Double FP-trees Approaches
6.3 Maximum Support Thresholds
6.4 Performance Evaluation
6.5 Conclusion
Chapter 7: Conclusion
Appendix A: Probabilistic Analysis of Hashing Schemes
Appendix B: Hash Functions
Bibliography

43

MAO, HUI-WEN, and 毛慧雯. "Apply Data Mining Techniques to a Telecom for VIP and Churn Customers Prediction using Decision Tree." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/55133181218047318075.

Abstract:

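Master's thesis
Fu Jen Catholic University
Department of Computer Science and Information Engineering
ROC academic year 96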
In 2006, as competition in the telecom industry grew fiercer, we lost a great deal of revenue. Analyzing the situation from various viewpoints, we reached two conclusions: first, we were losing more and more existing customers; second, the contribution of each customer was decreasing. Both problems stem from customers' lack of loyalty toward the telecom industry; a quarter of customers moved to other telecom providers or terminated their contracts. CRM (Customer Relationship Management) is very important because customers understand their own needs and know how to compare products before making a decision. CRM increases the interaction between customers and our services: the better we understand customers' requirements, the more suitable the products we design for specific customers, and the more effectively we can promote those products to the broader market. This thesis proposes a customer prediction mechanism that integrates two core concepts: prediction and customer relationships. There are many relationships between customers and providers. For VIP customers, we need to enhance interaction; for implicit-loss customers, we need to work to increase their confidence in our products. The research achieves the following purposes: (1) the mechanism can predict whether a customer is a VIP or an implicit-loss customer; and (2) it can determine whether a customer is of excellent or poor quality. Customer attributes help us analyze customer behavior. The research uses the C4.5 decision tree to classify customer rank by analyzing customer attributes and deriving rules that identify the customers we want to target.
Keywords: Decision Tree, Prediction, VIP, Churn, Data Mining

44

Wang, Po-chun, and 王泊鈞. "Combining Image Processing Techniques with Decision Tree Theory to Study the Vocal Fold Diseases Identification System." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/31203969053163705830.

Abstract:

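Master's thesis
National Taiwan University of Science and Technology
Graduate Institute of Automation and Control
ROC academic year 100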
The larynx is the main airway and vocal mechanism. Clinically, otolaryngologists use strobo-laryngoscopes to observe vocal fold movement and diagnose vocal fold disorders. Because the current diagnostic method requires selecting images manually on the computer screen, this study designs an automatic vocal fold disease identification system. Using videos recorded by doctors as samples for experimental analysis, the study applies image processing techniques to capture images of the vocal folds at maximum opening and minimum closing, replacing the manual image selection process and improving diagnostic efficiency. Because human factors in the filming process may produce blurred images or images that do not show the vocal folds, the study uses texture analysis to measure image smoothness and entropy, yielding a selection and elimination system that effectively improves the accuracy of the captured images. Furthermore, for images at maximum opening, image processing is used to automatically analyze the glottis and the vibration position of the vocal folds, obtaining physiological parameters and plotting a mucosal fluctuation diagram as a reference for vocal fold health promotion. The identification system can obtain physiological parameters for normal vocal folds, vocal fold paralysis, and vocal nodules, and a decision tree is used as the classifier to categorize the vocal fold diseases. The identification accuracy was 92.6%, and it improved to 98.7% when combined with image processing. Finally, the study measures texture features in lesion areas and compiles a statistical table comparing vocal cancer and vocal polyps. The system can serve as a reference for clinical use.

45

Liao, Po-Sen, and 廖柏森. "Applying DEA and data reduction techniques to building decision trees for mutual fund selection." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/89116243577463232886.

Abstract:

Master's thesis
Kun Shan University
Graduate Institute of Information Management
ROC academic year 99
Mutual funds are popular in Taiwan's mature financial market. Many investors use them as a financial instrument because they are uncomplicated to manage and offer liquidity. However, because of the risks involved, selecting a good mutual fund is always an important issue; since financial market volatility is high, investors must manage risk through their investment portfolios to improve stability. This study uses a three-stage approach that combines data reduction with data envelopment analysis (DEA) to build decision tree models that help investors gain profits. The first stage uses data preprocessing for sample classification. In the second stage, DEA evaluates mutual fund performance with a single criterion, namely the efficiency of a decision-making unit. In the third stage, data reduction is performed using three methods: PCA, the wrapper method, and common factor analysis. All reduced data sets are used to build decision tree models, which are compared with models built from the original raw data. The study finds that in a bull market, the size of the data set affects the accuracy of the decision tree model, and the model built with the original raw data set has the highest accuracy. In a bear market, the wrapper selects fewer attributes in the five-equal-interval discretization case, and the accuracy of the tree model derived from the wrapper is higher than that of the other two methods and even outperforms the TAL method. This indicates that a highly accurate decision tree can be built from a data set reduced through the proposed data reduction techniques.

46

LIAO, HUI-CHEN, and 廖慧臻. "Applications of Decision Tree Techniques to Predict College Students’ Career Directions: A Study in the Department of Information Management." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/58t57r.

Abstract:

Master's thesis
Da-Yeh University
Master's Program, Department of Information Management
ROC academic year 105
As information technology drives society, industry sectors demand information management professionals; however, many students struggle to choose a career direction. Past studies have identified factors affecting college students' career choices, such as academic achievement, student club experiences, internships, and gender, but these factors have not yet been combined in a holistic model. The purpose of this study was to explore the factors affecting the career choices of information management students in colleges. Based on data collected from MIS department students at a university in central Taiwan, three decision tree algorithms, CART, CHAID, and C5.0, were employed to test a predictive model of students' career choice. Results showed that C5.0 performed better than the other decision trees. According to the C5.0 classification, students' gender, study interests, academic achievement, personality, vocational interests, parental education, parental occupation, family socio-economic status, perceived parental occupational expectations, student club experiences, part-time work, and internship experiences were predictors of information management careers. Among them, the most significant predictor was student club experience, followed by vocational interests. MIS students who had joined academic or autonomy student clubs, or who had social, enterprising, or conventional vocational interest types, preferred enterprise information management posts. MIS students who had joined recreational or autonomy student clubs, or who had investigative, enterprising, or conventional vocational interests, preferred network management posts. MIS students who had joined service or autonomy student clubs preferred information support and services posts. MIS students who had joined academic, recreational, or sports clubs, or who had a conventional vocational interest type, preferred digital content and communication posts.
MIS students who had joined recreational or fellowship student clubs, or who had realistic or investigative vocational interests, preferred software development and programming posts. MIS students who had joined recreational or fellowship student clubs, or who had social or enterprising vocational interests, preferred e-commerce and marketing posts. It is suggested that MIS students participate more in student club activities and that career consultants consider students' club experiences when helping them choose career directions.

47

Lu, Wen-Chi, and 呂文吉. "Applying Decision Tree Of Data Mining Techniques for Device Repair and Maintenance in a Hospital - Northern in a regional Hospital as an Example." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/ydx629.

Abstract:

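Master's
Yuanpei University of Medical Technology (元培醫事科技大學)
Department of Information Management, Master's Program in Digital Innovation Management
Academic year 104 (ROC calendar)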
Integrated medical services that improve professional and health-care quality are becoming a global trend. Taiwan's medical service quality currently ranks among the highest in the world. As medical hardware systems continue to be upgraded, hospital IT systems are becoming increasingly important: whether connecting integrated information systems or direct/indirect hardware instruments, we all rely on the IT department to plan and set up the related services. This study collected real repair data from a regional hospital and analyzed the IT department's repair records with data mining decision tree techniques, using a total of 942 records of IT system faults from January to December 2015. The decision analysis was based on repair categories, and C4.5 decision tree splitting was used to identify frequently failing hardware items and then provide repair recommendations for the future. We hope to decrease the failure rate and improve medical quality by reducing repair frequency and repair time. Our final goals are: A. Analyze the failure history of instruments and find the corresponding solutions. B. Use the decision tree to identify instruments with high failure rates and set up a solution model. C. Create failure-resolution SOPs (standard operating procedures) and upload them to the KM (knowledge management) platform to help users solve problems easily. Different repair strategies are created for different hardware items in order to systematize the repair procedures as SOPs.
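
The abstract names C4.5 as the splitting method. C4.5 ranks candidate attributes by gain ratio: information gain divided by split information, which penalizes attributes with many distinct values. The following is a minimal Python sketch of that criterion only, assuming categorical attributes and no missing values; it omits C4.5's handling of continuous attributes, missing values, and pruning. The function names and the example attribute values are hypothetical, not taken from the thesis or its 942 records.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of splitting `labels` on a categorical feature.

    gain ratio = information gain / split information, where split
    information is the entropy of the partition sizes themselves.
    """
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)

    # Weighted entropy of the child nodes after the split.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder

    # Split information grows with the number of distinct feature values,
    # so dividing by it penalizes many-valued attributes.
    split_info = -sum(
        (len(part) / total) * math.log2(len(part) / total)
        for part in partitions.values()
    )
    # A feature with a single value cannot split the data at all.
    return info_gain / split_info if split_info > 0 else 0.0
```

For example, splitting the hypothetical repair categories `["hardware", "hardware", "software", "software", "hardware", "software"]` on device types `["printer", "printer", "pc", "pc", "scanner", "scanner"]` yields an information gain of 2/3 bit, divided by log2(3) bits of split information, for a gain ratio of about 0.42.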

48

Juozenaite, Ineta. "Application of machine learning techniques for solving real world business problems : the case study - target marketing of insurance policies." Master's thesis, 2018. http://hdl.handle.net/10362/32410.

Abstract:

Project Work presented as a partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
The concept of machine learning has been around for decades, but it is now becoming increasingly popular, not only in business but everywhere else as well. This is because of the increased amount of data, cheaper data storage, and more powerful and affordable computational processing. The complexity of the business environment leads companies to use data-driven decision making to work more efficiently. The most common machine learning methods, such as Logistic Regression, Decision Tree, Artificial Neural Network and Support Vector Machine, are reviewed in this work along with their applications. The insurance industry has one of the most competitive business environments, and as a result, the use of machine learning techniques is growing in this industry. In this work, the above-mentioned machine learning methods are used to build a predictive model for a target marketing campaign of caravan insurance policies to achieve greater profitability. Information Gain and Chi-squared metrics, stepwise regression, the R package “Boruta”, Spearman correlation analysis, distribution graphs by target variable, and basic statistics of all variables are used for feature selection. The final predictive model chosen to solve this real-world business problem is a Multilayer Perceptron trained with the backpropagation learning algorithm, with 1 hidden layer and 12 hidden neurons.
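
One of the feature selection metrics listed above is the chi-squared statistic. A common way to apply it to a categorical feature and a categorical target is Pearson's test of independence on their contingency table: cells whose observed counts differ most from the counts expected under independence contribute most to the score. The sketch below illustrates that assumption in plain Python; it is not the procedure used in the thesis, and the helper names `chi_squared` and `rank_features` are hypothetical.

```python
from collections import Counter


def chi_squared(feature_values, target):
    """Pearson chi-squared statistic for independence between a categorical
    feature and a categorical target; larger values suggest a stronger
    association between the two."""
    n = len(target)
    observed = Counter(zip(feature_values, target))  # contingency-table cells
    feature_totals = Counter(feature_values)         # row totals
    target_totals = Counter(target)                  # column totals

    statistic = 0.0
    for f, f_count in feature_totals.items():
        for t, t_count in target_totals.items():
            # Count expected in this cell if feature and target were independent.
            expected = f_count * t_count / n
            # Counter returns 0 for combinations that never occur.
            statistic += (observed[(f, t)] - expected) ** 2 / expected
    return statistic


def rank_features(columns, target):
    """Return feature names sorted from highest to lowest chi-squared score.

    `columns` maps each feature name to its list of values, aligned with `target`.
    """
    scores = {name: chi_squared(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For instance, on a balanced two-class target of four rows, a feature that separates the classes perfectly scores 4.0, while a feature independent of the target scores 0.0. Because the statistic's expected value under independence grows with the number of categories, converting it to a p-value from a chi-squared distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of feature and target categories, is a fairer way to compare features with different numbers of levels.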
