
Dissertations / Theses on the topic 'Classification tree models'



Consult the top 50 dissertations / theses for your research on the topic 'Classification tree models.'


You can also download the full text of each academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses across a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Dan. "Tree-based Models for Longitudinal Data." Bowling Green State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1399972118.

Full text
2

Keller-Schmidt, Stephanie. "Stochastic Tree Models for Macroevolution." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-96504.

Full text
Abstract:
Phylogenetic trees capture the relationships between species and can be investigated by morphological and/or molecular data. When focusing on macroevolution, one considers the large-scale history of life, with evolutionary changes affecting a single species or an entire clade and leading to the enormous diversity of species observed today. One major problem of biology is the explanation of this biodiversity. One may therefore ask which kinds of macroevolutionary processes have given rise to observable tree shapes or patterns of species distribution, where tree shape refers to the arrangement of branching orders and time periods. Thus, with an increasing number of known species in the context of phylogenetic studies, testing hypotheses about evolution by analyzing the shape of the resulting phylogenetic trees became a matter of particular interest, and the use of reconstructed phylogenies for studying evolutionary processes has grown over recent decades. Many paleontologists (Raup et al., 1973; Gould et al., 1977; Gilinsky and Good, 1989; Nee, 2004) have tried to describe such patterns of macroevolution using models for growing trees, which define stochastic processes that generate phylogenetic trees. Yule (1925) was the first to introduce such a model, the Equal Rate Markov (ERM) model, in the context of biological branching, based on a continuous-time, uneven branching process. In recent decades, further dynamical models were proposed (Yule, 1925; Aldous, 1996; Nee, 2006; Rosen, 1978; Ford, 2005; Hernández-García et al., 2010) to investigate tree shapes and hence capture the rules of macroevolutionary forces. A common model is the Aldous' Branching (AB) model, which is known for generating trees with a structure similar to that of "real" trees. To infer those macroevolutionary forces, estimated trees are analyzed and compared to simulated trees generated by the models. Recent models have drawbacks, such as missing biological motivation, or generated tree shapes that do not fit those observed in empirical trees. The central aim of this thesis is the development and study of new, biologically motivated approaches which might help to better understand, or even discover, the biological forces behind the huge diversity of organisms. The first approach, called the age model, is a stochastic procedure which describes the growth of binary trees by an iterative stochastic attachment of leaves, similar to the ERM model. In contrast to the latter, the branching rate at each clade is no longer constant but decreases in time, i.e., with age. Thus, species involved in recent speciation events have a tendency to speciate again. The second model is a branching process which mimics the evolution of species driven by innovations. The process involves a separation of time scales: rare innovation events trigger rapid cascades of diversification in which a feature combines with previously existing features. This model is called the innovation model. Three data sets of estimated phylogenetic trees are used to analyze and compare the tree shapes produced by the new growth models, using tree shape statistics based on a variety of imbalance measures. The results show that simulated trees of both growth models fit the tree shapes observed in real trees well. In a further study, a likelihood analysis is performed in order to rank models with respect to their ability to explain observed tree shapes.
The results show that the likelihoods of the age model and the AB model are clearly correlated across the trees in the databases when considering small and medium-sized trees with up to 19 leaves. For a data set representing phylogenetic trees of protein families, the age model outperforms the AB model, but for another data set, representing phylogenetic trees of species, the AB model performs slightly better. To support this observation a further analysis using larger trees is necessary, but an exact computation of likelihoods for large trees implies a huge computational effort; therefore, an efficient method for likelihood estimation is proposed and compared to estimation using a naive sampling strategy. Both models describe the tree generation process in a way that is easy to interpret biologically. Another interesting field of research in biology is coevolution between species: the interaction of species across groups such that the evolution of a species from one group can be triggered by a species from another group. The most prominent examples are systems of host species and their associated parasites. One problem is the reconciliation of the common history of both groups of species and the prediction of associations between ancestral hosts and their parasites. Some algorithmic methods have been developed in recent years to solve this problem, but only a few host-parasite systems have been analyzed in sufficient detail, which makes the evaluation of these methods difficult. Within the scope of coevolution, the proposed age model is applied to the generation of cophylogenies in order to evaluate such host-parasite reconciliation methods. Both the age model and the innovation model produce tree shapes similar to the structures of estimated trees; both describe evolutionary dynamics and may provide a further opportunity to infer the macroevolutionary processes behind the biodiversity observed today. Furthermore, the application of the age model in the context of coevolution, generating a useful benchmark set of cophylogenies, is a first step towards systematic studies on evaluating reconciliation methods.
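The age model's growth rule lends itself to a compact simulation. The sketch below grows a binary tree by repeatedly splitting a leaf chosen with probability decreasing in its age; the specific 1/(1 + age) weight and the discrete time step are illustrative assumptions, not the thesis's exact formulation.

```python
# Illustrative age-model-style tree growth: leaves that speciated recently
# are more likely to speciate again. The 1/(1 + age) weight is an assumed
# stand-in for the thesis's age-dependent branching rate.
import random

def grow_age_model_tree(n_leaves, seed=42):
    rng = random.Random(seed)
    leaves = [{"id": 0, "age": 0}]        # start from a single root lineage
    children = {}                         # parent id -> (left id, right id)
    next_id = 1
    while len(leaves) < n_leaves:
        weights = [1.0 / (1.0 + leaf["age"]) for leaf in leaves]
        parent = rng.choices(leaves, weights=weights, k=1)[0]
        leaves.remove(parent)
        for leaf in leaves:               # surviving lineages grow older
            leaf["age"] += 1
        children[parent["id"]] = (next_id, next_id + 1)
        leaves.append({"id": next_id, "age": 0})      # newborn leaves, age 0
        leaves.append({"id": next_id + 1, "age": 0})
        next_id += 2
    return children

tree = grow_age_model_tree(20)            # topology for imbalance statistics
```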
3

Shafi, Ghufran. "Development of roadway link screening criteria for microscale carbon monoxide and particulate matter conformity analyses through application of classification tree model." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/28222.

Full text
Abstract:
Thesis (M. S.)--Civil and Environmental Engineering, Georgia Institute of Technology, 2008.
Committee Chair: Guensler, Randall; Committee Member: Rodgers, Michael; Committee Member: Russell, Armistead.
4

Victors, Mason Lemoyne. "A Classification Tool for Predictive Data Analysis in Healthcare." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/5639.

Full text
Abstract:
Hidden Markov Models (HMMs) have seen widespread use in a variety of applications ranging from speech recognition to gene prediction. While developed over forty years ago, they remain a standard tool for sequential data analysis. More recently, Latent Dirichlet Allocation (LDA) was developed and soon gained widespread popularity as a powerful topic analysis tool for text corpora. We thoroughly develop LDA and a generalization of HMMs and demonstrate the conjunctive use of both methods in predictive data analysis for health care problems. While these two tools (LDA and HMM) have been used in conjunction previously, we use LDA in a new way to reduce the dimensionality involved in the training of HMMs. With both LDA and our extension of HMM, we train classifiers to predict development of Chronic Kidney Disease (CKD) in the near future.
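A minimal sketch of this LDA-then-HMM pipeline is shown below, assuming visit-level count data (e.g., diagnosis code counts per visit); scikit-learn's LDA and a plain Gaussian HMM from the hmmlearn package stand in for the thesis's HMM generalization, and all sizes and names are hypothetical.

```python
# Sketch: LDA reduces high-dimensional visit counts to topic mixtures,
# which then serve as the observation sequences for an HMM.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Hypothetical data: 30 patients x 12 visits, 500-dim sparse code counts.
visits = rng.poisson(0.2, size=(30 * 12, 500))

# Step 1: LDA reduces each visit to a low-dimensional topic mixture.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
topic_mix = lda.fit_transform(visits)            # shape (360, 8)

# Step 2: train an HMM on the per-patient topic-mixture sequences.
lengths = [12] * 30                              # visits per patient
hmm = GaussianHMM(n_components=4, random_state=0)
hmm.fit(topic_mix, lengths)

# Per-patient log-likelihoods could then feed a CKD-vs-not classifier.
score = hmm.score(topic_mix[:12])
```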
5

Shew, Cameron Hunter. "TRANSFERABILITY AND ROBUSTNESS OF PREDICTIVE MODELS TO PROACTIVELY ASSESS REAL-TIME FREEWAY CRASH RISK." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/863.

Full text
Abstract:
This thesis describes the development and evaluation of real-time crash risk assessment models for four freeway corridors: US-101 NB (northbound) and SB (southbound), as well as I-880 NB and SB. Crash data for these freeway segments for the 16-month period from January 2010 through April 2011 are used to link historical crash occurrences with real-time traffic patterns observed through loop detector data. The analysis techniques adopted for this study are logistic regression and classification trees, one of the most common data mining tools. The crash risk assessment models are developed based on a binary classification approach (crash and non-crash outcomes), with traffic parameters measured at surrounding vehicle detection station (VDS) locations as the independent variables. The classification performance assessment methodology accounts for the rarity of crashes compared to non-crash cases in the sample, instead of the more common pre-specified threshold-based classification. Prior to development of the models, data-related issues such as data cleaning and aggregation were addressed. Based on the modeling efforts, it was found that turbulence in terms of speed variation is significantly associated with crash risk on the US-101 NB corridor. The models estimated with data from US-101 NB were evaluated based on their classification performance, not only on US-101 NB but also on the other three freeways, to assess transferability. It was found that a predictive model derived from one freeway can be readily applied to other freeways, although the classification performance decreases. The models that transfer best to other roadways were found to be those that use the fewest VDSs, that is, one upstream and one downstream station rather than two or three. The classification accuracy of the models is discussed in terms of how they can be used for real-time crash risk assessment, which may be helpful to authorities for freeway segments with newly installed traffic surveillance apparatus, since real-time crash risk assessment models from nearby freeways with existing infrastructure would be able to provide a reasonable estimate of crash risk. These models can also be applied in developing and testing variable speed limits (VSLs) and ramp metering strategies that proactively attempt to reduce crash risk. The robustness of the model output is assessed by location, time of day and day of week. The analysis shows that at some locations the models may require further learning due to higher than expected false positive rates (e.g., the I-680/I-280 interchange on US-101 NB) or false negative rates. The approach for post-processing the results from the model provides ideas for refining the model prior to or during implementation.
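The binary crash/non-crash setup with rare positives can be sketched as follows; the loop-detector features and synthetic labels are assumptions for illustration, with class weighting and a rank-based AUC standing in for the thesis's rarity-aware performance assessment.

```python
# Sketch: rare-event binary classification with class weighting and AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5000
# Hypothetical loop-detector features: speed variation upstream/downstream, flow.
X = rng.normal(size=(n, 3))
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 3.0).astype(int)  # rare "crash" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)
clf = LogisticRegression(class_weight="balanced").fit(X_tr, y_tr)

# Rank-based evaluation (AUC) is robust to the crash/non-crash imbalance.
print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```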
6

Motloung, Rethabile Frangenie. "Understanding current and potential distribution of Australian acacia species in southern Africa." Diss., University of Pretoria, 2014. http://hdl.handle.net/2263/79720.

Full text
Abstract:
This dissertation presents research on the value of using different sources of data to explore the factors determining the invasiveness of introduced species. The research draws upon the availability of data on historical trial plantings of alien species and other sources, focusing on Australian Acacia species as a taxon introduced into southern Africa (Lesotho, South Africa and Swaziland). The first component of the study focused on understanding the factors determining the introduction outcome of species in historical trial plantings and the invasion success of Australian Acacia species, using Species Distribution Models (SDMs) and classification tree techniques. SDMs were calibrated using native range occurrence records (Australia) and were validated using the results of 150 years of South African government forestry trial planting records and invaded range data from the Southern African Plant Invaders Atlas. To understand factors associated with survival ('trial success') or failure to survive ('trial failure') in historical trial plantings, classification and regression tree analysis was used. The results indicate climate as one of the factors that explains the introduction and/or invasion success of Australian Acacia species in southern Africa. However, the results also indicate that for 'trial failures' there are factors other than climate that could have influenced the trial outcome. This study emphasizes the need to integrate data on whether a species has been recorded as invasive elsewhere with climate matching for invasion risk assessment. The second component of the study focused on understanding the distribution patterns of Australian Acacia species that are not known to be invasive in southern Africa. The specific aims were to determine which species still exist at previously recorded sites and to determine their current invasion status. This was done by collating data from different sources that list species introduced into southern Africa and then conducting revisits, meaning field surveys based on recorded occurrences of introduced species. The known occurrence data for species on the list were obtained from different data sources and various invasion biology experts. As it was not practical to conduct revisits for all species on the list, three ornamental species (Acacia floribunda, A. pendula and A. retinodes) were selected as a pilot for the revisits conducted in this study. Acacia retinodes trees were not found during the revisits. The results provided data that could be used to characterize species based on the Blackburn et al. (2011) scheme. However, it is not clear whether the observed Acacia pendula or A. floribunda trees will spread away from the sites, hence the need to continuously monitor the sites for spread. The methods used in this research establish a protocol for future work on conducting revisits at known localities of introduced species to determine their population dynamics and thereby characterize the species according to the scheme for management purposes.
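As a rough illustration of the classification-tree component, the sketch below fits a tree separating 'trial success' from 'trial failure' on two hypothetical climate predictors; the data and the rainfall rule are invented for the example.

```python
# Sketch: a classification tree for trial outcome on climate predictors.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
# Columns: mean annual rainfall (mm), mean annual temperature (C)
X = np.column_stack([rng.uniform(200, 1200, 300), rng.uniform(10, 24, 300)])
y = (X[:, 0] > 550).astype(int)        # toy rule: enough rain -> trial success

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.feature_importances_)       # which climate factor drives the splits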
Dissertation (MSc)--University of Pretoria, 2014.
National Research Foundation (NRF)
Zoology and Entomology
MSc (Zoology)
7

Mugodo, James. "Plant species rarity and data restriction influence the prediction success of species distribution models." University of Canberra. Resource, Environmental & Heritage Sciences, 2002. http://erl.canberra.edu.au./public/adt-AUC20050530.112801.

Full text
Abstract:
There is a growing need for accurate distribution data for both common and rare plant species for conservation planning and ecological research purposes. A database of more than 500 observations for nine tree species with different ecological and geographical distributions and a range of frequencies of occurrence in south-eastern New South Wales (Australia) was used to compare the predictive performance of logistic regression models, generalised additive models (GAMs) and classification tree models (CTMs) using different data restriction regimes and several model-building strategies. Environmental variables (mean annual rainfall, mean summer rainfall, mean winter rainfall, mean annual temperature, mean maximum summer temperature, mean minimum winter temperature, mean daily radiation, mean daily summer radiation, mean daily June radiation, lithology and topography) were used to model the distribution of each of the plant species in the study area. Model predictive performance was measured as the area under the curve of a receiver operating characteristic (ROC) plot. The initial predictive performance of logistic regression models and GAMs using unrestricted, temperature restricted, major gradient restricted and climatic domain restricted data gave results that were contrary to current practice in species distribution modelling. Although climatic domain restriction has been used in other studies, it was found to produce models that had the lowest predictive performance. The performance of domain restricted models was significantly (p = 0.007) inferior to the performance of major gradient restricted models when the predictions of the models were confined to the climatic domain of the species. Furthermore, the effect of data restriction on model predictive performance was found to depend on the species, as shown by a significant interaction between species and data restriction treatment (p = 0.013). As found in other studies, however, the predictive performance of GAM was significantly (p = 0.003) better than that of logistic regression. The superiority of GAM over logistic regression was unaffected by different data restriction regimes and was not significantly different within species. The logistic regression models used in the initial performance comparisons were based on models developed using the forward selection procedure in a rigorous-fitting model-building framework that was designed to produce parsimonious models. The rigorous-fitting model-building framework involved testing for a significant reduction in model deviance (p = 0.05) and the significance of the parameter estimates (p = 0.05). The size of the parameter estimates and their standard errors were inspected because large estimates and/or standard errors are an indication of model degradation from overfitting or effects such as multicollinearity. For additional variables to be included in a model, they had to contribute significantly (p = 0.025) to the model predictive performance. In an attempt to improve the performance of species distribution models using logistic regression in a rigorous-fitting model-building framework, the backward elimination procedure was employed for model selection, but it yielded models with reduced performance.
A liberal-fitting model-building framework that used significant model deviance reduction at the p = 0.05 (low significance models) and p = 0.00001 (high significance models) levels as the major criterion for variable selection was employed for the development of logistic regression models using the forward selection and backward elimination procedures. Liberal fitting yielded models that had significantly greater predictive performance than the rigorous-fitting logistic regression models (p = 0.0006). The predictive performance of the former models was comparable to that of GAMs and CTMs. The low significance liberal-fitting models had a much larger number of variables than the high significance liberal-fitting models, but with no significant increase in predictive performance. To develop liberal-fitting CTMs, the tree shrinking program in S-PLUS was used to produce a number of trees of different sizes (subtrees) by optimally reducing the size of a full CTM for a given species. The 10-fold cross-validated model deviance for the subtrees was plotted against the size of the subtree as a means of selecting an appropriate tree size. In contrast to liberal-fitting logistic regression, liberal-fitting CTMs had poor predictive performance. Species geographical range and species prevalence within the study area were used to categorise the tree species into different distributional forms, which were then used to compare the effect of plant species rarity on the predictive performance of logistic regression models, GAMs and CTMs. The distributional forms included restricted and rare (RR) species (Eucalyptus paliformis and Eucalyptus kybeanensis), restricted and common (RC) species (Eucalyptus delegatensis, Eucryphia moorei and Eucalyptus fraxinoides), widespread and rare (WR) species (Eucalyptus data) and widespread and common (WC) species (Eucalyptus sieberi, Eucalyptus pauciflora and Eucalyptus fastigata). There were significant differences (p = 0.076) in predictive performance among the distributional forms for logistic regression and GAM. The predictive performance for the WR distributional form was significantly lower than the performance for the other distributional forms, and the performance for the RC and RR forms was significantly greater than that for the WC form. The trend in model predictive performance among distributional forms was similar for CTMs, except that the CTMs had poor predictive performance for the RR form. This study shows the importance of data restriction to model predictive performance, with major gradient data restriction recommended for consistently high performance. Given the appropriate model selection strategy, logistic regression, GAM and CTM have similar predictive performance. Logistic regression requires a high significance liberal-fitting strategy both to maximise its predictive performance and to select a relatively small model that could be useful for framing future ecological hypotheses about the distribution of individual plant species. The results for the modelling of plant species for conservation purposes were encouraging, since logistic regression and GAM performed well for the restricted and rare species, which are usually of greater conservation concern.
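Comparing model families by area under the ROC curve, as done here, can be sketched with cross-validation; scikit-learn has no GAM, so the snippet below compares only logistic regression and a classification tree on synthetic presence/absence data.

```python
# Sketch: ranking model families by cross-validated ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
# Hypothetical presence/absence records with climatic predictors.
X = rng.normal(size=(500, 4))          # rainfall, temperature, radiation, ...
y = (X[:, 0] - X[:, 1] ** 2 + rng.normal(size=500) > 0).astype(int)

for name, model in [("logistic", LogisticRegression()),
                    ("CTM", DecisionTreeClassifier(max_depth=4, random_state=0))]:
    auc = cross_val_score(model, X, y, scoring="roc_auc", cv=10).mean()
    print(name, round(auc, 3))
```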
8

Lazaridès, Ariane. "Classification trees for acoustic models : variations on a theme." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0016/MQ37139.pdf.

Full text
9

Löwe, Rakel, and Ida Schneider. "Automatic Differential Diagnosis Model of Patients with Parkinsonian Syndrome : A model using multiple linear regression and classification tree learning." Thesis, Uppsala universitet, Tillämpad kärnfysik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413638.

Full text
Abstract:
Parkinsonian syndrome is an umbrella term covering several diseases with similar symptoms. PET images are key when differentially diagnosing patients with parkinsonian syndrome. In this work, two automatic diagnosis models are developed and evaluated, with PET images as input and a diagnosis as output. The two developed models are evaluated based on performance in terms of sensitivity, specificity and misclassification error. Each model consists of 1) a regression model and 2) either a decision tree or a random forest. Two coefficients, alpha and beta, are introduced to train and test the models. The coefficients are the output of the regression model: they are calculated with multiple linear regression, with the patient images as dependent variables and mean images of four patient groups as explanatory variables, and capture the underlying relationship between the two. The four patient groups consisted of 18 healthy controls, 21 patients with Parkinson's disease, 17 patients with dementia with Lewy bodies and 15 patients with vascular parkinsonism. The models predict the patients with misclassification errors of 27% for the decision tree and 34% for the random forest. The patient group easiest to classify according to both models is healthy controls; the hardest is vascular parkinsonism. These results imply that alpha and beta are interesting outcomes of PET scans and could, after further development of the model, be used as a guide when diagnosing.
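A compact sketch of the two-stage model follows, assuming flattened PET images as vectors: each patient image is regressed on the four group mean images, and the fitted coefficients become features for a decision tree. Using ordinary least squares and one coefficient per group mean is a simplification of the thesis's alpha/beta construction.

```python
# Sketch: regression coefficients from group mean images feed a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
n_vox = 1000
group_means = rng.normal(size=(n_vox, 4))     # HC, PD, DLB, VP mean images

def coefficients(patient_img):
    beta, *_ = np.linalg.lstsq(group_means, patient_img, rcond=None)
    return beta                               # one coefficient per group mean

# Hypothetical patients: noisy copies of their group's mean image.
labels = rng.integers(0, 4, size=71)
imgs = group_means[:, labels].T + rng.normal(scale=0.5, size=(71, n_vox))
features = np.array([coefficients(img) for img in imgs])

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(features, labels)
```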
10

Purcell, Terence S. "The use of classification trees to characterize the attrition process for Army manpower models." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1997. http://handle.dtic.mil/100.2/ADA336747.

Full text
11

Silva, Jesús, Palma Hugo Hernández, Núñez William Niebles, Alex Ruiz-Lazaro, and Noel Varela. "Natural Language Explanation Model for Decision Trees." Institute of Physics Publishing, 2020. http://hdl.handle.net/10757/652131.

Full text
Abstract:
This study describes a model of explanations in natural language for classification decision trees. The explanations include global aspects of the classifier and local aspects of the classification of a particular instance. The proposal is implemented in the ExpliClas open source Web service [1], which in its current version operates on trees built with Weka and data sets with numerical attributes. The feasibility of the proposal is illustrated with two example cases, where the detailed explanation of the respective classification trees is shown.
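scikit-learn offers a minimal analogue of such explanations, which may help make the idea concrete (this is not the ExpliClas implementation): export_text renders the global tree structure, and decision_path traces the local classification of one instance.

```python
# Sketch: global and local textual views of a classification tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Global aspect: a textual rendering of the whole classifier.
print(export_text(tree, feature_names=list(iris.feature_names)))

# Local aspect: which nodes one instance traverses on its way to a leaf.
node_path = tree.decision_path(iris.data[:1])
print("visited nodes:", node_path.indices)
```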
12

Udaya, Kumar Magesh Kumar. "Classification of Parkinson’s Disease using MultiPass Lvq, Logistic Model Tree, K-Star for Audio Data set : Classification of Parkinson Disease using Audio Dataset." Thesis, Högskolan Dalarna, Datateknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:du-5596.

Full text
Abstract:
Parkinson's disease (PD) is a degenerative illness whose cardinal symptoms include rigidity, tremor, and slowness of movement. In addition to its widely recognized effects, PD can have a profound effect on speech and voice. The speech symptoms most commonly demonstrated by patients with PD are reduced vocal loudness, monopitch, disruptions of voice quality, and an abnormally fast rate of speech; this cluster of speech symptoms is often termed hypokinetic dysarthria. The disease can be difficult to diagnose accurately, especially in its early stages; for this reason, automatic techniques based on Artificial Intelligence should increase diagnostic accuracy and help doctors make better decisions. The aim of this thesis work is to predict PD based on audio files collected from various patients. The audio files are preprocessed in order to obtain the features; the preprocessed data contain 23 attributes and 195 instances. On average there are six voice recordings per person, so the number of instances can be minimized using a data compression technique such as the Discrete Cosine Transform (DCT). After data compression, attribute selection is done using several WEKA built-in methods such as ChiSquared, GainRatio and InfoGain; after identifying the important attributes, we evaluate the attributes one by one using stepwise regression. Based on the selected attributes, we proceed in WEKA using a cost-sensitive classifier with various algorithms such as MultiPass LVQ, Logistic Model Tree (LMT) and K-Star. The classification results show about 80% accuracy on average; using these features, approximately 95% classification of PD is achieved. This shows that, using the audio dataset, PD can be predicted with a high level of accuracy.
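The preprocessing chain described above can be sketched outside WEKA roughly as follows; the data are synthetic, and a class-weighted SVM stands in for the cost-sensitive WEKA classifiers (MultiPass LVQ, LMT, K-Star), which have no direct scikit-learn equivalents.

```python
# Sketch: DCT-based compression of repeated recordings, chi-squared attribute
# selection, then a cost-sensitive classifier. All data here are synthetic.
import numpy as np
from scipy.fft import dct
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = rng.random((32, 6, 23))          # 32 subjects x 6 recordings x 23 attributes
y = rng.integers(0, 2, size=32)      # hypothetical PD / healthy labels

# DCT across each subject's recordings; keeping the first coefficient
# collapses six recordings into one instance per subject.
X_c = dct(X, axis=1, norm="ortho")[:, 0, :]
X_c -= X_c.min()                     # chi2 requires non-negative features

X_sel = SelectKBest(chi2, k=10).fit_transform(X_c, y)

# Cost-sensitive step: misclassifying a PD case costs three times more.
clf = SVC(class_weight={0: 1, 1: 3}).fit(X_sel, y)
```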
13

Pienaar, Neil Deon. "Using the classification and regression tree (CART) model for stock selection on the S&P 700." Master's thesis, University of Cape Town, 2016. http://hdl.handle.net/11427/20728.

Full text
Abstract:
Traditionally, investment practitioners and academics alike have used stock fundamentals and a linear framework to predict future stock performance. This approach has been shown to have flaws, as the literature has shown that stock returns can exhibit non-linearity and involve complex relations beyond a linear nature (Hsieh, 1991; Sarantis, 2001; Shively, 2003). These findings present an opportunity to investment practitioners who are better able to model these returns. This dissertation attempts to classify stocks on the S&P 700 index using a Classification and Regression Tree (CART) built during an in-sample period and then used for predictive purposes during an out-of-sample period deliberately chosen to comprise both a period of financial crisis and a recovery. For these periods, various portfolios and performance measures are calculated in order to assess the model's performance relative to the benchmark, the Standard and Poor's (S&P) 700 index.
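The in-sample/out-of-sample CART workflow might look as follows; the fundamental factors and the 'outperformer' label are hypothetical stand-ins for the dissertation's stock data.

```python
# Sketch: fit a CART on an in-sample window, classify out of sample.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(6)
# Columns: P/E, book-to-market, momentum (hypothetical fundamentals).
X = rng.normal(size=(700, 3))
y = (X[:, 2] + rng.normal(scale=1.5, size=700) > 0).astype(int)  # outperformer?

in_sample, out_sample = slice(0, 500), slice(500, 700)
cart = DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
cart.fit(X[in_sample], y[in_sample])

picks = cart.predict(X[out_sample])          # candidate portfolio members
print("selected", picks.sum(), "of 200 stocks out of sample")
```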
14

Lim, Steven. "Recommending TEE-based Functions Using a Deep Learning Model." Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/104999.

Full text
Abstract:
Trusted execution environments (TEEs) are an emerging technology that provides a protected hardware environment for processing and storing sensitive information. By using TEEs, developers can bolster the security of software systems. However, incorporating a TEE into existing software systems can be a costly and labor-intensive endeavor. Software maintenance, changing software after its initial release, is known to contribute the majority of the cost in the software development lifecycle. The first step in making use of a TEE requires that developers accurately identify which pieces of code would benefit from being protected in a TEE. For large code bases, this identification process can be quite tedious and time-consuming. To help reduce the software maintenance costs associated with introducing a TEE into existing software, this thesis introduces ML-TEE, a recommendation tool that uses a deep learning model to classify whether an input function handles sensitive information or sensitive code. By applying ML-TEE, developers can reduce the burden of manual code inspection and analysis. ML-TEE's model was trained and tested on functions from GitHub repositories that use Intel SGX, using an imbalanced dataset. The final model used in the recommendation system has an accuracy of 98.86% and an F1 score of 80.00%. In addition, we conducted a pilot study in which participants were asked to identify functions that needed to be placed inside a TEE in a third-party project. The study found that, on average, participants who had access to the recommendation system's output had a 4% higher accuracy and completed the task 21% faster.
Master of Science
Improving the security of software systems has become critically important. A trusted execution environment (TEE) is an emerging technology that can help secure software that uses or stores confidential information. To make use of this technology, developers need to identify which pieces of code handle confidential information and should thus be placed in a TEE. However, this process is costly and laborious because it requires the developers to understand the code well enough to make the appropriate changes in order to incorporate a TEE. This process can become challenging for large software that contains millions of lines of code. To help reduce the cost incurred in the process of identifying which pieces of code should be placed within a TEE, this thesis presents ML-TEE, a recommendation system that uses a deep learning model to help reduce the number of lines of code a developer needs to inspect. Our results show that the recommendation system achieves high accuracy as well as a good balance between precision and recall. In addition, we conducted a pilot study and found that participants from the intervention group who used the output from the recommendation system managed to achieve a higher average accuracy and perform the assigned task faster than the participants in the control group.
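The pairing of a high accuracy with a much lower F1 score reflects the imbalanced dataset; the synthetic example below (not ML-TEE's data) shows why accuracy alone can mislead when sensitive functions are rare.

```python
# Sketch: on imbalanced data, a classifier that never flags sensitive code
# still scores high accuracy, while F1 exposes its uselessness.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([1] * 50 + [0] * 950)        # 5% sensitive functions
y_all_negative = np.zeros(1000, dtype=int)

print(accuracy_score(y_true, y_all_negative))                 # 0.95, yet useless
print(f1_score(y_true, y_all_negative, zero_division=0))      # 0.0
```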
15

Truong, Alfred Kar Yin. "Fast growing and interpretable oblique trees via logistic regression models." Thesis, University of Oxford, 2009. http://ora.ox.ac.uk/objects/uuid:e0de0156-da01-4781-85c5-8213f5004f10.

Full text
Abstract:
The classification tree is an attractive method for classification, as the predictions it makes are more transparent than those of most other classifiers. The most widely accepted approaches to tree-growth use axis-parallel splits to partition continuous attributes. Since the interpretability of a tree diminishes as it grows larger, researchers have sought ways of growing trees with oblique splits, as they are better able to partition observations. The focus of this thesis is to grow oblique trees in a fast and deterministic manner and to propose ways of making them more interpretable. Finding good oblique splits is a computationally difficult task. Various authors have proposed ways of doing this, either by performing stochastic searches or by solving problems that effectively produce oblique splits at each stage of tree-growth. A new approach to finding such splits is proposed that restricts attention to a small but comprehensive set of splits. Empirical evidence shows that good oblique splits are found in most cases, and that when observations come from a small number of classes, oblique trees can be grown in a matter of seconds. As interpretability is the main strength of classification trees, it is important for the oblique trees that are grown to be interpretable. As the proposed approach to finding oblique splits makes use of logistic regression, well-founded variable selection techniques are introduced to classification trees. This allows concise oblique splits to be found at each stage of tree-growth, so that more interpretable oblique trees can be grown directly. In addition, cost-complexity pruning ideas developed for axis-parallel trees have been adapted to make oblique trees more interpretable. A major and practical component of this thesis is the oblique.tree package in R, which allows casual users to experiment with oblique trees in a way that was not possible before.
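One way to picture an oblique split found via logistic regression: the fitted hyperplane, rather than a single-attribute threshold, partitions the observations at a node. The sketch below is a schematic of that idea, not the oblique.tree algorithm itself.

```python
# Sketch: one oblique split from a logistic regression fit at a tree node.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # a diagonal class boundary

node_model = LogisticRegression().fit(X, y)
w, b = node_model.coef_[0], node_model.intercept_[0]

goes_left = X @ w + b < 0                     # the oblique split of this node
print("left child:", goes_left.sum(), "right child:", (~goes_left).sum())
# Recursing on X[goes_left] and X[~goes_left] would grow the oblique tree.
```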
16

Linkevicius, Edgaras. "Single Tree Level Simulator for Lithuanian Pine Forests." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-150330.

Full text
Abstract:
Ziele Die Forsteinrichtung in Litauen war in den vergangenen Jahrzehnten vom Leitgedanken geprägt, die Optimierung der Bestandsdichte und die Maximierung der Produktivität in jeder Phase der Bestandsentwicklung als gleichrangige Ziele zu betrachten. Deshalb wurden große Anstrengungen in die Herleitung von Bestandswuchsmodellen für gleichaltrige Kiefern- oder Fichtenreinbestände gelegt. Bei der Anwendung dieser Modelle auf gemischte oder in der Umwandlung befindliche Wälder sind allerdings nur ungenaue Resultate zu erzielen. Um den Erfordernissen einer zeitgemäßen Forstwirtschaft gerecht zu werden, sind geeignete Instrumente zur Prognose von Wachstum und Ertrag strukturreicher Wälder vonnöten. Das Hauptziel dieser Arbeit bestand deshalb in der Neuparametrisierung des Einzelbaumwachstumssimulators BWINPro-S (entwickelt für sächsische Wuchsverhältnisse) für Kiefernwälder auf mineralischen Standorten in Litauen. Zur Zielerreichung dienten folgende Schritte: • Schaffung und Evaluierung einer Datengrundlage für die Modellierung. • Abschätzung der Effekte von Konkurrenz um Wuchsraum auf den Durchmesser-, Grundflächen- und Höhenzuwachs von Einzelbäumen. • Entwicklung eines Durchmesser-Zuwachsmodells sowie Neuparametrisierung der Grundflächen- und Höhenwachstumsmodelle. • Bestimmung der Einzelbaummortalität durch Konkurrenz um Wuchsraum. • Entwicklung eines ersten Ansatzes für einen Einzelbaumwachstumssimulator für Kiefer in Litauen. Hypothesen: 1. Die Standorteigenschaften sind der prägende Faktor für Wachstum und Ertrag von Waldbeständen. 2. Distanzabhängige Konkurrenzindizes zeigen höhere partielle Korrelationen zu Grundflächen- und Höhenzuwachs der Einzelbäume als distanzunabhängige Konkurrenzindizes. 3. Im Vergleich zum Ursprungsmodell BWINPro-S kann durch die Neuparametrisierung eine bessere Anpassung an die Wachstumswirklichkeit in Litauen erzielt werden (in Bezug auf Durchmesser-, Grundflächen- und Höhenzuwachs sowie Mortalitätsschätzung). 4. Ein Einzelbaumwachstumssimulator unterstützt die Entscheidungsträger und Forstplaner in Litauen bei der Optimierung der Waldbewirtschaftung ganz wesentlich. Material und Methoden Der Forschungsansatz gliederte sich wie folgt: 1) Vervollständigung der Datengrundlage. 2) Analyse der Konkurrenzverhältnisse. 3) Modellierung des Einzelbaumwachstums. 4) Validierung der neuentwickelten bzw. neuparametrisierten Modelle. Die Datengrundlage bestand aus Messwerten von 18 Dauerversuchsflächen (PEP) und zwei Validierungsflächen (VP), von denen letztere nur zur Modellüberprüfung herangezogen wurden. Auf allen Flächen stocken vorwiegend aus Naturverjüngung hervorgegangene, einschichtige Kiefernbestände auf kieferntypischen Standorteinheiten. Die Vervollständigung der Datengrundlage erforderte (a) die Erzeugung der Ausgangsdatenbasis, (b) Berechnung fehlender Werte, und (c) Evaluierung der vervollständigten Datengrundlage. Dabei lag das Hauptaugenmerk auf: • Stichprobenumfang und Schätzung der Populationsmittelwerte. • Schätzung des potentiellen Standort-Leistungsvermögens. • Analyse der Beziehung zwischen dem potentiellen Standort-Leistungsvermögen und dem tatsächlichen Waldertrag. 
Zur Abschätzung der Effekte von Konkurrenz um Wuchsraum auf den Durchmesser-, Grundflächen- und Höhenzuwachs von Einzelbäumen diente folgendes Vorgehen: Zur Konkurrentenidentifikation wurde ein inverser Lichtkegel mit einem Öffnungswinkel von 60 und 80 Grad konstruiert, dessen nach unten gerichtete Spitze (a) an der Kronenansatzhöhe, (b) an der Höhe der größten Kronenbreite, und (c) am Stammfuß des Zentralbaumes ansetzte. Zur Quantifizierung des Konkurrenzdrucks wurden mit Hilfe der partiellen Korrelationsanalyse 20 Konkurrenzindizes geprüft, von denen letztendlich sechs distanzabhängige und zwei distanzunabhängige Indizes in der weiteren Auswertung Berücksichtigung fanden. Die Modellierung des Einzelbaumwachstums erfolgte in drei Schritten: (a) Entwicklung eines originären Einzelbaum-Durchmesserzuwachsmodells, (b) Neuparametrisierung des Grundflächen- und Höhenzuwachsmodells, und (c) Entwicklung und Neuparametrisierung von Mortalitätsmodellen. Zur Bewertung einfacher linearer Regressionsmodelle wurden die statistische Signifikanz und das Bestimmtheitsmaß herangezogen. Bei multiplen linearen Regressionsmodellen wurde die Signifikanz jeder unabhängigen Variablen gesondert geprüft (hinsichtlich Normalverteilung, Varianzhomogenität der Residuen und Multikollinearität). Zur Bewertung einfacher nichtlinearer Regressionsmodelle diente in erster Linie das korrigierte Bestimmtheitsmaß, bei multiplen nichtlinearen Regressionsmodellen fanden darüber hinaus Q-Q-Plots (Quantil-Quantil-Diagramme) und die Prüfung auf Varianzhomogenität der Residuen Verwendung. Die Evaluierung multipler logistischer Regressionsmodelle erfolgte mit Pearsons Chi-Quadrat-Test, die Signifikanz jedes Modellparameters wurde mit der Wald-Statistik geprüft. Die Anpassungsgüte wurde mit Hilfe der Log-Likelihood-Funktion, Cox & Snell- bzw. Nagelkerke-Bestimmtheitsmaßen, Klassifikationstabellen und ROC-Kurven bewertet. Zur Prüfung der neuparametrisierten Grundflächen- und Höhenzuwachsmodelle wurden die modellierten Werte gegen die Messwerte und darüber hinaus die Residuen gegen die Modellwerte geplottet. Außerdem wurden zur Beurteilung die Verzerrung, die Präzision und die Treffgenauigkeit (sowohl als Absolut- als auch als Relativwerte) herangezogen. Ergebnisse und Schlussfolgerungen Die Wachstumsmodelle des Simulators BWINPro-S konnten erfolgreich an die Bedingungen in Litauen angepasst werden. Daraus lassen sich folgende Schlussfolgerungen ableiten: 1. Der stehende Vorrat und die Gesamtwuchsleistung von Kiefernbeständen werden nur z. T. vom standörtlichen Leistungsvermögen determiniert. Die Standorteigenschaften bestimmen das theoretische Leistungsvermögen von Beständen. Ob dieses Potential auch tatsächlich ausgeschöpft werden kann, hängt weitgehend von der Bewirtschaftungsart ab, die geprägt ist durch Beginn, Häufigkeit und Stärke der Durchforstungseingriffe. 2. In Kiefernreinbeständen eignen sich distanzabhängige Konkurrenzindizes besser zur Prognose des mittleren Grundflächenzuwachses als distanzunabhängige Indizes. Zur Beschreibung des Einzelbaum-Durchmesserzuwachses hat sich der Index nach BIGING & DOBBERTIN (1992, in dieser Arbeit als Index CI4 bezeichnet) in Kombination mit der Konkurrentenidentifikationsmethode „Suchkegelansatz in Kronenansatzhöhe, Öffnungswinkel 80 Grad“ als der bestgeeignetste Ansatz erweisen. 3. 
Der distanzunabhängige Konkurrenzindex nach HEGYI (1974) erreichte die höchsten partiellen Korrelationskoeffizienten mit den mittleren Einzelbaum-Höhenzuwächsen und ergab etwas bessere Resultate bei der Wachstumsprognose als distanzabhängige Indizes. Allerdings waren die Beziehungen zwischen den Konkurrenzindizes und den Einzelbaum-Höhenzuwächsen nur schwach ausgeprägt. 4. Konkurrenz wirkt sich dämpfend auf den Einzelbaum-Durchmesserzuwachs aus, bei zunehmender Konkurrenz sinkt der Zuwachs kontinuierlich ab. Im Gegensatz dazu beschleunigt leichte Konkurrenz das Einzelbaum-Höhenwachstum, bei starker Konkurrenz jedoch wird auch der Höhenzuwachs negativ beeinflusst. 5. Das im Rahmen dieser Arbeit hergeleitete nichtlineare Durchmesserzuwachsmodell ist zur Prognose des Kiefernwachstums bestens geeignet, das Bestimmtheitsmaß beträgt 0,483, die Residuen waren normalverteilt. 6. Die Neuparametrisierung des Grundflächen- und Höhenzuwachsmodells verbesserte die Anpassung an die Wuchsbedingungen in Litauen bedeutend. Eine erste Validierung, durchgeführt für eine Wachstumsprognose über einen 30-jährigen Zeitraum, ergab zufriedenstellende Ergebnisse. 7. Die zwei im Rahmen dieser Arbeit hergeleiteten Mortalitätsschätzer sind zur Vorhersage der natürlichen Absterbeprozesse in den Kiefernbeständen gut geeignet. Beide Ansätze klassifizierten lebende und tote Bäume mit einer Treffgenauigkeit von über 83%, während der in BWINPro-S enthaltene Schätzer nur 77% der Bäume korrekt zuordnete. 8. Der für litauische Verhältnisse neuparametrisierte Wachstumssimulator BWINPro-S ist ein wichtiges Instrument zur Entscheidungsunterstützung für Forstplaner in Litauen
Objectives In Lithuania, during the most recent decades, the leading theory in forest management and planning combined optimization of forest stand density and maximal productivity at every time point of stand development. Thus, great effort was spent in creating stand level models that are highly effective in managing even-aged monocultures of pine or spruce forests. But these models produce significant errors in mixed or converted forests. In order to meet the requirements of contemporary forestry, appropriate forest management tools are required that would be capable to predict the growth and yield of more structured forests. Thus, the overall objective for this study was to re-parameterise the single tree level simulator BWINPro-S (developed for forests in Saxony/Germany) for Lithuanian pine forests that grow on mineral sites. To reach this goal, the following tasks were set: • To create, and to evaluate, a database for modelling. • To estimate the impact of competition for growing space on diameter, basal area and height growth of trees. • To develop a tree diameter model, and re-parameterise basal area and height growth models. • To assess natural tree mortality induced by competition between trees for growing space. • To develop the first approach of STLS for pine in Lithuania. Hypotheses 1. Site quality is the most important factor that affects forest growth and yield. 2. Distance dependent Competition Indices had higher partial correlation with tree basal area and height increment than distance independent Competition Indices. 3. The re-parameterised model based on Lithuanian data fits better under Lithuanian conditions (regarding diameter, basal area, height increment and mortality) than the original model BWINPro-S. 4. A single tree level simulator provides valuable support for decision makers and forest managers to improve forest management in Lithuania. Materials and methods To reach the main goals of this study, the research was structured to four sections: 1) Database completion, 2) Analysis of competition, 3) Modelling tree growth, 4) Validation of developed models. The database consisted of analytical data from 18 permanent experimental plots (PEPs) and 2 Validation Plots (VP) that were used only for the validation of the models. All plots (PEPs and VP) represent mainly naturally regenerated, single layer pine stands that grow on very typical pine sites. Database completion involved (a) establishment of the initial database, (b) modelling of missing data values and (c) evaluation of the complete database, which focused on: • Sample size and estimation of the population’s mean • Estimation of potential site productivity • Estimation of relationship between potential site productivity and forest yield In order to estimate the impact of competition for growing space on diameter, basal area and height growth of trees the following methods were used. To select the competitors, this study focuses on three separate positions for setting the inverse cone: a) at the height of the crown base, b) at the height of widest crown width, and c) at the stem base. The opening angle of the search cone was either 60 or 80 degrees. To estimate the competition, the study by partial correlation analysis evaluated a total of 20 competition indices, of which six distance dependent and two distance independent CIs were applied in the research programme. 
Modelling of tree growth was divided into three parts: a) development of an original tree diameter increment model, b) re-parameterisation of basal area and height increment models, and c) development of new natural mortality models and re-parameterisation of natural mortality models. Simple linear regression models were evaluated by estimating each model’s statistical significance and coefficient of determination. Statistical analysis of multiple linear regression models was enlarged by conducting further tests: statistical significance was checked for each independent variable: regression assumptions (concerning normal distribution and homogeneity of variance of the models’s residuals, and multicollinearity of the independent variables) were checked. Simple nonlinear regression models were evaluated mainly by adjusted coefficient of determination. For multiple nonlinear regression models, regression assumptions were also checked by producing normal Q-Q plots and by checking homogeneity of variance of model’s residuals. Multiple logistic regression models were evaluated by estimating each model’s statistical significance with Pearson’s chi square statistics and the statistical significance of each model’s parameters with Wald statistics. Goodness of fit was estimated by using log likelihood function values, Cox-Snell and Nagelkerkle’s coefficients of determination, classification tables and ROC curves. The re-parameterised basal area and height increment models were validated by plotting each model’s predicted values against observed values. Also each model’s residuals were plotted against predicted values. Bias, relative bias, precision, relative precision, accuracy and relative accuracy when comparing predicted and observed values were estimated as well. Results and Conclusions The growth models used in the BWINPro-S simulator were successfully re-parameterised for Lithuanian growth conditions. Thus the study may state these conclusions: 1. The accumulated standing volumes and overall productivity of pine stands only partially depends on the productivity potential of sites. Site quality defines the growth potential that could be reached in a stand. The realization of growth potential largely depends on the growing regime in the stand that is defined by the beginning, frequency and intensity of thinning. 2. In pure pine stands, distance dependent competition indices show greater capabilities to predict mean annual basal area increment than distance independent indices. Competition index (coded as CI4 in this study) proposed by BIGING & DOBBERTIN (1992) combined with the selection method height to crown base with opening angle of 80 degrees is recommended as the most efficient for describing the individual diameter growth of trees. 3. HEGYI\\\'S (1974) distance independent competition index scored the highest partial correlation coefficients and produced slightly better results than distance dependent competition indices in predicting mean annual height increment for individual trees. Yet, the generally poor performance of competition indices to predict height increment of individual pine trees was also recorded. 4. Competition has a purely negative impact on tree diameter growth. Increasing competition leads to steady decreases in diameter increment. Nevertheless, although a small amount of competition does stimulate tree height growth, stronger competition has a lasting negative impact on tree height growth. 5. 
The nonlinear diameter increment model, developed by this study, has high capabilities to predict growth of pine trees. The model’s coefficient of determination value was equal to 0.483. The distribution of the model’s residuals fulfilled the requirements of regression assumptions. 6. The re-parameterisation of the BWINPro-S basal area and height increment models for use in Lithuanian permanent experimental plots, increased their performance. During the first validation procedure, based on 30 years growth simulation, the re-parameterised models produced reliable results. 7. Two individual mortality models, developed by this study, showed very high capabilities to predict the natural mortality of pine trees. The distance dependent natural mortality model scored slightly better results. Both models managed to correctly classify dead and living trees, slightly more than 83% of the time. The re-parameterisation of the BWINPro-S natural mortality model increased its ability to predict the natural mortality of pine trees in Lithuania. Correctly classifying growing and dead trees increased by 6%, from 77 to 83%. 8. BWINPro-S simulator with re-parameterised growth models for Lithuanian conditions is a valuable support tool for decision makers and forest managers in Lithuania
Objectives of the work. In Lithuania, forest management has long been based on optimizing stand density and pursuing maximum stand productivity at all stages of stand development. Researchers put considerable effort into developing stand-level yield models. These models were reliable for managing even-aged stands, but they are difficult to apply in mixed stands. To meet the needs of modern forestry, in which increasing attention is paid to growing mixed, multi-storey stands, new models are needed that can successfully predict the growth of mixed stands, their yield, and their responses to various management measures. The main aim of this work is therefore to re-parameterize the BWINPro-S single-tree-level simulator, developed in the eastern German federal state of Saxony, and thereby adapt it to Lithuanian conditions. To achieve this aim, the following tasks were formulated: • Prepare and evaluate the database required for modelling. • Evaluate the influence of inter-tree competition on tree diameter, basal area and height increment. • Develop a new tree diameter increment model and re-parameterize the basal area and height models. • Evaluate the self-thinning patterns of pine stands with respect to inter-tree competition for growing space. Hypotheses to be tested: 1. Site is the most important factor determining stand yield and productivity. 2. Distance-dependent competition indices show higher partial correlations with tree basal area, diameter and height increments than distance-independent competition indices. 3. Models re-parameterized with data from pine stands growing in Lithuania fit Lithuanian conditions better (in terms of diameter, basal area and height increment, and self-thinning) than models developed for German conditions. 4. A tree-level growth simulator is a useful tool for forest managers seeking to improve the quality of forest management in Lithuania. Methods. The work was divided into four main parts: 1) database formation, 2) competition index analysis, 3) tree growth modelling, and 4) growth model validation. The database comprised 20 permanent sample plots, of which 18 were used for model development and 2 for model validation. The plots were established in naturally regenerated single-storey pine stands growing on sites typical for pine. The database was evaluated in the following stages: (a) formation of the primary database, (b) modelling of missing measurements, and (c) evaluation of the database based on: • sample size and the precision of the population mean estimate; • assessment of potential stand yield; • assessment of the relationships between potential stand yield and actual stand yield and productivity. To evaluate the influence of competition on tree diameter, basal area and height increment, competitor selection and competition assessment methods were used. Competing trees were selected with an inverted cone whose apex was positioned at (a) the crown base, (b) the widest part of the crown, or (c) the root collar of the subject tree; the cone angle was varied from 60 to 80 degrees. Twenty competition indices were examined in total (two distance-independent and eighteen distance-dependent). The competition indices were evaluated using partial correlation methods.
Tree growth modelling was carried out in three stages: a) development of an original tree diameter increment model, b) re-parameterization of the tree basal area and tree height increment models, and c) development of original self-thinning models and re-parameterization of existing ones. Simple linear regression models were evaluated by their statistical significance and the coefficient of determination. The statistical analysis of multiple linear regression models was extended with additional tests: the statistical significance of each independent variable was examined, and it was checked whether the model satisfies the basic regression assumptions (independent variables are not intercorrelated; model residuals are normally distributed and evenly dispersed). Simple non-linear regression models were evaluated using the adjusted coefficient of determination, and in the analysis of multiple non-linear regression models compliance with the regression assumptions was likewise checked. Logistic self-thinning models were evaluated with the following statistics: the model chi-square goodness-of-fit test, the Wald test, the maximum likelihood value, the Cox-Snell and Nagelkerke pseudo-R² coefficients, classification tables, and receiver operating characteristic (ROC) curves. The re-parameterized tree basal area and tree height increment models were validated by comparing simulated tree diameter and height values with values actually measured at the end of the analysed period; the distribution of model residuals with respect to the modelled values was also examined. Finally, bias, relative bias, precision, relative precision, bias-corrected precision and relative bias-corrected precision were used to evaluate the model predictions. Results and conclusions. The growth models used in the BWINPro-S tree-level simulator were successfully re-parameterized and adapted to Lithuanian conditions. Based on the results of this work, the following conclusions were drawn: 1. The standing volume and total yield of pine stands depend only partly on the potential site fertility. Site conditions determine only the potential yield that can be achieved in a stand; whether this potential is realized depends on the silvicultural regime, characterized by the timing of the first thinning and the frequency and intensity of subsequent thinnings. 2. In pure pine stands, distance-dependent competition indices predict basal area increment better than distance-independent ones. Competition index CI4, proposed by BIGING & DOBBERTIN (1992) and based on selecting competitors with an inverted 80-degree cone whose apex is positioned at the crown base, is recommended as the most effective for modelling tree diameter increment. 3. For tree height increment, the distance-independent HEGYI (1974) competition index showed somewhat better partial correlation results than the distance-dependent indices. Overall, the results indicated a rather weak ability of competition indices to predict tree height increment. 4. Competition has a consistently negative effect on tree diameter increment: increasing competition leads to decreasing diameter increment. Slight competition increases tree height increment.
However, stronger competition also has a negative effect on tree height increment. 5. The original diameter increment model predicts the growth of pine trees well; its coefficient of determination was 0.483, and the model residuals were normally distributed and evenly dispersed with respect to the modelled values. 6. Re-parameterizing the BWINPro-S tree basal area and tree height increment models with data from permanent sample plots in Lithuanian pine stands increased their predictive ability. The first validation results, based on thirty-year growth forecasts, showed these models to be reliable. 7. The two originally developed self-thinning models predict the self-thinning of pine stands well. The distance-dependent self-thinning model predicts self-thinning better than the distance-independent one; both models correctly classified more than 83% of surviving and self-thinned trees. Re-parameterization of the BWINPro-S self-thinning model increased the share of correctly predicted surviving and self-thinned trees by six percentage points, from 77% to 83%. 8. The BWINPro-S tree-level growth simulator with the re-parameterized growth models is a useful tool for Lithuanian forest managers
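To make the competition-index machinery concrete, below is a minimal sketch of a distance-dependent index of the general kind compared in the thesis: each neighbour contributes its size relative to the subject tree, weighted by inverse distance. The fixed search radius, coordinates and variable names are illustrative assumptions; the thesis selects competitors with inverted cones rather than a radius.

```python
import numpy as np

def distance_dependent_ci(dbh, xy, max_dist=6.0):
    """Toy competition index: CI_i = sum over neighbours j within max_dist
    of (d_j / d_i) / dist_ij.  dbh: (n,) diameters; xy: (n, 2) positions."""
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        w = (dbh[None, :] / dbh[:, None]) / dist
    w[(dist == 0) | (dist > max_dist)] = 0.0  # drop self-pairs and distant trees
    return w.sum(axis=1)

dbh = np.array([25.0, 18.0, 30.0, 22.0])             # breast-height diameters, cm
xy = np.array([[0.0, 0.0], [2, 1], [5, 5], [1, 3]])  # stem coordinates, m
print(distance_dependent_ci(dbh, xy))
```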
APA, Harvard, Vancouver, ISO, and other styles
17

Araya, Yeheyies. "Detecting Switching Points and Mode of Transport from GPS Tracks." Thesis, Linköpings universitet, Kommunikations- och transportsystem, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-91320.

Full text
Abstract:
In recent years, various research efforts have been undertaken to enhance the quality of travel surveys, mainly with the aid of GPS technology. Initially, research focused on the vehicle travel mode, owing to the availability of GPS technology in vehicles; nowadays, with GPS devices accessible for personal use, researchers have shifted their focus to personal mobility across all travel modes. This master's thesis aimed at developing a mechanism to extract one type of travel survey information, namely travel mode, from collected GPS data. The available GPS dataset covers the travel modes walk, bike and car, and public transport modes such as bus, train and subway. The developed procedure consists of two stages. The first divides the tracks into trips, and further divides the trips into segments by means of a segmentation process based on the assumption that a traveler switches from one transportation mode to another on foot; thus, trips are divided into walking and non-walking segments. The second stage comprises a procedure to develop a classification model that labels the separated segments with the travel modes walk, bike, bus, car, train and subway. To develop the classification model, a supervised classification method was used, adopting a decision tree algorithm. The highest prediction accuracy obtained by the classification system was for the walk travel mode, at 75.86%, while the bike and bus travel modes showed the lowest prediction accuracy. The developed system showed promising results that could be used as a baseline for further similar research.
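As a rough illustration of the second stage, the sketch below trains a decision tree on per-segment motion features as a supervised classifier over the six travel modes. The features, their class-conditional distributions and all names are invented stand-ins for the thesis's real GPS-derived attributes.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
modes = ["walk", "bike", "bus", "car", "train", "subway"]
# Hypothetical per-segment features: mean speed (m/s), 95th pct speed, mean accel.
centers = {"walk": (1.4, 2, .3), "bike": (4, 7, .5), "bus": (6, 12, .8),
           "car": (12, 25, 1.2), "train": (20, 35, .6), "subway": (15, 25, 1.0)}
X = np.vstack([rng.normal(centers[m], (1, 2, .2), size=(200, 3)) for m in modes])
y = np.repeat(modes, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))  # per-mode accuracy
```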
APA, Harvard, Vancouver, ISO, and other styles
18

Lecuyer, Jean-Francois. "Comparison of classification trees and logistic regression to model the severity of collisions involving elderly drivers in Canada." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/27700.

Full text
Abstract:
The number of drivers aged 65 years and older in Canada and the proportion of the population these drivers represent have been increasing for many years and will continue to do so in years to come. This increase in the number of elderly drivers could possibly lead to an increase in the numbers of fatalities, serious injuries and collisions involving drivers of this age group[1]. In order to find ways to reduce the number of collisions involving elderly drivers, and in particular the number of fatalities among the victims of collisions involving drivers aged 65 years and older, the relationship between the characteristics of these collisions and their severity was modeled using both classification trees and logistic regression. In this thesis, we explain the theory behind classification trees and logistic regression before analyzing the data. Both techniques are also compared based on the results of the analysis. In particular, we have validated the classification trees with the more rigorous logistic regression analysis. Consequently, the non-statistician can use the visually appealing trees with confidence.
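A minimal sketch of the kind of comparison made here: fit both a classification tree and a logistic regression to the same binary severity outcome and compare them by cross-validated AUC. The synthetic data and all parameters are placeholders, not the Canadian collision data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the collision data: rare severe outcome, mixed predictors
X, y = make_classification(n_samples=2000, n_features=12, n_informative=5,
                           weights=[0.9, 0.1], random_state=1)
for name, model in [("tree", DecisionTreeClassifier(max_depth=4, random_state=1)),
                    ("logit", LogisticRegression(max_iter=1000))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```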
APA, Harvard, Vancouver, ISO, and other styles
19

Huang, Xuan. "Balance-guaranteed optimized tree with reject option for live fish recognition." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/9779.

Full text
Abstract:
This thesis investigates the computer vision application of live fish recognition, which is needed in scenarios where manual annotation is too expensive because there are too many underwater videos. Such a system can assist ecological surveillance research, e.g. computing fish population statistics in the open sea. Some pre-processing procedures are employed to improve recognition accuracy, and then 69 types of features are extracted. These features are a combination of colour, shape and texture properties in different parts of the fish, such as tail/head/top/bottom, as well as the whole fish. We then present a novel Balance-Guaranteed Optimized Tree with Reject option (BGOTR) for live fish recognition. It improves the normal hierarchical method by arranging more accurate classifications at a higher level and keeping the hierarchical tree balanced. BGOTR is automatically constructed based on inter-class similarities. We apply a Gaussian Mixture Model (GMM) and Bayes rule as a reject option after the hierarchical classification to evaluate the posterior probability of being a certain species, in order to filter out less confident decisions. This novel classification-rejection method cleans up decisions and rejects unknown classes. After constructing the tree architecture, a novel trajectory voting method is used to eliminate accumulated errors during hierarchical classification and therefore achieves better performance. The proposed BGOTR-based hierarchical classification method is applied to recognize the 15 major species among 24150 manually labelled fish images and to detect new species in an unrestricted natural environment recorded by underwater cameras in the south Taiwan sea. It achieves significant improvements compared to state-of-the-art techniques. Furthermore, the order in which feature selection and multi-class SVM construction are performed is investigated. We propose that an Individual Feature Selection (IFS) procedure can be applied directly to the binary One-versus-One SVMs before assembling the full multiclass SVM. The IFS method selects different subsets of features for each One-versus-One SVM inside the multiclass classifier, so that each vote is optimized to discriminate the two specific classes. The proposed IFS method is tested on four different datasets, comparing performance and time cost. Experimental results demonstrate significant improvements compared to the normal Multiclass Feature Selection (MFS) method on all datasets.
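A hedged sketch of the GMM-plus-Bayes reject option described above: after a base classifier predicts, a per-class Gaussian mixture approximates each class-conditional density, Bayes' rule yields a posterior, and low-posterior decisions are rejected. The dataset, single-component mixtures and the 0.9 threshold are illustrative choices, not the thesis's settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC().fit(X, y)

# One GMM per class approximates p(x | class); Bayes rule gives a posterior
gmms = [GaussianMixture(n_components=1, random_state=0).fit(X[y == k]) for k in range(3)]
priors = np.bincount(y) / len(y)
log_post = np.column_stack([g.score_samples(X) for g in gmms]) + np.log(priors)
post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)

pred = clf.predict(X)
reject = post[np.arange(len(X)), pred] < 0.9  # low-confidence decisions rejected
print(f"rejected {reject.mean():.1%} of decisions")
```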
APA, Harvard, Vancouver, ISO, and other styles
20

Santos, Ernani Possato dos. "Análise de crédito com segmentação da carteira, modelos de análise discriminante, regressão logística e classification and regression trees (CART)." Universidade Presbiteriana Mackenzie, 2015. http://tede.mackenzie.br/jspui/handle/tede/970.

Full text
Abstract:
Credit is one of the most important tools for triggering business and turning the economic wheel. If well used, it brings benefits to society on a large scale; if used without balance, it may bring losses, also on a large scale, to banks, companies, governments and the population. In this context, it becomes fundamental to evaluate credit models capable of anticipating default processes with an adequate degree of accuracy, so as to avoid, or at least reduce, credit risk. This study evaluates three credit risk models - two parametric models, discriminant analysis and logistic regression, and one non-parametric, the decision tree - checking their accuracy before and after segmenting the sample by customer size. This research is an applied study about Industry BASE.
APA, Harvard, Vancouver, ISO, and other styles
21

Rusch, Thomas, Ilro Lee, Kurt Hornik, Wolfgang Jank, and Achim Zeileis. "Influencing elections with statistics: targeting voters with logistic regression trees." Institute of Mathematical Statistics (IMS), 2013. http://epub.wu.ac.at/3979/1/AOAS648.pdf.

Full text
Abstract:
In political campaigning substantial resources are spent on voter mobilization, that is, on identifying and influencing as many people as possible to vote. Campaigns use statistical tools for deciding whom to target ("microtargeting"). In this paper we describe a nonpartisan campaign that aims at increasing overall turnout using the example of the 2004 US presidential election. Based on a real data set of 19,634 eligible voters from Ohio, we introduce a modern statistical framework well suited for carrying out the main tasks of voter targeting in a single sweep: predicting an individual's turnout (or support) likelihood for a particular cause, party or candidate as well as data-driven voter segmentation. Our framework, which we refer to as LORET (for LOgistic REgression Trees), contains standard methods such as logistic regression and classification trees as special cases and allows for a synthesis of both techniques. For our case study, we explore various LORET models with different regressors in the logistic model components and different partitioning variables in the tree components; we analyze them in terms of their predictive accuracy and compare the effect of using the full set of available variables against using only a limited amount of information. We find that augmenting a standard set of variables (such as age and voting history) with additional predictor variables (such as the household composition in terms of party affiliation) clearly improves predictive accuracy. We also find that LORET models based on tree induction beat the unpartitioned models. Furthermore, we illustrate how voter segmentation arises from our framework and discuss the resulting profiles from a targeting point of view. (authors' abstract)
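A crude two-stage approximation of the LORET idea (not the authors' actual estimation algorithm): let a shallow tree partition on some variables, then fit a separate logistic regression inside each leaf on the remaining regressors. The data, the tree depth and the split of variables are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=8, random_state=2)
part, reg = X[:, :3], X[:, 3:]  # hypothetical partitioning variables vs regressors

# Step 1: a shallow tree segments observations on the partitioning variables
tree = DecisionTreeClassifier(max_depth=2, random_state=2).fit(part, y)
leaf = tree.apply(part)

# Step 2: one logistic regression per leaf on the remaining regressors
# (a leaf holding a single class would keep the tree's majority prediction)
models = {}
for l in np.unique(leaf):
    mask = leaf == l
    if len(np.unique(y[mask])) > 1:
        models[l] = LogisticRegression(max_iter=1000).fit(reg[mask], y[mask])

acc = np.mean([m.score(reg[leaf == l], y[leaf == l]) for l, m in models.items()])
print(f"mean within-leaf accuracy: {acc:.3f}")
```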
APA, Harvard, Vancouver, ISO, and other styles
22

Rusch, Thomas, Ilro Lee, Kurt Hornik, Wolfgang Jank, and Achim Zeileis. "Influencing Elections with Statistics: Targeting Voters with Logistic Regression Trees." WU Vienna University of Economics and Business, 2012. http://epub.wu.ac.at/3458/1/Report117.pdf.

Full text
Abstract:
Political campaigning has become a multi-million dollar business. A substantial proportion of a campaign's budget is spent on voter mobilization, i.e., on identifying and influencing as many people as possible to vote. Based on data, campaigns use statistical tools to provide a basis for deciding whom to target. While the data available are usually rich, campaigns have traditionally relied on a rather limited selection of information, often including only previous voting behavior and one or two demographic variables. Statistical procedures currently in use include logistic regression and standard classification tree methods like CHAID, but there is growing interest in employing modern data mining approaches. Along the lines of this development, we propose a modern framework for voter targeting called LORET (for logistic regression trees) that employs trees (with possibly just a single root node) containing logistic regressions (with possibly just an intercept) in every leaf. Thus, they contain logistic regression and classification trees as special cases and allow for a synthesis of both techniques under one umbrella. We explore various flavors of LORET models that (a) compare the effect of using the full set of available variables against using only limited information and (b) investigate their varying effects either as regressors in the logistic model components or as partitioning variables in the tree components. To assess model performance and illustrate targeting, we apply LORET to a data set of 19,634 eligible voters from the 2004 US presidential election. We find that augmenting the standard set of variables (such as age and voting history) with additional predictor variables (such as the household composition in terms of party affiliation and each individual's rank in the household) clearly improves predictive accuracy. We also find that LORET models based on tree induction outperform the unpartitioned competitors. Additionally, LORET models using both partitioning variables and regressors in the resulting nodes can improve the efficiency of allocating campaign resources while still providing intelligible models.
Series: Research Report Series / Department of Statistics and Mathematics
APA, Harvard, Vancouver, ISO, and other styles
23

Meira, Carlos Alberto Alves. "Processo de descoberta de conhecimento em bases de dados para a analise e o alerta de doenças de culturas agricolas e sua aplicação na ferrugem do cafeeiro." [s.n.], 2008. http://repositorio.unicamp.br/jspui/handle/REPOSIP/257023.

Full text
Abstract:
Advisor: Luiz Henrique Antunes Rodrigues
Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Agricola
Abstract: Plant disease warning systems can contribute to diminishing the use of chemicals in agriculture, but they have received limited acceptance in practice. Complexity of models, difficulties in obtaining the required data and costs for the growers are among the reasons that inhibit their use. However, recent technological advances - automatic weather stations, databases, Web-based agrometeorological monitoring and advanced techniques of data analysis - allow the development of a system with simple and free access. A process instance of knowledge discovery in databases was carried out to evaluate the use of classification and decision tree induction in the analysis and warning of coffee rust caused by Hemileia vastatrix. Infection rates calculated from monthly assessments of rust incidence were grouped into three classes: TX1 - reduction or stagnation; TX2 - moderate growth (up to 5 pp); and TX3 - accelerated growth (above 5 pp). Meteorological data, expected yield and spacing between plants were used as independent variables. The training data set contained 364 examples prepared from data collected in coffee-growing areas between October 1998 and October 2006. A decision tree was developed to analyse the coffee rust epidemics and demonstrated its potential as a symbolic and interpretable model: its representation identified the decision boundaries existing in the data and the logic underlying them, helping to understand which variables, and which interactions between these variables, led to coffee rust epidemics in the field. The most important explanatory variables were mean temperature during leaf wetness periods, expected yield, mean of maximum daily temperatures during the incubation period, and relative air humidity. The warning models were developed considering binary infection rates, according to the 5 pp and 10 pp thresholds (class '1' for rates greater than or equal to the threshold; class '0' otherwise). These models are specific for growing areas with high expected yield or for areas with low expected yield, and the former performed best in the evaluation. The estimated accuracy by cross-validation was up to 83%, considering the warning for 5 pp and higher, with a good balance between accuracy and important measures such as sensitivity, specificity and positive or negative reliability. Considering the warning for 10 pp and higher, the accuracy was 79%. For growing areas with low expected yield, the accuracy of the models considering the warning for 5 pp and higher was up to 72%, while the models for the higher infection rate (10 pp and higher) performed poorly. The best evaluated models showed potential to support decision making about coffee rust disease control. The process of knowledge discovery in databases was characterized in such a way that it can be employed in similar problems of the application domain, with other crops or with other coffee diseases and pests
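The evaluation measures named here are all simple functions of a 2x2 confusion matrix; the sketch below computes them for made-up validation counts (the numbers are not from the thesis).

```python
import numpy as np

# Hypothetical 2x2 validation counts for the 5 pp warning threshold:
# rows = observed (0: below, 1: at/above), cols = predicted
cm = np.array([[150, 30],
               [32, 152]])
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
sensitivity = tp / (tp + fn)      # warnings issued when an outbreak occurred
specificity = tn / (tn + fp)      # silence kept when no outbreak occurred
pos_reliability = tp / (tp + fp)  # issued warnings that proved correct
neg_reliability = tn / (tn + fn)
print(accuracy, sensitivity, specificity, pos_reliability, neg_reliability)
```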
Doctorate
Sustainable Rural Planning and Development
Doctor in Agricultural Engineering
APA, Harvard, Vancouver, ISO, and other styles
24

Krueger, Kirk L. "Effects of Sampling Sufficiency and Model Selection on Predicting the Occurrence of Stream Fish Species at Large Spatial Extents." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/26214.

Full text
Abstract:
Knowledge of species occurrence is a prerequisite for efficient and effective conservation and management. Unfortunately, knowledge of species occurrence is usually insufficient, so models that use environmental predictors and species occurrence records are used to predict species occurrence. Predicting the occurrence of stream fishes is often difficult because sampling data insufficiently describe species occurrence and important environmental conditions and predictive models insufficiently describe relations between species and environmental conditions. This dissertation 1) examines the sufficiency of fish species occurrence records at four spatial extents in Virginia, 2) compares modeling methods for predicting stream fish occurrence, and 3) assesses relations between species traits and model prediction characteristics. The sufficiency of sampling is infrequently addressed at the large spatial extents at which many management and conservation actions take place. In the first chapter of this dissertation I examine factors that determine the sufficiency of sampling to describe stream fish species richness at four spatial extents across Virginia using sampling simulations. Few regions of Virginia are sufficiently sampled, portending difficulty in accurately predicting fish species occurrence in most regions. The sufficient number of samples is often large and varies among regions and spatial scales, but it can be substantially reduced by reducing errors of sampling omission and increasing the spatial coverage of samples. Many methods are used to predict species occurrence. In the second chapter of this dissertation I compare the accuracy of the predictions of occurrence of seven species in each of three regions using linear discriminant function, generalized linear, classification tree, and artificial neural network statistical models. I also assess the efficacy of stream classification methods for predicting species occurrence. No modeling method proved distinctly superior. Species occurrence data and predictor data quality and quantity limited the success of predictions of stream fish occurrence for all methods. How predictive models are built and applied may be more important than the statistical method used. The accuracy, generality (transferability), and resolution of predictions of species occurrence vary among species. The ability to anticipate and understand variation in prediction characteristics among species can facilitate the proper application of predictions of species occurrence. In the third chapter of this dissertation I describe some conservation implications of relations between predicted occurrence characteristics and species traits for fishes in the upper Tennessee River drainage. Usually weak relations and variation in the strength and direction of relations among families precludes the accurate prediction of predicted occurrence characteristics. Most predictions of species occurrence have insufficient accuracy and resolution to guide conservation decisions at fine spatial grains. Comparison of my results with alternative model predictions and the results of many models described in peer-reviewed journals suggests that this is a common problem. Predictions of species occurrence should be rigorously assessed and cautiously applied to conservation problems. Collectively, the three chapters of this dissertation demonstrate some important limitations of models that are used to predict species occurrence. 
Model predictions of species occurrence are often used in lieu of sufficient species occurrence data. However, regardless of the method used to predict species occurrence most predictions have relatively low accuracy, generality and resolution. Model predictions of species occurrence can facilitate management and conservation, but they should be rigorously assessed and applied cautiously.
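The first chapter's sampling-sufficiency question can be pictured with a toy resampling experiment: draw increasing numbers of samples from a hypothetical species pool and count how many species are detected. The pool size, abundance model and sample sizes are all invented for illustration, not the Virginia data.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical regional pool: 40 species with skewed relative abundances
p = rng.dirichlet(np.full(40, 0.3))

def species_detected(n_samples, fish_per_sample=30):
    catch = rng.multinomial(fish_per_sample, p, size=n_samples)
    return (catch.sum(axis=0) > 0).sum()  # species seen at least once

for n in (5, 20, 80, 320):
    mean_detected = np.mean([species_detected(n) for _ in range(200)])
    print(f"{n:4d} samples -> ~{mean_detected:.1f} of 40 species detected")
```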
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
25

Julock, Gregory Alan. "The Effectiveness of a Random Forests Model in Detecting Network-Based Buffer Overflow Attacks." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/190.

Full text
Abstract:
Buffer overflows are a common type of network intrusion attack that continues to plague the networked community. Unfortunately, this type of attack is not well detected with current data mining algorithms. This research investigated the use of Random Forests, an ensemble technique that creates multiple decision trees and then lets the trees vote. The research investigated Random Forests' effectiveness in detecting buffer overflows compared to other data mining methods such as CART and Naïve Bayes. Random Forests was used for variable reduction, cost-sensitive classification was applied, and each method's detection performance was compared and reported along with the receiver operating characteristics. The experiment showed that Random Forests outperformed CART and Naïve Bayes in classification performance. Using a technique to obtain the most important buffer overflow variables, Random Forests was also able to improve upon its buffer overflow classification performance.
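A minimal sketch of the experimental pattern described: a Random Forest with class weighting as a stand-in for cost-sensitive classification, evaluated by ROC AUC, with feature importances used as variable-reduction candidates. The synthetic data and settings are assumptions, not the study's intrusion data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for the network data: rare positive class (buffer overflows)
X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                           weights=[0.97, 0.03], random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",  # cost-sensitive
                            random_state=4).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
top = np.argsort(rf.feature_importances_)[::-1][:5]  # variable-reduction candidates
print("top features:", top)
```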
APA, Harvard, Vancouver, ISO, and other styles
26

Moore, Cordelia Holly. "Defining and predicting species-environment relationships : understanding the spatial ecology of demersal fish communities." University of Western Australia. Faculty of Natural and Agricultural Sciences, 2009. http://theses.library.uwa.edu.au/adt-WU2010.0002.

Full text
Abstract:
[Truncated abstract] The aim of this research was to define key species-environment relationships to better understand the spatial ecology of demersal fish. To help understand these relationships, a combination of multivariate analyses, landscape analysis and species distribution models was employed. Of particular interest was establishing the scale at which these species respond to their environment. With the recent high-resolution surveying and mapping of the benthos in five of Victoria's Marine National Parks (MNPs), full-coverage bathymetry, terrain data and accurate predicted benthic habitat maps were available for each of these parks. This information proved invaluable to this research, providing detailed (1:25,000) benthic environmental data, which facilitated the development and implementation of a very targeted and robust sampling strategy for the demersal fish at Cape Howe MNP. The sampling strategy was designed to provide good spatial coverage of the park and to represent the park's dominant substrate types and benthic communities, whilst also satisfying the assumptions of the statistical and spatial analyses applied. The fish assemblage data were collected using baited remote underwater stereo-video systems (stereo-BRUVS), with a total of 237 one-hour drops collected. Analysis of the video footage identified 77 species belonging to 40 families, with a total of 14,449 individual fish recorded. ... This research revealed that the statistical modelling techniques employed provided an accurate means for predicting species distributions. These predicted distributions will allow more effective management of these species by providing a robust and spatially explicit map of their current distribution, enabling the identification and prediction of future changes in these species' distributions. This research demonstrated the importance of the benthic environment to the spatial distribution of demersal fish. The results revealed that different species responded to different scales of investigation and that all scales must be considered to establish the factors fish are responding to and the strength and nature of this response. Having individual, continuous and spatially explicit environmental measures provided a significant advantage over traditional measures that group environmental and biological factors into 'habitat type'. It enabled better identification of the individual factors, or correlates, driving the distribution of demersal fish. The environmental and biological measures were found to be of ecological relevance to the species and the scale of investigation, and offered a more informative description of the distributions of the species examined. The use of species distribution modelling provided a robust means for characterising the nature and strength of these relationships. In addition, it enabled species distributions to be predicted accurately across unsampled locations. Outcomes of the project include a greater understanding of how the benthic environment influences the distribution of demersal fish, and the demonstration of a suite of robust and useful marine species distribution tools that may be used by researchers and managers to understand, monitor, manage and predict marine species distributions.
APA, Harvard, Vancouver, ISO, and other styles
27

Girard, Nathalie. "Vers une approche hybride mêlant arbre de classification et treillis de Galois pour de l'indexation d'images." Thesis, La Rochelle, 2013. http://www.theses.fr/2013LAROS402/document.

Full text
Abstract:
Image classification is generally based on two steps, namely the extraction of the image signature followed by the analysis of the extracted data, which are generally numerical. Many classification models have been proposed in the literature, and the most suitable choice is often guided by classification performance and model readability. Decision trees and Galois lattices are two symbolic models known for their readability. In her thesis [Guillas 2007], Guillas efficiently used Galois lattices for image classification, and strong structural links between decision trees and Galois lattices were highlighted. Accordingly, we are interested in comparing the two models in order to design a hybrid model that combines their advantages (the robustness of the lattice, the low memory requirements of the tree, and the readability of both). For this purpose, we study the links between the two models to highlight their differences. The first difference concerns the type of discretization: decision trees generally use a local discretization, while Galois lattices, originally defined for binary data, use a global one. From a study of the properties of dichotomic lattices (the specific lattices defined after discretization), we propose a local discretization for lattices that improves their classification performance and reduces their structural complexity. Second, the post-pruning implemented in most decision trees aims to reduce their complexity but also to improve their generalization performance, whereas lattice filtering is motivated solely by a decrease in structural complexity (exponential in the size of the data in the worst case). By combining these two processes, we propose a simplification of the lattice structure constructed after our local discretization. This simplification leads to a hybrid classification model that takes advantage of both decision trees and Galois lattices: it is as readable as both, less complex than the lattice, and just as efficient.
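The local discretization contrasted here with the global one can be pictured with the classic C4.5-style cut-point search, which scores every candidate threshold on the current node's examples by information gain. This illustrates local discretization in general, not the authors' lattice-specific procedure.

```python
import numpy as np

def entropy(y):
    p = np.bincount(y) / len(y)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_threshold(x, y):
    """Local discretization: pick the cut on one numeric feature that
    maximizes information gain for the examples reaching this node."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    base, best = entropy(y), (None, -1.0)
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # no cut between equal values
        gain = (base - (i / len(x)) * entropy(y[:i])
                     - ((len(x) - i) / len(x)) * entropy(y[i:]))
        if gain > best[1]:
            best = ((x[i] + x[i - 1]) / 2, gain)
    return best

x = np.array([1.0, 1.2, 2.9, 3.1, 3.3, 5.0])
y = np.array([0, 0, 1, 1, 1, 0])
print(best_threshold(x, y))  # (cut point, information gain)
```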
APA, Harvard, Vancouver, ISO, and other styles
28

Coelho, Fabrício Fernandes. "Comparação de métodos de mapeamento digital de solos através de variáveis geomorfométricas e sistemas de informações geográficas." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/25062.

Full text
Abstract:
Soil maps are sources of important information for land planning and management, but they are expensive to produce. This study tests and compares single-stage classification methods (multiple multinomial logistic regression and Bayes) and multiple-stage classification methods (CART, J48 and LMT), using a geographic information system and terrain parameters, for producing soil maps with both the original and a simplified legend. In the ArcGis environment, terrain parameters and the original soil map were sampled for training the algorithms. The results from the statistical software Weka were implemented in the ArcGis environment to generate digital soil maps, and error matrices were generated to analyse the accuracy of the maps. The terrain parameters that best explained soil distribution were slope, profile and planar curvature, elevation, and the topographic wetness index. The multiple-stage classification methods showed small improvements in overall accuracy but large improvements in the Kappa index. Simplifying the original legend significantly increased producer and user accuracies, with small improvements in overall accuracy and the Kappa index.
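The Kappa index and error matrices mentioned here are standard map-accuracy tools; a tiny sketch with invented per-pixel labels:

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Hypothetical per-pixel labels: original soil map vs model prediction
obs = ["A", "A", "B", "B", "B", "C", "C", "A", "B", "C"]
pred = ["A", "B", "B", "B", "C", "C", "C", "A", "B", "A"]
print(confusion_matrix(obs, pred, labels=["A", "B", "C"]))  # error matrix
print("kappa:", round(cohen_kappa_score(obs, pred), 3))     # chance-corrected
```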
APA, Harvard, Vancouver, ISO, and other styles
29

Vinnemeier, Christof David [Verfasser], Jürgen [Akademischer Betreuer] May, Uwe [Akademischer Betreuer] Groß, and Tim [Akademischer Betreuer] Friede. "Establishment of a clinical algorithm for the diagnosis of P. falciparum malaria in children from an endemic area using a Classification and Regression Tree (CART) model / Christof David Vinnemeier. Gutachter: Uwe Groß ; Tim Friede. Betreuer: Jürgen May." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2015. http://d-nb.info/1065882017/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Caetano, Mateus 1983. "Modelos de classificação : aplicações no setor bancário." [s.n.], 2015. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306286.

Full text
Abstract:
Advisors: Antonio Carlos Moretti, Márcia Aparecida Gomes Ruggiero
Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica
Abstract: Techniques for classification problems have applications in many areas, such as credit risk evaluation, image recognition and SPAM detection, among others. It is an area of intense research, for which many methods were and continue to be developed. Given that no method performs best across all types of problems, different methods need to be compared in order to select the one that provides the best fit for each particular application. In this work, we studied six different methods applied to supervised classification problems (where there is a known response for model training): Logistic Regression, Decision Tree, Naive Bayes, KNN (k-Nearest Neighbors), Neural Networks and Support Vector Machine. We applied these methods to three data sets concerning credit evaluation and customer selection for a banking marketing campaign. We pre-processed the data to cope with missing observations and unbalanced classes. We used data-partitioning techniques and several metrics, such as accuracy, F1 and the ROC curve, to evaluate the performance of the methods and techniques. For each problem, we compared the performance of the different methods using the selected metrics. The results obtained by the best models in each application were consistent with other studies that used the same data sets
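A compact sketch of this kind of benchmark: cross-validate several of the named classifiers on one imbalanced synthetic dataset, reporting accuracy, F1 and ROC AUC (the neural network is omitted for brevity; data and settings are placeholders for the banking data).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=5)
models = {"logit": LogisticRegression(max_iter=1000),
          "tree": DecisionTreeClassifier(max_depth=5),
          "nb": GaussianNB(), "knn": KNeighborsClassifier(), "svm": SVC()}
for name, m in models.items():
    cv = cross_validate(m, X, y, cv=5, scoring=("accuracy", "f1", "roc_auc"))
    print(name, {k: round(v.mean(), 3) for k, v in cv.items()
                 if k.startswith("test_")})
```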
Master's degree
Applied Mathematics
Master in Applied Mathematics
APA, Harvard, Vancouver, ISO, and other styles
31

Sousa, Rogério Pereira de. "Classificação linear de bovinos: criação de um modelo de decisão baseado na conformação de tipo “true type” como auxiliar a tomada de decisão na seleção de bovinos leiteiros." Universidade do Vale do Rio dos Sinos, 2016. http://www.repositorio.jesuita.org.br/handle/UNISINOS/5896.

Full text
Abstract:
IFTO - Instituto Federal de Educação, Ciência e Tecnologia do Tocantins
The selection of dairy cattle using a classification system based on linear type traits is reflected in production gains, in the productive life of the animal and in the standardization of the herd, among other benefits. This operational research obtained its information through bibliographic study and the analysis of a database of actual classifications. The study aimed to generate a "true type" classification model for dairy cattle to assist evaluators in processing and analysing the data, helping decision making in the selection of cows for dairy aptitude and making the data reliable for future reference. The research applies computational methods - data mining and fuzzy logic - to the classification of dairy cows. To this end, a database of 144 animal records classified between the good and excellent categories was analysed. The analysis used the WEKA tool to extract association rules with the Apriori algorithm, using support and confidence as objective metrics and lift to determine the degree of dependency of each rule. To create the fuzzy-logic decision model, the R tool with the sets package was used. From the mined rules, it was possible to identify rules relevant to the classification model with confidence above 90%, indicating that the assessed characteristics (antecedent) imply other characteristics (consequent) with high confidence. As for the results obtained by the fuzzy decision model, the classification model based on subjective assessments is susceptible to misclassification, suggesting the use of results obtained from association rules as an objective aid in the final classification of the cow for dairy aptitude
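The three rule metrics used here are simple relative frequencies; the sketch below computes support, confidence and lift for one hypothetical rule over invented binarized trait columns (the column names are not the thesis's actual traits).

```python
import pandas as pd

# Hypothetical binarized classification records: one column per trait level
df = pd.DataFrame({"udder_good": [1, 1, 1, 0, 1, 1, 0, 1],
                   "feet_good":  [1, 1, 0, 0, 1, 1, 1, 1],
                   "final_excellent": [1, 1, 0, 0, 1, 1, 0, 1]})

# Rule: udder_good AND feet_good -> final_excellent
antecedent = (df["udder_good"] == 1) & (df["feet_good"] == 1)
consequent = df["final_excellent"] == 1
support = (antecedent & consequent).mean()
confidence = (antecedent & consequent).sum() / antecedent.sum()
lift = confidence / consequent.mean()  # > 1 means positive dependency
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```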
APA, Harvard, Vancouver, ISO, and other styles
32

Lin, Shu-Chuan. "Robust estimation for spatial models and the skill test for disease diagnosis." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26681.

Full text
Abstract:
Thesis (Ph.D)--Industrial and Systems Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Lu, Jye-Chyi; Committee Co-Chair: Kvam, Paul; Committee Member: Mei, Yajun; Committee Member: Serban, Nicoleta; Committee Member: Vidakovic, Brani. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
33

Ataky, Steve Tsham Mpinda. "Análise de dados sequenciais heterogêneos baseada em árvore de decisão e modelos de Markov : aplicação na logística de transporte." Universidade Federal de São Carlos, 2015. https://repositorio.ufscar.br/handle/ufscar/7242.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Recently, data mining techniques have been developed in many application fields with the aim of analyzing large volumes of data, which may be simple and/or complex. Transport logistics, and the railway sector in particular, is such a field: the available data are of varied natures (classic variables such as top speed or type of train, symbolic variables such as the set of routes traveled by a train, degree of adherence, etc.). This dissertation addresses the problem of classifying and predicting heterogeneous data through two main approaches. First, an automatic classification approach was implemented based on a classification tree technique, which also allows new data to be efficiently integrated into previously initialized partitions. The second contribution concerns the analysis of sequential data: we propose to combine the above classification method with Markov models to obtain a partition of temporal sequences into homogeneous and meaningful groups based on probabilities. The resulting model offers a good interpretation of the classes built and allows us to estimate the evolution of the sequences of a particular vehicle. Both approaches were then applied to real data from a Brazilian railway information system in the spirit of supporting strategic planning and coherent prediction. The work first provides a finer typology of planning to solve the problems associated with the existing classification into homogeneous circulation groups. Second, it defines a typology of train paths (successive circulations of the same train) in order to provide or predict the statistical characteristics of the next circulation of a train covering the same route. The overall methodology provides a decision-support environment for monitoring and controlling the planning organization. Accordingly, a formula with two variants was proposed to calculate the degree of adherence between the path actually carried out, or in progress, and the planned one.
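A minimal sketch of the Markov-model side of this approach: estimate a transition matrix from labelled sequences (e.g. the class of each successive circulation) and use it to score or extrapolate a sequence. The sequences and the three-state alphabet are invented for illustration.

```python
import numpy as np

# Hypothetical sequences of circulation classes (e.g. ids from the tree step)
sequences = [[0, 0, 1, 2, 2], [0, 1, 1, 2], [1, 2, 2, 0], [0, 0, 0, 1]]
n = 3
counts = np.zeros((n, n))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[a, b] += 1
P = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic transitions

def log_likelihood(seq):
    """Score how plausible a sequence is under the estimated chain."""
    return sum(np.log(P[a, b]) for a, b in zip(seq, seq[1:]))

print(P)
print("most likely state after 0:", P[0].argmax())
print("log-likelihood of [0, 1, 2]:", round(log_likelihood([0, 1, 2]), 3))
```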
APA, Harvard, Vancouver, ISO, and other styles
34

Peroutka, Lukáš. "Návrh a implementace Data Mining modelu v technologii MS SQL Server." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-199081.

Full text
Abstract:
This thesis focuses on the design and implementation of a data mining solution with real-world data. The task is analysed and processed, and its results are evaluated. The mined data set contains study records of students from the University of Economics, Prague (VŠE) over the past three years. The first part of the thesis covers the theory of data mining: the definition of the term and the history and development of the field. Current best practices and methodology are described, as well as methods for determining data quality and for pre-processing data ahead of the actual mining task. The most common data mining techniques are introduced, including their basic concepts, advantages and disadvantages. This theoretical basis is then used to implement a concrete data mining solution with educational data. The source data set is described and analysed, and some of the data are chosen as input for the created models. The solution is based on the MS SQL Server data mining platform, and its goal is to find, describe and analyse potential associations and dependencies in the data. The results of the respective models are evaluated, including their potential added value. Possible extensions and suggestions for further development of the solution are also mentioned.
APA, Harvard, Vancouver, ISO, and other styles
35

Cardoso, Diego Soares. "Política antitruste e sua consistência: uma análise das decisões do Sistema Brasileiro de Defesa da Concorrência relativas aos Atos de Concentração." Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/2168.

Full text
Abstract:
Financiadora de Estudos e Projetos
The goal of competition policy, also known as antitrust policy, is to promote welfare and economic efficiency by preserving fair competition in markets. Merger control is one of the main responsibilities of antitrust institutions: prohibitions and restrictions of merger operations affect market structures, making these decisions relevant to economic agents. This Master's thesis analyzes the decisions made by Brazilian antitrust institutions in merger cases, using data collected from public documents issued from 2004 to 2011. Bivariate analysis, discrete choice models and classification decision trees show that these merger control decisions are consistent with Brazilian antitrust law. Consistent competition policy reduces uncertainty, aligns expectations and increases the efficiency of antitrust law enforcement. This research therefore contributes to a better understanding of Brazilian merger control policy and its decision drivers.
APA, Harvard, Vancouver, ISO, and other styles
36

Sutton-Charani, Nicolas. "Apprentissage à partir de données et de connaissances incertaines : application à la prédiction de la qualité du caoutchouc." Thesis, Compiègne, 2014. http://www.theses.fr/2014COMP1835/document.

Full text
Abstract:
During the learning of predictive models, the quality of the available data is essential for the reliability of the obtained predictions. In practice, these learning data are very often imperfect or uncertain (imprecise, noisy, etc.). This PhD thesis is set in this context, where the theory of belief functions is used in order to adapt standard statistical tools to uncertain data. The chosen predictive model is the decision tree, a basic classifier in artificial intelligence that is usually built from precise data. The aim of the main methodology developed in this thesis is to generalise decision trees to uncertain data (fuzzy, probabilistic, missing, etc.) in both input and output. The main tool for this extension is a likelihood adapted to belief functions, recently presented in the literature, whose properties are studied here in depth. Maximising this likelihood provides estimators of the trees' parameters; the maximisation is carried out via the E2M algorithm, which extends the EM algorithm to belief functions. The resulting methodology, E2M decision trees, is then applied to a real case: the prediction of natural rubber quality. The learning data, mainly cultural and climatic, contain many uncertainties, which are modelled by belief functions adapted to these imperfections. After a standard descriptive statistical study of the data, E2M decision trees are built, evaluated and compared to standard decision trees. Taking the data uncertainty into account slightly improves the predictive accuracy and, above all, highlights the importance of some variables that have so far received little attention from rubber experts.
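E2M decision trees are not available in standard libraries. As a loose stand-in for learning from uncertain labels, the sketch below encodes each label as a probability mass over classes and fits an ordinary tree on weighted, duplicated rows; this is only a rough approximation of the belief-function approach, on synthetic data, not the E2M algorithm itself.

```python
# Loose illustration of training on uncertain labels: each example carries a
# probability mass over classes instead of a hard label. NOT the E2M algorithm.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # synthetic features
true_y = (X[:, 0] + X[:, 1] > 0).astype(int)

# uncertain labels: mass on the true class between 0.6 and 1.0
mass_true = rng.uniform(0.6, 1.0, size=200)
masses = np.zeros((200, 2))
masses[np.arange(200), true_y] = mass_true
masses[np.arange(200), 1 - true_y] = 1 - mass_true

# duplicate each row once per class, weighted by its class mass
X_dup = np.repeat(X, 2, axis=0)
y_dup = np.tile([0, 1], 200)
w_dup = masses.reshape(-1)

clf = DecisionTreeClassifier(max_depth=3).fit(X_dup, y_dup, sample_weight=w_dup)
print("train accuracy vs. true labels:", clf.score(X, true_y))
```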
APA, Harvard, Vancouver, ISO, and other styles
37

Fearer, Todd Matthew. "Evaluating Population-Habitat Relationships of Forest Breeding Birds at Multiple Spatial and Temporal Scales Using Forest Inventory and Analysis Data." Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/29243.

Full text
Abstract:
Multiple studies have documented declines of forest breeding birds in the eastern United States, but the temporal and spatial scales of most studies limit inference regarding large-scale bird-habitat trends. A potential solution to this challenge is integrating existing long-term datasets such as the U.S. Forest Service Forest Inventory and Analysis (FIA) program and U.S. Geological Survey Breeding Bird Survey (BBS) that span large geographic regions. The purposes of this study were to determine if FIA metrics can be related to BBS population indices at multiple spatial and temporal scales and to develop predictive models from these relationships that identify forest conditions favorable to forest songbirds. I accumulated annual route-level BBS data for 4 species guilds (canopy nesting, ground and shrub nesting, cavity nesting, early successional), each containing a minimum of five bird species, from 1966-2004. I developed 41 forest variables describing forest structure at the county level using FIA data from the 2000 inventory cycle within 5 physiographic regions in 14 states (AL, GA, IL, IN, KY, MD, NC, NY, OH, PA, SC, TN, VA, and WV). I examined spatial relationships between the BBS and FIA data at 3 hierarchical scales: 1) individual BBS routes, 2) FIA units, and 3) physiographic sections. At the BBS route scale, I buffered each BBS route with a 100m, 1km, and 10km buffer, intersected these buffers with the county boundaries, and developed a weighted average for each forest variable within each buffer, with the weight being a function of the percent of area each county had within a given buffer. I calculated 28 variables describing landscape structure from 1992 NLCD imagery using Fragstats within each buffer size. I developed predictive models relating spatial variations in bird occupancy and abundance to changes in forest and landscape structure using logistic regression and classification and regression trees (CART). Models were developed for each of the 3 buffer sizes, and I pooled the variables selected for the individual models and used them to develop multiscale models with the BBS route still serving as the sample unit. At the FIA unit and physiographic section scales, I calculated average abundance/route for each bird species within each FIA unit and physiographic section and extrapolated the plot-level FIA variables to the FIA unit and physiographic section levels. Landscape variables were recalculated within each unit and section using NLCD imagery resampled to a 400 m pixel size. I used regression trees (FIA unit scale) and general linear models (GLM, physiographic section scale) to relate spatial variations in bird abundance to the forest and landscape variables. I examined temporal relationships between the BBS and FIA data between 1966 and 2000. I developed 13 forest variables from statistical summary reports for 4 FIA inventory cycles (1965, 1975, 1989, and 2000) within NY, PA, MD, and WV. I used linear interpolation to estimate annual values of each FIA variable between successive inventory cycles and GLMs to relate annual variations in bird abundance to the forest variables. At the BBS route scale, the CART models accounted for > 50% of the variation in bird presence-absence and abundance. The logistic regression models had sensitivity and specificity rates > 0.50.
By incorporating the variables selected for the models developed within each buffer (100m, 1km, and 10km) around the BBS routes into a multiscale model, I was able to further improve the performance of many of the models and gain additional insight regarding the contribution of multiscale influences on bird-habitat relationships. The majority of the best CART models tended to be the multiscale models, and many of the multiscale logistic models had greater sensitivity and specificity than their single-scale counterparts. The relatively fine resolution and extensive coverage of the BBS, FIA, and NLCD datasets coupled with the overlapping multiscale approach of these analyses allowed me to incorporate levels of variation in both habitat and bird occurrence and abundance into my models that likely represented a more comprehensive range of ecological variability in the bird-habitat relationships relative to studies conducted at smaller scales and/or using data at coarser resolutions. At the FIA unit and physiographic section scales, the regression trees accounted for an average of 54.1% of the variability in bird abundance among FIA units, and the GLMs accounted for an average of 66.3% of the variability among physiographic sections. However, increasing the observational and analytical scale to the FIA unit and physiographic section decreased the measurement resolution of the bird abundance and landscape variables. This limits the applicability and interpretive strength of the models developed at these scales, but they may serve as indices to those habitat components exerting the greatest influences on bird abundance at these broader scales. The GLMs relating average annual bird abundance to annual estimates of forest variables developed using statistical report data from the 1965, 1975, 1989, and 2000 FIA inventories explained an average of 62.0% of the variability in annual bird abundance estimates. However, these relationships were a function of both the general habitat characteristics and the trends in bird abundance specific to the 4-state region (MD, NY, PA, and WV) used for these analyses and may not be applicable to other states or regions. The small suite of variables available from the FIA statistical reports and multicollinearity among all forest variables further limited the applicability of these models. As with those developed at the FIA unit and physiographic section scales, these models may serve as general indices to the habitat components exerting the greatest influences on bird abundance trends through time at regional scales. These results demonstrate that forest variables developed from the FIA, in conjunction with landscape variables, can explain variations in occupancy and abundance estimated from BBS data for forest bird species with a variety of habitat requirements across spatial and temporal scales.
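The buffer-weighted averaging step described above can be sketched in a few lines; the county area fractions and the basal-area values here are hypothetical illustrations, not the study's data.

```python
# Sketch of the buffer-weighted averaging step: a forest variable measured at
# the county level is averaged within a route buffer, weighted by the fraction
# of the buffer's area falling in each intersected county (values hypothetical).
import numpy as np

# fraction of a 10-km route buffer's area inside each intersected county
area_fraction = np.array([0.55, 0.30, 0.15])
# county-level FIA variable, e.g. basal area (m^2/ha)
basal_area = np.array([21.4, 18.9, 25.2])

weighted_mean = np.average(basal_area, weights=area_fraction)
print(f"buffer-level basal area: {weighted_mean:.2f} m^2/ha")
```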
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
38

Hu, Wenbiao. "Applications of Spatio-temporal Analytical Methods in Surveillance of Ross River Virus Disease." Queensland University of Technology, 2005. http://eprints.qut.edu.au/16109/.

Full text
Abstract:
The incidence of many arboviral diseases is largely associated with social and environmental conditions. Ross River virus (RRV) is the most prevalent arboviral disease in Australia. It has long been recognised that the transmission pattern of RRV is sensitive to socio-ecological factors including climate variation, population movement, mosquito density and vegetation types. This study aimed to assess the relationships between socio-environmental variability and the transmission of RRV using spatio-temporal analytic methods. Computerised data files of daily RRV disease cases and daily climatic variables in Brisbane, Queensland during 1985-2001 were obtained from the Queensland Department of Health and the Australian Bureau of Meteorology, respectively. Available information on other socio-ecological factors was also collected from relevant government agencies as follows: 1) socio-demographic data from the Australian Bureau of Statistics; 2) information on vegetation (littoral wetlands, ephemeral wetlands, open freshwater, riparian vegetation, melaleuca open forests, wet eucalypt, open forests and other bushland) from Brisbane City Council; 3) tidal activities from the Queensland Department of Transport; and 4) mosquito density from Brisbane City Council. Principal components analysis (PCA) was used as an exploratory technique for discovering spatial and temporal patterns of RRV distribution. The PCA results show that the first principal component accounted for approximately 57% of the information; it contained the four seasonal rates and loaded highest and positively for autumn. K-means cluster analysis indicates that the seasonality of RRV is characterised by three groups with high, medium and low incidence of disease, suggesting that there are at least three different disease ecologies. The variation in spatio-temporal patterns of RRV indicates a complex ecology that is unlikely to be explained by a single dominant transmission route across these three groupings. Therefore, there is a need to explore socio-economic and environmental determinants of RRV disease at the statistical local area (SLA) level. Spatial distribution analysis and multiple negative binomial regression models were employed to identify the socio-economic and environmental determinants of RRV disease at both the city and local (i.e., SLA) levels. The results show that RRV activity was primarily concentrated in the northeast, northwest and southeast areas of Brisbane. The negative binomial regression models reveal that RRV incidence for the whole of the Brisbane area was significantly associated with the Southern Oscillation Index (SOI) at a lag of 3 months (relative risk (RR): 1.12; 95% confidence interval (CI): 1.06 - 1.17), the proportion of people with lower levels of education (RR: 1.02; 95% CI: 1.01 - 1.03), the proportion of labour workers (RR: 0.97; 95% CI: 0.95 - 1.00) and vegetation density (RR: 1.02; 95% CI: 1.00 - 1.04). However, RRV incidence for high-risk areas (i.e., SLAs with higher incidence of RRV) was significantly associated with mosquito density (RR: 1.01; 95% CI: 1.00 - 1.01), SOI at a lag of 3 months (RR: 1.48; 95% CI: 1.23 - 1.78), human population density (RR: 3.77; 95% CI: 1.35 - 10.51), the proportion of indigenous population (RR: 0.56; 95% CI: 0.37 - 0.87) and the proportion of overseas visitors (RR: 0.57; 95% CI: 0.35 - 0.92). It is acknowledged that some of these risk factors, while statistically significant, are small in magnitude.
However, given the high incidence of RRV, they may still be important in practice. The results of this study suggest that the spatial pattern of RRV disease in Brisbane is determined by a combination of ecological, socio-economic and environmental factors. The possibility of developing an epidemic forecasting system for RRV disease was explored using the multivariate Seasonal Auto-regressive Integrated Moving Average (SARIMA) technique. The results suggest that climatic variability, particularly precipitation, may have played a significant role in the transmission of RRV disease in Brisbane. This finding cannot entirely be explained by confounding factors such as other socio-ecological conditions, because these are unlikely to have changed dramatically on a monthly time scale in this city over the past two decades. The SARIMA models show that monthly precipitation at a lag of 2 months (β = 0.004, p = 0.031) was statistically significantly associated with RRV disease, suggesting that there may be about 50 more cases a year for an average increase of 100 mm in precipitation in Brisbane. The predictive values of the model were generally consistent with the actual values (root-mean-square error (RMSE): 1.96). Therefore, this model may have applications as a decision support tool in disease control and risk-management planning programs in Brisbane. Polynomial distributed lag (PDL) time series regression models were used to examine the associations between rainfall, mosquito density and the occurrence of RRV after adjusting for season and auto-correlation. The PDL model was used because rainfall and mosquito density can affect RRV not merely in the same month but also in several subsequent months; the rationale for the PDL technique is that it increases the precision of the estimates. We developed an epidemic forecasting model to predict the incidence of RRV disease. The results show that 95% and 85% of the variation in RRV disease was accounted for by mosquito density and rainfall, respectively. The predictive values of the model were generally consistent with the actual values (RMSE: 1.25). The model diagnostics reveal that the residuals were randomly distributed with no significant auto-correlation. The results of this study suggest that PDL models may be better than SARIMA models (R-squared increased and RMSE decreased). These findings may facilitate the development of early warning systems for the control and prevention of this widespread disease. Further analyses were conducted using classification trees to identify the major mosquito species for Ross River virus (RRV) transmission and to explore the threshold of mosquito density for RRV disease in Brisbane, Australia. The results show that Ochlerotatus vigilax (RR: 1.028; 95% CI: 1.001 - 1.057) and Culex annulirostris (RR: 1.013; 95% CI: 1.003 - 1.023) were significantly associated with RRV disease cycles at a lag of 1 month. The presence of RRV was associated with an average monthly mosquito density of 72 Ochlerotatus vigilax and 52 Culex annulirostris per light trap. These results may also have applications as a decision support tool in disease control and risk management planning programs. As RRV has a significant impact on population health, industry, and tourism, it is important to develop an epidemic forecast system for this disease. The results of this study show that disease surveillance data can be integrated with social, biological and environmental databases.
These data can provide additional input for the development of epidemic forecasting models. These attempts may have significant implications for environmental health decision-making and practice, and may help health authorities determine public health priorities more wisely and use resources more effectively and efficiently.
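A minimal sketch of a SARIMA model with rainfall as an exogenous covariate at a two-month lag, in the spirit of the forecasting approach above; the series are synthetic and the (p,d,q)(P,D,Q,s) orders are arbitrary placeholders, not the thesis's fitted model.

```python
# Sketch of a seasonal ARIMA model with an exogenous rainfall covariate lagged
# by two months. Series are synthetic; model orders are placeholders.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(1)
n = 120  # ten years of monthly data
rain = pd.Series(80 + 40 * np.sin(2 * np.pi * np.arange(n) / 12)
                 + rng.normal(0, 10, n))
cases = pd.Series(5 + 0.05 * rain.shift(2).fillna(rain.mean())
                  + rng.normal(0, 1, n))

model = SARIMAX(cases, exog=rain.shift(2).fillna(rain.mean()),
                order=(1, 0, 1), seasonal_order=(1, 0, 0, 12))
fit = model.fit(disp=False)
print(fit.summary())

# two-step-ahead forecast: the lag-2 rainfall for the next two months is
# simply the last two observed rainfall values
print(fit.forecast(steps=2, exog=rain.iloc[-2:].to_numpy().reshape(-1, 1)))
```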
APA, Harvard, Vancouver, ISO, and other styles
39

Bretschneider, Jörg. "Ein wellenbasiertes stochastisches Modell zur Vorhersage der Erdbebenlast." Doctoral thesis, Technische Universität Dresden, 2006. https://tud.qucosa.de/id/qucosa%3A25000.

Full text
Abstract:
Strong earthquakes pose a potentially high risk to urban centres worldwide, a risk that is confronted, among other means, by methods of aseismic structural design. Such design is based on both assumptions and thorough knowledge about local seismic ground acceleration; limits are set, on the other hand, by the additional costs. The damage caused by recent strong earthquakes, also in industrialized countries, emphasizes the need to further refine the concepts and methods of earthquake-resistant structural design. In this work, a new approach to stochastic seismic load modelling is presented that abandons the usual assumption of a stationary, one-dimensional stochastic process for ground acceleration. The goal is site- and wave-specific load modelling which, by using information about physical and geotechnical invariants, enables transparent and low-cost approaches in aseismic structural design, or at least reduces seismic risk in comparison to common design methods. These physical and geotechnical invariants are the structure of the seismic wave field as governed by physical laws, as well as the resonance properties of the soil strata at the local site. The proposed load model represents the local wave field as a composition of stochastic evolutionary sub-processes on time-variant principal axes, which correspond to wave trains with specific load characteristics. Those load characteristics are described in the frequency and time domains as well as spatially by wave-specific shape functions, whose parameters correlate strongly with seismic and geotechnical quantities. The main contributions of the work are newly developed correlation-based estimation procedures, which serve the empirical specification of the model parameters for building practice. The Spectral-Adaptive Principal Correlation Axes (SAPCA) algorithm ensures optimal coverage of the spatial wave trains by transforming the recorded data onto reference components. At the same time, in connection with a correction algorithm for the strike angle of the principal axis, it delivers concise associated patterns in the course of the principal axes, which are in turn used to reliably identify dominance phases for three generalized wave trains. Within those wave dominance phases, the wave-specific parameters of the load model are determined. Additionally, an algorithm is presented to identify Rayleigh waves in single-site acceleration records. The adequacy of the modelling approach and the efficiency of the estimation procedures are verified by means of strong-motion records from the 1994 Northridge earthquake. The proposed non-stationary modelling approach describes more accurately those load portions of the strong-motion wave field that are underestimated in conventional stochastic load models; load portions that were previously omitted or modelled only summarily are made accessible to analysis and modelling for the first time. The stochastic model gains physical transparency with respect to the most important load-generating effects and hence, despite its higher complexity, remains easy to handle in engineering practice. The principal axis method will also be useful for seismological analyses in the near field, e.g., for the analysis of rupture processes and topographic site effects.
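The time-variant principal-axis idea can be illustrated by eigendecomposing the covariance of short windows of a three-component record; this is only the basic correlation-axis notion, not the spectral-adaptive SAPCA procedure itself, and the record below is synthetic.

```python
# Rough sketch of the correlation-based principal-axis idea: for each short
# window of a 3-component acceleration record, the dominant polarization
# direction is the covariance eigenvector with the largest eigenvalue.
# Synthetic record; NOT the SAPCA algorithm.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
t = np.arange(n) / 100.0                          # 100 Hz sampling
acc = np.column_stack([np.sin(2 * np.pi * 2 * t),        # x component
                       0.5 * np.sin(2 * np.pi * 2 * t),  # y (correlated)
                       rng.normal(0, 0.1, n)])           # z (noise)

win = 200  # 2-second windows
for start in range(0, n - win + 1, win):
    seg = acc[start:start + win]
    cov = np.cov(seg, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues ascending
    principal = eigvecs[:, -1]                    # dominant axis of this window
    print(f"t={start/100:5.1f}s  principal axis ~ {np.round(principal, 2)}")
```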
APA, Harvard, Vancouver, ISO, and other styles
40

Cabrol, Sébastien. "Les crises économiques et financières et les facteurs favorisant leur occurrence." Thesis, Paris 9, 2013. http://www.theses.fr/2013PA090019.

Full text
Abstract:
The aim of this thesis is to analyze, from an empirical point of view, both the different varieties of economic and financial crises (typological analysis) and the characteristics of the contexts that can be associated with a likely occurrence of such events. Consequently, we analyze both the years in which a crisis occurs and the years preceding such events (leading-context analysis, forecasting). This study contributes to the empirical literature by focusing exclusively on crises in advanced economies over the last 30 years, by considering several theoretical types of crises, and by taking into account a large number of both economic and financial explanatory variables. As part of this research, we also analyze stylized facts related to the 2007/2008 subprime turmoil and our ability to foresee crises from an epistemological perspective. Our empirical results are based on the use of binary classification trees through the CART (Classification And Regression Trees) methodology. This nonparametric and nonlinear statistical technique can handle large data sets and is suitable for identifying threshold effects and complex interactions among variables. Furthermore, this methodology characterizes crises (or contexts preceding a crisis) by several distinct sets of independent variables. We thus identify as leading indicators of economic and financial crises: the variation and volatility of both gold prices and nominal exchange rates, as well as the current account balance (as % of GDP) and the change in the openness ratio. Regarding the typological analysis, we identify two main empirical varieties of crises. First, we highlight « global type » crises characterized by a slowdown in US economic activity (stressing the role and influence of the USA in global economic conditions) and low GDP growth in the countries affected by the turmoil. Second, we find that country-specific high levels of both inflation and exchange rate volatility can be considered evidence of « idiosyncratic type » crises.
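A minimal sketch of the CART analysis described above: a binary classification tree over macro-financial indicators, printed as rules so that each root-to-leaf path reads as one distinct crisis context. The indicators and data are synthetic placeholders, not the thesis's dataset.

```python
# Sketch of a CART-style crisis-typology tree on synthetic country-year data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
n = 600  # synthetic country-year observations
gold_vol = rng.uniform(0, 1, n)          # gold price volatility (placeholder)
ca_gdp = rng.normal(-1, 3, n)            # current account balance (% of GDP)
fx_vol = rng.uniform(0, 1, n)            # exchange rate volatility
# crisis made more likely by high volatility and large external deficits
crisis = (((gold_vol > 0.6) & (ca_gdp < -2)) | (fx_vol > 0.8)).astype(int)

X = np.column_stack([gold_vol, ca_gdp, fx_vol])
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20).fit(X, crisis)

# each root-to-leaf path corresponds to one distinct pre-crisis "context"
print(export_text(tree, feature_names=["gold_vol", "ca_gdp", "fx_vol"]))
```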
APA, Harvard, Vancouver, ISO, and other styles
41

Jakel, Roland. "Lineare und nichtlineare Analyse hochdynamischer Einschlagvorgänge mit Creo Simulate und Abaqus/Explicit." Universitätsbibliothek Chemnitz, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-171812.

Full text
Abstract:
The presentation describes how the impact of an idealized fragment into a steel protective panel can be analyzed with different dynamic analysis methods. Two commercial finite element codes are used for this: a.) Creo Simulate uses the method of modal superposition for analyzing the dynamic response, so only linear dynamic systems with purely modal damping can be computed, and contact between two components cannot be captured. The unknown force-vs.-time curve of the impact event therefore cannot be computed but must be suitably estimated and applied as an external load to the steel protective panel. The more dynamic the impact, the sooner the range of validity of the underlying linear model is left. b.) Abaqus/Explicit uses a direct time integration method for the incremental (step-by-step) solution of the underlying differential equation, which does not need a tangential stiffness matrix. In this way, material nonlinearities as well as contact can be suitably captured, and the force-vs.-time curve of the impact can thus be determined. Even for extremely high-dynamic impacts, this method delivers good results. However, considerably more material data must be known in order to correctly describe the nonlinear elasto-plastic material behavior with damage initiation and damage evolution. The principal difficulties of the material characterization are described.
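The key point about explicit codes, that no tangential stiffness matrix is needed because internal forces are simply evaluated at each step, can be illustrated with a one-degree-of-freedom central-difference integrator. The parameters and the bilinear spring law below are arbitrary illustrations, not an Abaqus model.

```python
# Sketch of explicit central-difference time integration for a single-DOF
# system: the internal force is evaluated directly at each step, so no
# tangent stiffness matrix is ever assembled. All parameters are illustrative.
import numpy as np

m = 1.0                        # mass [kg]
k, k_pl, f_y = 1e4, 1e3, 50.0  # elastic stiffness, post-yield stiffness, yield force

def internal_force(u):
    # bilinear spring (illustrative only; no unloading path modelled)
    f_el = k * u
    if abs(f_el) < f_y:
        return f_el
    return np.sign(u) * (f_y + k_pl * (abs(u) - f_y / k))

dt = 1e-4                      # must satisfy the stability limit dt < 2/omega_max
v0 = 5.0                       # impact velocity [m/s]
u_prev, u = 0.0, v0 * dt       # first step from the initial velocity

for step in range(2000):
    a = -internal_force(u) / m
    u_next = 2 * u - u_prev + dt**2 * a   # central-difference update
    u_prev, u = u, u_next

print(f"displacement after {2000 * dt:.2f} s: {u:.4f} m")
```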
APA, Harvard, Vancouver, ISO, and other styles
42

von, Wenckstern Michael. "Web applications using the Google Web Toolkit." Master's thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2013. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-115009.

Full text
Abstract:
This diploma thesis describes how to create desktop-like rich internet applications with the Google Web Toolkit, or convert traditional Java programs into them. The Google Web Toolkit is an open source development environment which translates Java code into browser- and device-independent HTML and JavaScript. Most parts of the GWT framework, including the Java-to-JavaScript compiler, as well as important website security issues, will be introduced. The well-known Agricola board game will be implemented in the Model-View-Presenter pattern to show that complex user interfaces can be created with the Google Web Toolkit. Finally, the Google Web Toolkit framework will be compared with JavaServer Faces to find out which toolkit is the right one for the next web project.
APA, Harvard, Vancouver, ISO, and other styles
43

Chen, Pu. "Classification tree models for predicting cancer status." 2009. http://digital.library.duq.edu/u?/etd,109505.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Mistry, Pritesh, Daniel Neagu, Paul R. Trundle, and J. D. Vessey. "Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology." 2015. http://hdl.handle.net/10454/7545.

Full text
Abstract:
Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work in using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug's toxicity. Using data acquired from the National Institutes of Health's (NIH) Developmental Therapeutics Program (DTP), we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80% using random forest models, whilst the decision tree models produce accuracies in the 70% region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
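A minimal sketch of the random forest versus decision tree comparison by AUC reported above, run on synthetic data rather than the DTP data set.

```python
# Sketch comparing a random forest and a decision tree by AUC, mirroring the
# kind of comparison reported above. Data are synthetic stand-ins.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = [("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
          ("decision tree", DecisionTreeClassifier(max_depth=5, random_state=0))]
for name, clf in models:
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```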
APA, Harvard, Vancouver, ISO, and other styles
45

Manwani, Naresh. "Supervised Learning of Piecewise Linear Models." Thesis, 2012. http://hdl.handle.net/2005/3244.

Full text
Abstract:
Supervised learning of piecewise linear models is a well-studied problem in the machine learning community. The key idea in piecewise linear modeling is to properly partition the input space and learn a linear model for every partition. Decision trees and regression trees are classic examples of piecewise linear models for classification and regression problems. The existing approaches for learning decision/regression trees can be broadly classified into two classes, namely, fixed structure approaches and greedy approaches. In the fixed structure approaches, the tree structure is fixed beforehand by fixing the number of non-leaf nodes, the height of the tree and the paths from the root node to every leaf node. Mixture of experts and hierarchical mixture of experts are examples of fixed structure approaches for learning piecewise linear models. Parameters of the models are found using, e.g., maximum likelihood estimation, for which the expectation maximization (EM) algorithm can be used. Fixed structure piecewise linear models can also be learnt using risk minimization under an appropriate loss function. Learning an optimal decision tree using the fixed structure approach is a hard problem; constructing an optimal binary decision tree is known to be NP-complete. On the other hand, greedy approaches do not assume any parametric form or any fixed structure for the decision tree classifier. Most of the greedy approaches learn tree-structured piecewise linear models in a top-down fashion, built by binary or multi-way recursive partitioning of the input space. The main issue in top-down decision tree induction is to choose an appropriate objective function to rate the split rules; the objective function should be easy to optimize. Top-down decision trees are easy to implement and understand, but there are no optimality guarantees due to their greedy nature. Regression trees are built in a similar way to decision trees; in regression trees, every leaf node is associated with a linear regression function. All piecewise linear modeling techniques deal with two main tasks, namely, partitioning of the input space and learning a linear model for every partition. However, partitioning of the input space and learning linear models for different partitions are not independent problems. Simultaneous optimal estimation of the partitions and of the linear models for every partition is a combinatorial problem and hence computationally hard. However, piecewise linear models provide better insights into the classification or regression problem by giving an explicit representation of the structure in the data. The information captured by piecewise linear models can be summarized in terms of simple rules, so that they can be used to analyze the properties of the domain from which the data originates. These properties make piecewise linear models, like decision trees and regression trees, extremely useful in many data mining applications and place them among the top data mining algorithms. In this thesis, we address the problem of supervised learning of piecewise linear models for classification and regression. We propose novel algorithms for learning piecewise linear classifiers and regression functions. We also address the problem of noise-tolerant learning of classifiers in the presence of label noise. We propose a novel algorithm for learning polyhedral classifiers, which are the simplest form of piecewise linear classifiers.
Polyhedral classifiers are useful when points of the positive class fall inside a convex region and all the negative class points are distributed outside that convex region; the region of the positive class can then be well approximated by a simple polyhedral set. The key challenge in optimally learning a fixed structure polyhedral classifier is to identify the subproblems, where each subproblem is a linear classification problem. This is a hard problem, and identifying polyhedral separability is known to be NP-complete. The goal of any polyhedral learning algorithm is to efficiently handle the underlying combinatorial problem while achieving good classification accuracy. Existing methods for learning a fixed structure polyhedral classifier are based on solving non-convex constrained optimization problems; these approaches do not efficiently handle the combinatorial aspect of the problem and are computationally expensive. We propose a method of model-based estimation of the posterior class probability to learn polyhedral classifiers. We solve an unconstrained optimization problem using a simple two-step algorithm (similar to the EM algorithm) to find the model parameters. To the best of our knowledge, this is the first attempt to form an unconstrained optimization problem for learning polyhedral classifiers. We then modify our algorithm to also find the number of required hyperplanes automatically. We experimentally show that our approach is better than the existing polyhedral learning algorithms in terms of training time, performance and complexity. Most often, class conditional densities are multimodal. In such cases, each class region may be represented as a union of polyhedral regions, and hence a single polyhedral classifier is not sufficient. To handle such situations, a generic decision tree is required. Learning an optimal fixed structure decision tree is a computationally hard problem; on the other hand, top-down decision trees have no optimality guarantees due to their greedy nature. However, top-down decision tree approaches are widely used as they are versatile and easy to implement. Most of the existing top-down decision tree algorithms (CART, OC1, C4.5, etc.) use impurity measures to assess the goodness of hyperplanes at each node of the tree; these measures do not properly capture the geometric structures in the data. We propose a novel decision tree algorithm that, at each node, selects hyperplanes based on an objective function which takes into consideration the geometric structure of the class regions. The resulting optimization problem turns out to be a generalized eigenvalue problem and hence is efficiently solved. We show through empirical studies that our approach leads to smaller trees and better performance compared to other top-down decision tree approaches. We also provide some theoretical justification for the proposed method of learning decision trees. Piecewise linear regression is similar to the corresponding classification problem. For example, in regression trees, each leaf node is associated with a linear regression model; thus the problem is once again that of (simultaneous) estimation of optimal partitions and learning of a linear model for each partition. Regression trees, the hinging hyperplanes method and mixture of experts are some of the approaches to learn continuous piecewise linear regression models, and many of these algorithms are computationally intensive.
We present a method of learning piecewise linear regression models which is computationally simple and is capable of learning discontinuous functions as well. The method is based on the idea of K-plane regression, which can identify a set of linear models given the training data. K-plane regression is a simple algorithm motivated by the philosophy of k-means clustering. However, this simple algorithm has several problems: it does not give a model function with which we can predict the target value for any given input, and it is very sensitive to noise. We propose a modified K-plane regression algorithm which can learn continuous as well as discontinuous functions. The proposed algorithm still retains the spirit of the k-means algorithm, and after every iteration it improves the objective function. The proposed method learns a proper piecewise linear model that can be used for prediction, and the algorithm is also more robust to additive noise than K-plane regression. While learning classifiers, one normally assumes that the class labels in the training data set are noise-free. However, in many applications like spam filtering, text classification, etc., the training data can be mislabeled due to subjective errors. In such cases, the standard learning algorithms (SVM, AdaBoost, decision trees, etc.) start overfitting on the noisy points and lead to poor test accuracy. Thus analyzing the vulnerabilities of classifiers to label noise has recently attracted growing interest from the machine learning community. The existing noise-tolerant learning approaches first try to identify the noisy points and then learn a classifier on the remaining points. In this thesis, we address the issue of developing learning algorithms which are inherently noise tolerant. An algorithm is inherently noise tolerant if the classifier it learns with noisy samples has the same performance on test data as one learnt from noise-free samples. Algorithms having such robustness (under suitable assumptions on the noise) are attractive for learning with noisy samples. Here, we consider non-uniform label noise, which is a generic noise model: the probability of the class label for an example being incorrect is a function of the feature vector of the example. (We assume that this probability is less than 0.5 for all feature vectors.) This can account for most cases of noisy data sets. There is no provably optimal algorithm for learning noise-tolerant classifiers in the presence of non-uniform label noise. We propose a novel characterization of the noise tolerance of an algorithm. We analyze the noise tolerance properties of the risk minimization framework, as risk minimization is a common strategy for classifier learning. We show that risk minimization under the 0-1 loss has the best noise tolerance properties; none of the commonly used convex loss functions has such noise tolerance properties. Empirical risk minimization under the 0-1 loss is a hard problem, as the 0-1 loss function is not differentiable. We propose a gradient-free stochastic optimization technique to minimize the risk under the 0-1 loss function for noise-tolerant learning of linear classifiers. We show (under some conditions) that the algorithm converges asymptotically to the global minimum of the risk under the 0-1 loss function, and we demonstrate the noise tolerance of the algorithm through simulation experiments.
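The basic k-means-flavoured loop behind K-plane regression can be sketched as follows; this is the plain variant on synthetic data, not the thesis's modified algorithm.

```python
# Sketch of the K-plane regression loop: alternate between (1) assigning each
# point to the hyperplane with the smallest residual and (2) refitting each
# hyperplane by least squares. Basic variant only; data are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(-1, 1, (n, 1))
# piecewise linear target with two pieces plus noise
y = np.where(x[:, 0] < 0, 2 * x[:, 0] + 1, -3 * x[:, 0] + 1) + rng.normal(0, 0.05, n)

X = np.hstack([x, np.ones((n, 1))])            # append intercept column
K = 2
W = rng.normal(size=(K, 2))                    # K hyperplanes: (slope, intercept)

for _ in range(20):
    resid = (X @ W.T - y[:, None]) ** 2        # n x K squared residuals
    assign = resid.argmin(axis=1)              # step 1: nearest plane
    for k in range(K):
        mask = assign == k
        if mask.sum() >= 2:                    # step 2: least-squares refit
            W[k], *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)

print("recovered (slope, intercept) pairs:\n", np.round(W, 2))
```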
APA, Harvard, Vancouver, ISO, and other styles
46

Liu, Hsin-hsien, and 劉欣憲. "A Study of the Application of the Procedure Analysis Method, the Classification Tree Method, and Artificial Neural Network Method to Construct the Authentication Models of the Roadway Accidents." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/15976300103018019953.

Full text
Abstract:
Master's thesis
Feng Chia University
Graduate Institute of Traffic Engineering and Management
94
Roadway traffic accidents are increasing yearly, and the parties involved want to protect their rights, so the number of traffic accident cases requiring authentication is increasing as well. However, the Local Traffic Accident Authentication Committees (LTAAC) lack manpower, and the authentication criteria they cite are inconsistent across committees, resulting in delays and reduced quality of the authentication cases. This study uses three methods, the Procedural Authentication Method (PAM), the Classification Tree Method (CTM), and the Artificial Neural Network (ANN), to construct authentication models that predict the responsibilities of the parties in a traffic accident. The study focuses on two-vehicle collisions that do not involve pedestrians or bicyclists; the data comprise 2,634 cases and 5,268 parties. First, the PAM uses literature review and brainstorming to find the authentication criteria. Second, the CTM uses cross-table analysis to pick the major factors as input variables, sets up different numbers of end nodes, and produces 30 sub-models for validation. Third, the ANN method likewise uses cross-table analysis to pick the major factors as input variables, sets up different numbers of neurons in the hidden layer, and also produces 30 sub-models. There are three collision types: car/car, car/motorcycle, and motorcycle/motorcycle. Both the CTM and the ANN use 80 percent of the cases in the database for training and 20 percent for validation. This study shows that under the existing criteria, the PAM gives better results than the CTM and the ANN: the accuracy of the PAM is 74.1%, that of the CTM is 71.92%, and that of the ANN is 67.17%. However, when the complete set of party data is included, the accuracy of the PAM drops to 62.5%.
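As an illustration of the CTM step described above (an 80/20 split and sub-models with different end-node counts), a minimal sketch in Python using scikit-learn as a stand-in for the study's actual tools might look as follows; the file names and features are hypothetical placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical per-party features (e.g. right-of-way, signalling,
# speed) and authenticated responsibility labels.
X = np.load("accident_features.npy")   # placeholder file name
y = np.load("responsibility.npy")      # placeholder file name

# 80/20 train/validation split, as in the study
X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=0.8, random_state=0)

# Vary the number of end (leaf) nodes to obtain a family of sub-models
for leaves in (8, 16, 32):
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0)
    tree.fit(X_tr, y_tr)
    print(leaves, accuracy_score(y_va, tree.predict(X_va)))
```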
APA, Harvard, Vancouver, ISO, and other styles
47

Chang, Shou-Chih, and 張守智. "Model Trees for Hybrid Data Type Classification." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/92670263836027181618.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
92
Classification is a fundamental task in data mining, and many classification learning algorithms have been developed successfully for different purposes. Most classification learning algorithms, however, are suitable only for specific attribute types, whereas real-world applications often involve hybrid datasets composed of both nominal and numerical attributes. To apply such learning algorithms, some attributes must be transformed into appropriate data types, and this procedure can change the nature of the dataset. In this thesis, we propose a new approach, model trees, to integrate learning algorithms with different properties to handle such cases. We employ decision trees as the classification framework and incorporate support vector machines into the construction of the decision trees, replacing the discretization procedure and providing multivariate decisions. Finally, we perform experiments showing that our proposed method performs better than other competing methods.
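A rough sketch of the structural idea, a linear SVM supplying the multivariate decision at each internal node of the tree, is given below. It assumes nominal attributes have been one-hot encoded beforehand, uses a one-vs-rest target as a simplification, and is not the thesis's actual construction.

```python
import numpy as np
from sklearn.svm import LinearSVC

class SVMNode:
    """One node of a simplified model tree: a linear SVM supplies a
    multivariate split; recursion stops at pure, small, or deep nodes."""
    def __init__(self, depth=0, max_depth=3, min_size=20):
        self.depth, self.max_depth, self.min_size = depth, max_depth, min_size
        self.svm = self.left = self.right = None
        self.label = None

    def fit(self, X, y):
        classes, counts = np.unique(y, return_counts=True)
        self.label = classes[counts.argmax()]          # majority-class fallback
        if len(classes) == 1 or len(y) < self.min_size or self.depth >= self.max_depth:
            return self
        target = y == classes[0]                       # one-vs-rest split target
        self.svm = LinearSVC().fit(X, target)
        side = self.svm.predict(X).astype(bool)
        if side.all() or (~side).all():                # degenerate split: stop here
            self.svm = None
            return self
        self.left = SVMNode(self.depth + 1, self.max_depth, self.min_size).fit(X[side], y[side])
        self.right = SVMNode(self.depth + 1, self.max_depth, self.min_size).fit(X[~side], y[~side])
        return self

    def predict_one(self, x):
        if self.svm is None:                           # leaf: return stored label
            return self.label
        go_left = bool(self.svm.predict(x.reshape(1, -1))[0])
        return (self.left if go_left else self.right).predict_one(x)

# Usage: root = SVMNode().fit(X_train, y_train)
#        preds = [root.predict_one(x) for x in X_test]
```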
APA, Harvard, Vancouver, ISO, and other styles
48

Jhang, Zao-Shih, and 張造時. "Variable Selection of Regression Trees and Node Model of Classification Trees." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/75500215236232826871.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Wang, Chien-Jen, and 王建仁. "An Automatic Classification Model for Electronic Commerce Websites Utilizing Decision Tree." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/68134476733931044580.

Full text
Abstract:
Master's thesis
National Chin-Yi University of Technology
Department of Business Administration
97
Due to the rise of Web 2.0 and the increase of Internet users in Taiwan, e-commerce has begun to prevail again in Taiwan after the dot-com crash. More and more consumers are now willing to shop online, resulting in huge growth of online shopping websites. In practice, administrators of these websites seldom analyze the consumer behavior of their members sufficiently and classify them by only a few criteria. Without accurate classification of members, marketing cannot effectively reach potential consumers; as a result, members' repurchase rate can hardly be improved, which in turn affects business performance. If administrators of online shopping websites can quickly find core customers in the member database and adopt effective marketing strategies based on their attributes, they can enhance sales performance, increase market share, and multiply the effect of the adopted marketing strategies. The objectives of this study were as follows: 1. To explore the current member management mechanisms and development of e-commerce in Taiwan. 2. To use decision trees to construct an automatic member classification model for e-commerce service providers in Taiwan. 3. To investigate the classification accuracy of the proposed model. Based on member attributes from a simulated member database, such as hours of Internet use, expenditure, and shopping frequency, four e-commerce member classification models were constructed using the decision tree C4.5 algorithm. The models were further modified to enhance the efficiency of automatic classification. These models can assist e-commerce service providers in classifying their members automatically and finding core customer groups more efficiently, so as to set up effective marketing strategies and thus enhance their competitiveness. The research findings included: 1. Through application of decision trees, important attributes could be derived to classify members of e-commerce websites. 2. An automatic member classification model could be constructed using decision trees. 3. Through pruning of the decision tree, the complexity of the decision rules could be reduced, although the accuracy of the classification model might decrease.
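As a sketch of the kind of pipeline the abstract describes, the snippet below builds entropy-based trees (scikit-learn's nearest stand-in for C4.5) on synthetic member attributes and shows how pruning trades rule complexity against accuracy; the data and the "core customer" labelling rule are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
# Synthetic stand-ins for the member attributes named in the abstract
hours_online = rng.exponential(10.0, n)
expenditure = rng.gamma(2.0, 500.0, n)
frequency = rng.poisson(3, n)
X = np.column_stack([hours_online, expenditure, frequency])
# Invented "core customer" rule, used here only to generate labels
y = ((expenditure > 800) & (frequency >= 3)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# criterion="entropy" approximates C4.5's information-gain splitting;
# larger ccp_alpha prunes harder, trading rule complexity for accuracy
for alpha in (0.0, 0.005, 0.02):
    t = DecisionTreeClassifier(criterion="entropy", ccp_alpha=alpha, random_state=0)
    t.fit(X_tr, y_tr)
    print(alpha, t.get_n_leaves(), t.score(X_te, y_te))
```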
APA, Harvard, Vancouver, ISO, and other styles
50

Lin, Guan-An, and 林冠安. "Sources of Volatility in Stock Returns: Application of Classification and Regression Tree Model." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/f6nb24.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Master's Program in Finance, Department of Finance and International Business
104
This paper studies the relationship between the volatility of TAIEX returns and macroeconomic, stock market, and investor sentiment variables. We use the classification and regression tree (CART) method of Breiman et al. (1984) to find the key variables affecting the volatility of stock returns, and we then compare regression analyses built on the tree-selected factors with those built on all the variables in this study. The data cover January 2001 to December 2015 in Taiwan, a total of 180 monthly observations. We fit an autoregressive moving average model to the monthly stock returns and take the residuals as a proxy for volatility. The empirical results have two parts. First, the most important variable selected by the regression tree is the dividend yield, and the second is gold returns; among the three categories, the macroeconomic variables perform best. Second, the business cycle and the interest rate are selected as the first two key variables by the regression tree, and compared with a traditional linear regression considering all variables, a linear regression built on the variables selected by CART performs better.
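The described pipeline, fitting an ARMA model to monthly returns, taking the residuals as the volatility proxy, and letting a regression tree rank the candidate variables, could be sketched as follows; the file names and columns are hypothetical, and the thesis's exact residual transformation may differ.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from sklearn.tree import DecisionTreeRegressor

# Hypothetical files: monthly TAIEX returns and candidate explanatory
# variables (macro, stock market, sentiment), aligned on the same dates
returns = pd.read_csv("taiex_monthly.csv", index_col=0).squeeze()
factors = pd.read_csv("factors.csv", index_col=0)

# ARMA(1,1) fit (an ARIMA model with d = 0)
arma = ARIMA(returns, order=(1, 0, 1)).fit()
vol_proxy = arma.resid.abs()    # one common transformation of the residuals

# A regression tree (CART) ranks the candidate variables
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(factors, vol_proxy)
ranking = pd.Series(tree.feature_importances_, index=factors.columns)
print(ranking.sort_values(ascending=False))
```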
APA, Harvard, Vancouver, ISO, and other styles