Dissertations / Theses on the topic 'Classification tree models'
Consult the top 50 dissertations / theses for your research on the topic 'Classification tree models.'
Liu, Dan. "Tree-based Models for Longitudinal Data." Bowling Green State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1399972118.
Keller-Schmidt, Stephanie. "Stochastic Tree Models for Macroevolution." Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-96504.
Shafi, Ghufran. "Development of roadway link screening criteria for microscale carbon monoxide and particulate matter conformity analyses through application of classification tree model." Thesis, Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/28222.
Committee Chair: Guensler, Randall; Committee Member: Rodgers, Michael; Committee Member: Russell, Armistead.
Victors, Mason Lemoyne. "A Classification Tool for Predictive Data Analysis in Healthcare." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/5639.
Shew, Cameron Hunter. "Transferability and Robustness of Predictive Models to Proactively Assess Real-Time Freeway Crash Risk." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/863.
Motloung, Rethabile Frangenie. "Understanding current and potential distribution of Australian acacia species in southern Africa." Diss., University of Pretoria, 2014. http://hdl.handle.net/2263/79720.
Full textDissertation (MSc)--University of Pretoria, 2014.
National Research Foundation (NRF)
Zoology and Entomology
MSc (Zoology)
Unrestricted
Mugodo, James. "Plant species rarity and data restriction influence the prediction success of species distribution models." University of Canberra. Resource, Environmental & Heritage Sciences, 2002. http://erl.canberra.edu.au./public/adt-AUC20050530.112801.
Lazaridès, Ariane. "Classification trees for acoustic models : variations on a theme." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0016/MQ37139.pdf.
Löwe, Rakel, and Ida Schneider. "Automatic Differential Diagnosis Model of Patients with Parkinsonian Syndrome : A model using multiple linear regression and classification tree learning." Thesis, Uppsala universitet, Tillämpad kärnfysik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413638.
Purcell, Terence S. "The use of classification trees to characterize the attrition process for Army manpower models." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1997. http://handle.dtic.mil/100.2/ADA336747.
Silva, Jesús, Palma Hugo Hernández, Núñez William Niebles, Alex Ruiz-Lazaro, and Noel Varela. "Natural Language Explanation Model for Decision Trees." Institute of Physics Publishing, 2020. http://hdl.handle.net/10757/652131.
Udaya, Kumar Magesh Kumar. "Classification of Parkinson’s Disease using Multipass LVQ, Logistic Model Tree, K-Star for Audio Data Set : Classification of Parkinson Disease using Audio Dataset." Thesis, Högskolan Dalarna, Datateknik, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:du-5596.
Pienaar, Neil Deon. "Using the classification and regression tree (CART) model for stock selection on the S&P 700." Master's thesis, University of Cape Town, 2016. http://hdl.handle.net/11427/20728.
Lim, Steven. "Recommending TEE-based Functions Using a Deep Learning Model." Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/104999.
Full textMaster of Science
Improving the security of software systems has become critically important. A trusted execution environment (TEE) is an emerging technology that can help secure software that uses or stores confidential information. To make use of this technology, developers need to identify which pieces of code handle confidential information and should thus be placed in a TEE. However, this process is costly and laborious because it requires the developers to understand the code well enough to make the appropriate changes in order to incorporate a TEE. This process can become challenging for large software that contains millions of lines of code. To help reduce the cost incurred in the process of identifying which pieces of code should be placed within a TEE, this thesis presents ML-TEE, a recommendation system that uses a deep learning model to help reduce the number of lines of code a developer needs to inspect. Our results show that the recommendation system achieves high accuracy as well as a good balance between precision and recall. In addition, we conducted a pilot study and found that participants from the intervention group who used the output from the recommendation system managed to achieve a higher average accuracy and perform the assigned task faster than the participants in the control group.
Truong, Alfred Kar Yin. "Fast growing and interpretable oblique trees via logistic regression models." Thesis, University of Oxford, 2009. http://ora.ox.ac.uk/objects/uuid:e0de0156-da01-4781-85c5-8213f5004f10.
Linkevicius, Edgaras. "Single Tree Level Simulator for Lithuanian Pine Forests." Doctoral thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-150330.
Objectives
In Lithuania, during the most recent decades, the leading theory in forest management and planning combined optimization of forest stand density and maximal productivity at every time point of stand development. Thus, great effort was spent in creating stand level models that are highly effective in managing even-aged monocultures of pine or spruce forests. But these models produce significant errors in mixed or converted forests. In order to meet the requirements of contemporary forestry, appropriate forest management tools are required that would be capable of predicting the growth and yield of more structured forests. Thus, the overall objective of this study was to re-parameterise the single tree level simulator BWINPro-S (developed for forests in Saxony, Germany) for Lithuanian pine forests that grow on mineral sites. To reach this goal, the following tasks were set:
• To create, and to evaluate, a database for modelling.
• To estimate the impact of competition for growing space on diameter, basal area and height growth of trees.
• To develop a tree diameter model, and re-parameterise basal area and height growth models.
• To assess natural tree mortality induced by competition between trees for growing space.
• To develop the first approach of a single tree level simulator (STLS) for pine in Lithuania.
Hypotheses
1. Site quality is the most important factor that affects forest growth and yield.
2. Distance dependent competition indices have higher partial correlation with tree basal area and height increment than distance independent competition indices.
3. The re-parameterised model based on Lithuanian data fits better under Lithuanian conditions (regarding diameter, basal area, height increment and mortality) than the original model BWINPro-S.
4. A single tree level simulator provides valuable support for decision makers and forest managers to improve forest management in Lithuania.
Materials and methods
To reach the main goals of this study, the research was structured into four sections: 1) database completion, 2) analysis of competition, 3) modelling tree growth, 4) validation of the developed models. The database consisted of analytical data from 18 permanent experimental plots (PEPs) and 2 validation plots (VPs) that were used only for the validation of the models. All plots (PEPs and VPs) represent mainly naturally regenerated, single layer pine stands that grow on very typical pine sites. Database completion involved (a) establishment of the initial database, (b) modelling of missing data values and (c) evaluation of the complete database, which focused on:
• Sample size and estimation of the population's mean
• Estimation of potential site productivity
• Estimation of the relationship between potential site productivity and forest yield
In order to estimate the impact of competition for growing space on diameter, basal area and height growth of trees, the following methods were used. To select the competitors, this study focuses on three separate positions for setting the inverse cone: a) at the height of the crown base, b) at the height of the widest crown width, and c) at the stem base. The opening angle of the search cone was either 60 or 80 degrees. To estimate the competition, the study evaluated a total of 20 competition indices by partial correlation analysis, of which six distance dependent and two distance independent CIs were applied in the research programme. Modelling of tree growth was divided into three parts: a) development of an original tree diameter increment model, b) re-parameterisation of basal area and height increment models, and c) development of new natural mortality models and re-parameterisation of existing natural mortality models. Simple linear regression models were evaluated by estimating each model's statistical significance and coefficient of determination.
Statistical analysis of multiple linear regression models was extended by further tests: statistical significance was checked for each independent variable, and the regression assumptions (normal distribution and homogeneity of variance of the model's residuals, and multicollinearity of the independent variables) were checked. Simple nonlinear regression models were evaluated mainly by the adjusted coefficient of determination. For multiple nonlinear regression models, regression assumptions were also checked by producing normal Q-Q plots and by checking homogeneity of variance of the model's residuals. Multiple logistic regression models were evaluated by estimating each model's statistical significance with Pearson's chi square statistic and the statistical significance of each model's parameters with the Wald statistic. Goodness of fit was estimated by using log likelihood function values, Cox-Snell and Nagelkerke's coefficients of determination, classification tables and ROC curves. The re-parameterised basal area and height increment models were validated by plotting each model's predicted values against observed values; each model's residuals were also plotted against predicted values. Bias, relative bias, precision, relative precision, accuracy and relative accuracy when comparing predicted and observed values were estimated as well.
Results and Conclusions
The growth models used in the BWINPro-S simulator were successfully re-parameterised for Lithuanian growth conditions. Thus the study states these conclusions:
1. The accumulated standing volumes and overall productivity of pine stands only partially depend on the productivity potential of sites. Site quality defines the growth potential that could be reached in a stand. The realization of growth potential largely depends on the growing regime in the stand, which is defined by the beginning, frequency and intensity of thinning. 2.
In pure pine stands, distance dependent competition indices show greater capability to predict mean annual basal area increment than distance independent indices. The competition index (coded as CI4 in this study) proposed by BIGING & DOBBERTIN (1992), combined with the selection method of height to crown base with an opening angle of 80 degrees, is recommended as the most efficient for describing the individual diameter growth of trees. 3. HEGYI's (1974) distance independent competition index scored the highest partial correlation coefficients and produced slightly better results than distance dependent competition indices in predicting mean annual height increment for individual trees. Yet, the generally poor performance of competition indices in predicting the height increment of individual pine trees was also recorded. 4. Competition has a purely negative impact on tree diameter growth: increasing competition leads to steady decreases in diameter increment. Although a small amount of competition does stimulate tree height growth, stronger competition has a lasting negative impact on tree height growth as well. 5. The nonlinear diameter increment model developed by this study has a high capability to predict the growth of pine trees. The model's coefficient of determination was equal to 0.483. The distribution of the model's residuals fulfilled the regression assumptions. 6. The re-parameterisation of the BWINPro-S basal area and height increment models for Lithuanian permanent experimental plots increased their performance. During the first validation procedure, based on a 30 year growth simulation, the re-parameterised models produced reliable results. 7. Two individual mortality models developed by this study showed a very high capability to predict the natural mortality of pine trees. The distance dependent natural mortality model scored slightly better results.
Both models managed to correctly classify dead and living trees slightly more than 83% of the time. The re-parameterisation of the BWINPro-S natural mortality model increased its ability to predict the natural mortality of pine trees in Lithuania: the share of correctly classified growing and dead trees increased by six percentage points, from 77% to 83%. 8. The BWINPro-S simulator with growth models re-parameterised for Lithuanian conditions is a valuable support tool for decision makers and forest managers in Lithuania.
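The distance-dependent competition indices compared in this summary can be illustrated with a short sketch. The snippet below implements the classic Hegyi (1974) form, CI_i = Σ_j (dbh_j / dbh_i) / dist_ij; the tree tuples and function name are hypothetical illustrations, not code or data from the thesis.

```python
import math

# Sketch of the classic Hegyi (1974) competition index:
# CI_i = sum over competitors j of (dbh_j / dbh_i) / distance_ij.
# Tree records are hypothetical (x, y, dbh) tuples, not thesis data.

def hegyi_index(subject, competitors):
    xi, yi, dbh_i = subject
    ci = 0.0
    for xj, yj, dbh_j in competitors:
        dist = math.hypot(xj - xi, yj - yi)
        if dist > 0.0:  # skip a degenerate competitor at zero distance
            ci += (dbh_j / dbh_i) / dist
    return ci

# A larger neighbour standing closer contributes more competition:
near_big = hegyi_index((0, 0, 20), [(3, 4, 30)])   # (30/20)/5 = 0.3
far_small = hegyi_index((0, 0, 20), [(6, 8, 10)])  # (10/20)/10 = 0.05
```

The index grows with the competitor's relative size and shrinks with distance, which is why it tracks basal area increment in dense stands.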
Araya, Yeheyies. "Detecting Switching Points and Mode of Transport from GPS Tracks." Thesis, Linköpings universitet, Kommunikations- och transportsystem, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-91320.
Lecuyer, Jean-Francois. "Comparison of classification trees and logistic regression to model the severity of collisions involving elderly drivers in Canada." Thesis, University of Ottawa (Canada), 2008. http://hdl.handle.net/10393/27700.
Huang, Xuan. "Balance-guaranteed optimized tree with reject option for live fish recognition." Thesis, University of Edinburgh, 2014. http://hdl.handle.net/1842/9779.
Santos, Ernani Possato dos. "Análise de crédito com segmentação da carteira, modelos de análise discriminante, regressão logística e classification and regression trees (CART)." Universidade Presbiteriana Mackenzie, 2015. http://tede.mackenzie.br/jspui/handle/tede/970.
Credit is one of the most important tools for triggering business and keeping the economic wheel turning. Used well, it brings benefits on a large scale to society; used without balance, it can bring losses to banks, companies, governments and the population. In this context it becomes fundamental to evaluate credit models capable of anticipating default with an adequate degree of accuracy, so as to avoid or at least reduce credit risk. This study evaluates three credit risk models, two parametric (discriminant analysis and logistic regression) and one non-parametric (a decision tree), checking their accuracy before and after segmenting the sample by customer size. The research is an applied study of Industry BASE.
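The CART model this study compares against discriminant analysis and logistic regression chooses splits by minimizing Gini impurity. A minimal sketch of that criterion follows; the toy income/default records and function names are illustrative only, not from the thesis.

```python
# Minimal sketch of the Gini-impurity criterion CART uses to choose splits.
# The toy records (income, default flag) are illustrative, not thesis data.

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

incomes = [10, 20, 30, 40, 50, 60]
defaults = [1, 1, 1, 0, 0, 0]
# Splitting exactly between the classes yields zero impurity:
best = split_impurity(incomes, defaults, 35)   # 0.0
before = gini(defaults)                        # 0.5 before any split
```

CART grows the tree by greedily taking, at each node, the threshold with the lowest weighted impurity, which is what makes the resulting rules easy to read in a credit-scoring setting.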
Rusch, Thomas, Ilro Lee, Kurt Hornik, Wolfgang Jank, and Achim Zeileis. "Influencing elections with statistics: targeting voters with logistic regression trees." Institute of Mathematical Statistics (IMS), 2013. http://epub.wu.ac.at/3979/1/AOAS648.pdf.
Rusch, Thomas, Ilro Lee, Kurt Hornik, Wolfgang Jank, and Achim Zeileis. "Influencing Elections with Statistics: Targeting Voters with Logistic Regression Trees." WU Vienna University of Economics and Business, 2012. http://epub.wu.ac.at/3458/1/Report117.pdf.
Series: Research Report Series / Department of Statistics and Mathematics
Meira, Carlos Alberto Alves. "Processo de descoberta de conhecimento em bases de dados para a analise e o alerta de doenças de culturas agricolas e sua aplicação na ferrugem do cafeeiro." [s.n.], 2008. http://repositorio.unicamp.br/jspui/handle/REPOSIP/257023.
Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Agrícola
Abstract: Plant disease warning systems can contribute to diminishing the use of chemicals in agriculture, but they have received limited acceptance in practice. Complexity of models, difficulties in obtaining the required data and costs for the growers are among the reasons that inhibit their use. However, recent technological advances - automatic weather stations, databases, Web based agrometeorological monitoring and advanced techniques of data analysis - allow the development of a system with simple and free access. A process instance of knowledge discovery in databases was carried out to evaluate the use of classification and decision tree induction in the analysis and warning of coffee rust caused by Hemileia vastatrix. Infection rates calculated from monthly assessments of rust incidence were grouped into three classes: TX1 - reduction or stagnation; TX2 - moderate growth (up to 5 pp); and TX3 - accelerated growth (above 5 pp). Meteorological data, expected yield and space between plants were used as independent variables. The training data set contained 364 examples prepared from data collected in coffee-growing areas between October 1998 and October 2006. A decision tree was developed to analyse the coffee rust epidemics. The decision tree demonstrated its potential as a symbolic and interpretable model. Its model representation identified the existing decision boundaries in the data and the logic underlying them, helping to understand which variables, and interactions between these variables, led to coffee rust epidemics in the field. The most important explanatory variables were mean temperature during leaf wetness periods, expected yield, mean of maximum temperatures during the incubation period and relative air humidity. The warning models were developed considering binary infection rates, according to the 5 pp and 10 pp thresholds (class '1' for rates greater than or equal to the threshold; class '0' otherwise).
These models are specific for growing areas with high expected yield or areas with low expected yield. The former had the best performance in the evaluation. The estimated accuracy by cross-validation was up to 83%, considering the warning for 5 pp and higher. There was also a balance between accuracy and important measures like sensitivity, specificity and positive or negative reliability. Considering the warning for 10 pp and higher, the accuracy was 79%. For growing areas with low expected yield, the accuracy of the models considering the warning for 5 pp and higher was up to 72%. The models for the higher infection rate (10 pp and higher) had low performance. The best evaluated models showed potential to be used in decision making about coffee rust disease control. The process of knowledge discovery in databases was characterized in such a way that it can be employed in similar problems of the application domain, with other crops or other coffee diseases or pests.
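The infection-rate classes and binary warning thresholds described in this abstract can be sketched as two small helpers. This is a reading of the stated class boundaries (TX1 ≤ 0 pp, TX2 up to 5 pp, TX3 above 5 pp; warning class '1' at or above a threshold); the function names are hypothetical.

```python
# Sketch of the infection-rate classes from the abstract:
# TX1 - reduction or stagnation, TX2 - moderate growth (up to 5 pp),
# TX3 - accelerated growth (above 5 pp). Names are hypothetical.

def infection_class(rate_pp):
    """Map a monthly infection-rate change (percentage points) to a class."""
    if rate_pp <= 0:
        return "TX1"
    if rate_pp <= 5:
        return "TX2"
    return "TX3"

def warning(rate_pp, threshold_pp=5):
    """Binary warning: class 1 if the rate reaches the threshold, else 0."""
    return 1 if rate_pp >= threshold_pp else 0

classes = [infection_class(r) for r in (-1.0, 3.2, 8.7)]  # TX1, TX2, TX3
alerts = [warning(r, 10) for r in (4.0, 12.5)]            # 0, 1
```

Deriving the class label from the rate change first, and then thresholding it for the warning, mirrors how the abstract separates the descriptive analysis (three classes) from the binary warning models (5 pp and 10 pp).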
Doctorate
Sustainable Rural Planning and Development
Doctor in Agricultural Engineering
Krueger, Kirk L. "Effects of Sampling Sufficiency and Model Selection on Predicting the Occurrence of Stream Fish Species at Large Spatial Extents." Diss., Virginia Tech, 2009. http://hdl.handle.net/10919/26214.
Ph. D.
Julock, Gregory Alan. "The Effectiveness of a Random Forests Model in Detecting Network-Based Buffer Overflow Attacks." NSUWorks, 2013. http://nsuworks.nova.edu/gscis_etd/190.
Moore, Cordelia Holly. "Defining and predicting species-environment relationships : understanding the spatial ecology of demersal fish communities." University of Western Australia. Faculty of Natural and Agricultural Sciences, 2009. http://theses.library.uwa.edu.au/adt-WU2010.0002.
Full textGirard, Nathalie. "Vers une approche hybride mêlant arbre de classification et treillis de Galois pour de l'indexation d'images." Thesis, La Rochelle, 2013. http://www.theses.fr/2013LAROS402/document.
Image classification is generally based on two steps, namely the extraction of the image signature, followed by analysis of the extracted data. Image signatures are generally numerical. Many classification models have been proposed in the literature, among which the most suitable choice is often guided by classification performance and model readability. Decision trees and Galois lattices are two symbolic models known for their readability. In her thesis (Guillas, 2007), Guillas efficiently used Galois lattices for image classification, and strong structural links between decision trees and Galois lattices have been highlighted. Accordingly, we are interested in comparing the two models in order to design a hybrid model combining their advantages (robustness of the lattice, low memory footprint of the tree and readability of both). For this purpose, we study the links between the two models to highlight their differences. The first difference is the type of discretization: decision trees generally use a local discretization, while Galois lattices, originally defined for binary data, use a global one. From a study of the properties of dichotomic lattices (the specific lattices defined after discretization), we propose a local discretization for lattices that improves their classification performance and reduces their structural complexity. Then, the post-pruning process implemented in most decision trees aims to reduce their complexity but also to improve their classification performance, whereas lattice filtering is solely motivated by a decrease in structural complexity (exponential in the size of the data in the worst case). By combining these two processes, we propose a simplification of the lattice structure constructed after our local discretization. This simplification leads to a hybrid classification model that takes advantage of both decision trees and Galois lattices.
It is as readable as both, while being less complex than the lattice and still efficient.
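To illustrate the Galois connection that underlies such lattices, the formal concepts of a small binary context can be enumerated by brute force. A minimal sketch with a toy context (the object and attribute names are invented, not the thesis's image signatures):

```python
from itertools import combinations

# Toy binary context: objects x attributes (e.g. globally discretised image features)
objects = ["img1", "img2", "img3", "img4"]
attributes = ["a", "b", "c"]
incidence = {
    "img1": {"a", "b"},
    "img2": {"a"},
    "img3": {"a", "b", "c"},
    "img4": {"b", "c"},
}

def common_attributes(objs):
    """Attributes shared by every object in objs (the derivation operator ')."""
    if not objs:
        return set(attributes)
    return set.intersection(*(incidence[o] for o in objs))

def objects_having(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o in objects if attrs <= incidence[o]}

# A formal concept is a pair (extent, intent) closed under the Galois connection;
# every extent is the closure of some object subset, so brute force finds them all.
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(objects, r):
        intent = common_attributes(set(objs))
        extent = objects_having(intent)
        concepts.add((frozenset(extent), frozenset(intent)))
```

Ordering the concepts by inclusion of extents yields the Galois (concept) lattice; the exponential worst-case size of this set is what motivates the filtering discussed above.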
Coelho, Fabrício Fernandes. "Comparação de métodos de mapeamento digital de solos através de variáveis geomorfométricas e sistemas de informações geográficas." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2010. http://hdl.handle.net/10183/25062.
Soil maps are sources of important information for land planning and management, but they are expensive to produce. This study proposes testing and comparing single-stage classification methods (multiple multinomial logistic regression and Bayes) and multiple-stage classification methods (CART, J48 and LMT) using a geographic information system and terrain parameters for producing soil maps with both the original and a simplified legend. In the ArcGis environment, terrain parameters and the original soil map were sampled to train the algorithms. The results from the statistical software Weka were implemented in the ArcGis environment to generate digital soil maps. Error matrices were generated to analyse the accuracies of the maps. The terrain parameters that best explained soil distribution were slope, profile and planar curvature, elevation, and topographic wetness index. The multiple-stage classification methods showed small improvements in overall accuracy and large improvements in the Kappa index. Simplification of the original legend significantly increased the producer and user accuracies, but produced small improvements in overall accuracy and the Kappa index.
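Both figures of merit used here, overall accuracy and the Kappa index, are computed from the error (confusion) matrix. A minimal sketch with an invented three-class matrix (not data from the study):

```python
def overall_accuracy(matrix):
    """Fraction of correctly classified samples (diagonal over total)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def kappa(matrix):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = sum(sum(row) for row in matrix)
    po = sum(matrix[i][i] for i in range(len(matrix))) / n
    # expected agreement from row (reference) and column (map) marginals
    pe = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical error matrix: rows = reference soil classes, columns = mapped classes
m = [
    [50,  5,  5],
    [10, 40, 10],
    [ 5,  5, 70],
]
oa = overall_accuracy(m)   # 0.8
k = kappa(m)               # ≈ 0.696
```

Kappa can improve markedly while overall accuracy barely moves, which is consistent with the pattern reported for the multiple-stage methods.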
Vinnemeier, Christof David [Verfasser], Jürgen [Akademischer Betreuer] May, Uwe [Akademischer Betreuer] Groß, and Tim [Akademischer Betreuer] Friede. "Establishment of a clinical algorithm for the diagnosis of P. falciparum malaria in children from an endemic area using a Classification and Regression Tree (CART) model / Christof David Vinnemeier. Gutachter: Uwe Groß ; Tim Friede. Betreuer: Jürgen May." Göttingen : Niedersächsische Staats- und Universitätsbibliothek Göttingen, 2015. http://d-nb.info/1065882017/34.
Full textCaetano, Mateus 1983. "Modelos de classificação : aplicações no setor bancário." [s.n.], 2015. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306286.
Dissertation (Master's) - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Resumo: Técnicas para solucionar problemas de classificação têm aplicações em diversas áreas, como concessão de crédito, reconhecimento de imagens, detecção de SPAM, entre outras. É uma área de intensa pesquisa, para a qual diversos métodos foram e continuam sendo desenvolvidos. Dado que não há um método que apresente o melhor desempenho para qualquer tipo de aplicação, diferentes métodos precisam ser comparados para que possamos encontrar o melhor ajuste para cada aplicação em particular. Neste trabalho estudamos seis diferentes métodos aplicados em problemas de classificação supervisionada (onde há uma resposta conhecida para o treinamento do modelo): Regressão Logística, Árvore de Decisão, Naive Bayes, KNN (k-Nearest Neighbors), Redes Neurais e Support Vector Machine. Aplicamos os métodos em três conjuntos de dados referentes à problemas de concessão de crédito e seleção de clientes para campanha de marketing bancário. Realizamos o pré-processamento dos dados para lidar com observações faltantes e classes desbalanceadas. Utilizamos técnicas de particionamento do conjunto de dados e diversas métricas, como acurácia, F1 e curva ROC, com o objetivo de avaliar os desempenhos dos métodos/técnicas. Comparamos, para cada problema, o desempenho dos diferentes métodos considerando as métricas selecionadas. Os resultados obtidos pelos melhores modelos de cada aplicação foram compatíveis com outros estudos que utilizaram os mesmos bancos de dados
Abstract: Techniques for classification problems have applications in many areas, such as credit risk evaluation, image recognition, SPAM detection, among others. It is an area of intense research, for which many methods were and continue to be developed. Given that there is no method whose performance is better across every type of problem, different methods need to be compared in order to select the one that provides the best fit for each particular application. In this work, we studied six different methods applied to supervised classification problems (where there is a known response for model training): Logistic Regression, Decision Tree, Naive Bayes, KNN (k-Nearest Neighbors), Neural Networks and Support Vector Machine. We applied these methods to three data sets related to credit evaluation and customer selection for a banking marketing campaign. We pre-processed the data to cope with missing observations and unbalanced classes. We used data partitioning techniques and several metrics, such as accuracy, F1 and the ROC curve, to evaluate the performance of the methods/techniques. For each problem, we compared the performance of the different methods using the selected metrics. The results obtained by the best models in each application were comparable to other studies that used the same databases.
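Of the metrics named above, F1 is the one best suited to the unbalanced classes mentioned: it is the harmonic mean of precision and recall for the positive class. A minimal sketch with made-up binary labels (not the banking data):

```python
def f1_score(actual, predicted, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels (1 = positive class, e.g. credit default) and one model's predictions
actual    = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 0, 0, 1]
f1 = f1_score(actual, predicted)   # ≈ 0.667
```

Unlike plain accuracy, F1 ignores the abundant true negatives, so a model that simply predicts the majority class scores poorly on it.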
Master's
Applied Mathematics
Master in Applied Mathematics
Sousa, Rogério Pereira de. "Classificação linear de bovinos: criação de um modelo de decisão baseado na conformação de tipo “true type” como auxiliar a tomada de decisão na seleção de bovinos leiteiros." Universidade do Vale do Rio dos Sinos, 2016. http://www.repositorio.jesuita.org.br/handle/UNISINOS/5896.
IFTO - Instituto Federal de Educação, Ciência e Tecnologia do Tocantins
A seleção de bovinos leiteiros, através da utilização do sistema de classificação com características lineares de tipo, reflete no ganho de produção, na vida produtiva do animal, na padronização do rebanho, entre outros. Esta pesquisa operacional obteve suas informações através de pesquisas bibliográficas e análise de base de dados de classificações reais. O presente estudo, objetivou a geração de um modelo de classificação de bovinos leiteiros baseado em “true type”, para auxiliar os avaliadores no processamento e análise dos dados, ajudando na tomada de decisão quanto a seleção da vaca para aptidão leiteira, tornando os dados seguros para futuras consultas. Nesta pesquisa, aplica-se métodos computacionais à classificação de vacas leiteiras mediante a utilização mineração de dados e lógica fuzzy. Para tanto, realizou-se a análise em uma base de dado com 144 registros de animais classificados entre as categorias boa e excelente. A análise ocorreu com a utilização da ferramenta WEKA para extração de regras de associação com o algoritmo apriori, utilizando como métricas objetivas, suporte / confiança, e lift para determinar o grau de dependência da regra. Para criação do modelo de decisão com lógica fuzzy, fez-se uso da ferramenta R utilizando o pacote sets. Por meio dos resultados obtidos na mineração de regras, foi possível identificar regras relevantes ao modelo de classificação com confiança acima de 90%, indicando que as características avaliadas (antecedente) implicam em outras características (consequente), com uma confiança alta. Quanto aos resultados obtidos pelo modelo de decisão fuzzy, observa-se que, o modelo de classificação baseado em avaliações subjetivas fica suscetível a erros de classificação, sugerindo então o uso de resultados obtidos por regras de associação como forma de auxílio objetivo na classificação final da vaca para aptidão leiteira.
The selection of dairy cattle through a rating system with linear type traits is reflected in production gains, in the productive life of the animal, in the standardization of the herd, among other benefits. This operational research obtained its information through bibliographic research and the analysis of a database of actual classifications. The study aimed to generate a dairy cattle classification model based on "true type" to assist evaluators in processing and analysing the data, helping in decision making regarding the selection of cows for dairy aptitude and making the data reliable for future reference. The research applies computational methods to the classification of dairy cows using data mining and fuzzy logic. To this end, an analysis was conducted on a database with 144 animal records classified between the good and excellent categories. The analysis was carried out with the WEKA tool, extracting association rules with the Apriori algorithm, using support/confidence as objective metrics and lift to determine the degree of dependence of each rule. To create the decision model with fuzzy logic, the R tool with the sets package was used. From the results obtained in rule mining, it was possible to identify rules relevant to the classification model with confidence above 90%, indicating that the traits assessed (antecedent) imply other traits (consequent) with high confidence. As for the results obtained by the fuzzy decision model, it was observed that a classification model based on subjective assessments is susceptible to misclassification, suggesting the use of results obtained by association rules as an objective aid in the final classification of the cow for dairy aptitude.
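The objective metrics mentioned - support, confidence and lift - have simple frequency definitions over the set of records. A minimal sketch in Python (the records and trait names below are invented, not the 144-record database):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the association rule antecedent -> consequent."""
    n = len(transactions)
    n_ant = sum(1 for t in transactions if antecedent <= t)
    n_con = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if antecedent | consequent <= t)
    support = n_both / n                                  # joint frequency
    confidence = n_both / n_ant if n_ant else 0.0         # P(consequent | antecedent)
    lift = confidence / (n_con / n) if n_con else 0.0     # dependence vs independence (lift = 1)
    return support, confidence, lift

# Hypothetical classification records: one set of trait scores per cow
transactions = [
    {"udder:good", "legs:good", "final:excellent"},
    {"udder:good", "legs:fair", "final:good"},
    {"udder:good", "legs:good", "final:excellent"},
    {"udder:fair", "legs:good", "final:good"},
]
s, c, l = rule_metrics(transactions, {"udder:good", "legs:good"}, {"final:excellent"})
# s, c, l = 0.5, 1.0, 2.0
```

A lift well above 1, as here, indicates that the antecedent traits and the final class occur together far more often than independence would predict, which is what makes such rules useful as an objective aid.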
Lin, Shu-Chuan. "Robust estimation for spatial models and the skill test for disease diagnosis." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26681.
Committee Chair: Lu, Jye-Chyi; Committee Co-Chair: Kvam, Paul; Committee Member: Mei, Yajun; Committee Member: Serban, Nicoleta; Committee Member: Vidakovic, Brani. Part of the SMARTech Electronic Thesis and Dissertation Collection.
Ataky, Steve Tsham Mpinda. "Análise de dados sequenciais heterogêneos baseada em árvore de decisão e modelos de Markov : aplicação na logística de transporte." Universidade Federal de São Carlos, 2015. https://repositorio.ufscar.br/handle/ufscar/7242.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Recently, data mining techniques have emerged in many application fields with the aim of analyzing large volumes of data, which may be simple and/or complex. Transport logistics, and the railway sector in particular, is such a field, in that the available data are of varied natures (classic variables such as top speed or type of train, symbolic variables such as the set of routes traveled by a train, degree of adherence, etc.). This dissertation addresses the problem of classification and prediction of heterogeneous data through two main approaches. First, an automatic classification approach was implemented based on the classification tree technique, which also allows new data to be efficiently integrated into previously initialized partitions. The second contribution of this work concerns the analysis of sequence data. It was proposed to combine the above classification method with Markov models to obtain a partition of time series (temporal sequences) into homogeneous and significant groups based on probabilities. The resulting model offers a good interpretation of the classes built and allows us to estimate the evolution of the sequences of a particular vehicle. Both approaches were then applied to real data from a Brazilian railway information system, with the aim of supporting strategic planning management and coherent prediction. This work initially provides a finer planning typology to solve the problems associated with the existing classification into homogeneous circulation groups. Second, it sought to define a typology of train paths (successive circulations of the same train) in order to provide or predict the statistical characteristics of the next most likely circulation of a train on the same route. The general methodology provides a supportive environment for decision-making to monitor and control the planning organization.
Accordingly, a formula with two variants was proposed to calculate the degree of adherence between the route actually carried out, or in progress, and the planned one.
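The combination with Markov models rests on estimating transition probabilities between circulation states from observed sequences, and then scoring new sequences under the estimated chain. A minimal first-order sketch (the state labels are invented, not the railway company's data):

```python
from collections import defaultdict

def transition_probabilities(sequences):
    """Maximum-likelihood transition matrix of a first-order Markov chain."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }

def sequence_likelihood(seq, trans, initial):
    """Probability of a sequence under the estimated chain."""
    p = initial.get(seq[0], 0.0)
    for a, b in zip(seq, seq[1:]):
        p *= trans.get(a, {}).get(b, 0.0)
    return p

# Hypothetical daily circulation states of one train: "O" on time, "D" delayed
seqs = [list("OODO"), list("ODOO"), list("OOOD")]
trans = transition_probabilities(seqs)
p = sequence_likelihood(list("OOD"), trans, {"O": 1.0})
```

Comparing such likelihoods across the per-group chains is one way a new sequence can be assigned to its most probable homogeneous group.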
Nos últimos anos aflorou o desenvolvimento de técnicas de mineração de dados em muitos domínios de aplicação com finalidade de analisar grandes volumes de dados, os quais podendo ser simples e/ou complexos. A logística de transporte, o setor ferroviário em particular, é uma área com tal característica em que os dados disponíveis são muitos e de variadas naturezas (variáveis clássicas como velocidade máxima ou tipo de trem, variáveis simbólicas como o conjunto de vias percorridas pelo trem, etc). Como parte desta dissertação, aborda-se o problema de classificação e previsão de dados heterogêneos, propõe-se estudar através de duas abordagens principais. Primeiramente, foi utilizada uma abordagem de classificação automática com base na técnica por ´arvore de classificação, a qual também permite que novos dados sejam eficientemente integradas nas partições inicial. A segunda contribuição deste trabalho diz respeito à análise de dados sequenciais. Propôs-se a combinar o método de classificação anterior com modelos de Markov para obter uma participação de sequências temporais em grupos homogêneos e significativos com base nas probabilidades. O modelo resultante oferece uma boa interpretação das classes construídas e permite estimar a evolução das sequências de um determinado veículo. Ambas as abordagens foram então aplicadas nos dados do sistema de informação ferroviário, no espírito de dar apoio à gestão estratégica de planejamentos e previsões aderentes. Este trabalho consiste em fornecer inicialmente uma tipologia mais fina de planejamento para resolver os problemas associados com a classificação existente em grupos de circulações homogêneos. Em segundo lugar, buscou-se definir uma tipologia de trajetórias de trens (sucessão de circulações de um mesmo trem) para assim fornecer ou prever características estatísticas da próxima circulação mais provável de um trem realizando o mesmo percurso. 
A metodologia geral proporciona um ambiente de apoio à decisão para o monitoramento e controle da organização de planejamento. Deste fato, uma fórmula com duas variantes foi proposta para calcular o grau de aderência entre a trajetória efetivamente realizada ou em curso de realização com o planejado.
Peroutka, Lukáš. "Návrh a implementace Data Mining modelu v technologii MS SQL Server." Master's thesis, Vysoká škola ekonomická v Praze, 2012. http://www.nusl.cz/ntk/nusl-199081.
Full textCardoso, Diego Soares. "Política antitruste e sua consistência: uma análise das decisões do Sistema Brasileiro de Defesa da Concorrência relativas aos Atos de Concentração." Universidade Federal de São Carlos, 2013. https://repositorio.ufscar.br/handle/ufscar/2168.
Financiadora de Estudos e Projetos
The goal of competition policy, also known as antitrust policy, is to promote welfare and economic efficiency by preserving fair competition in markets. Merger control is one of the main responsibilities of antitrust institutions. Prohibitions and restrictions of merger operations affect market structures, making these decisions relevant to economic agents. This Master's thesis analyzes the decisions made by Brazilian antitrust institutions in merger review processes. Data was collected from public documents issued from 2004 to 2011. Bivariate analysis, discrete choice models and classification decision trees show that these merger control decisions are consistent with Brazilian antitrust law. Consistent competition policy reduces uncertainty, aligns expectations and increases the efficiency of antitrust law enforcement. Therefore, this research contributes to a better understanding of Brazilian merger control policy and its decision drivers.
As políticas de defesa da concorrência, ou políticas antitruste, visam ao maior bem-estar social por meio da manutenção de ambientes concorrenciais que promovam a eficiência econômica. No Brasil, os órgãos que compõem o Sistema Brasileiro de Defesa da Concorrência são os responsáveis pelas decisões sobre os agentes econômicos a fim de atingir os objetivos das políticas antitruste. Nesse âmbito, as decisões que influenciam a estrutura de mercados por meio das restrições e vetos a processos como fusões e aquisições de empresas - os julgamentos de Atos de Concentração - apresentam elevada relevância. Este trabalho realiza uma avaliação das decisões do Sistema Brasileiro de Defesa da Concorrência relativas aos Atos de Concentração. Para tal, foram coletados dados a partir dos documentos públicos emitidos pelos órgãos antitruste no período entre 2004 e 2011. Por meio da aplicação de modelos de regressão de escolha discreta e árvores de decisão induzidas, verificou-se que tais decisões são consistentes com as regras antitruste brasileiras. A consistência com regras estabelecidas possibilita uma maior eficiência na aplicação das políticas de defesa da concorrência, uma vez que reduz as incertezas dos agentes econômicos, alinha as expectativas e facilita a condução dos processos. Nesse sentido, esta investigação contribui para uma melhor compreensão dos fatores que influenciam as decisões dos órgãos brasileiros de defesa da concorrência, oferecendo também indicativos que auxiliam na verificação da eficiência da aplicação de tais políticas.
Sutton-Charani, Nicolas. "Apprentissage à partir de données et de connaissances incertaines : application à la prédiction de la qualité du caoutchouc." Thesis, Compiègne, 2014. http://www.theses.fr/2014COMP1835/document.
During the learning of predictive models, the quality of the available data is essential for the reliability of the obtained predictions. In practice, these learning data are very often imperfect or uncertain (imprecise, noisy, etc.). This PhD thesis focuses on this context, where the theory of belief functions is used to adapt standard statistical tools to uncertain data. The chosen predictive model is the decision tree, a basic classifier in Artificial Intelligence initially conceived to be built from precise data. The aim of the main methodology developed in this thesis is to generalise decision trees to uncertain data (fuzzy, probabilistic, missing, etc.) in both input and output. To realise this extension, the main tool is a likelihood adapted to belief functions, recently presented in the literature, whose behaviour is studied here. The maximisation of this likelihood provides estimators of the trees' parameters; it is obtained via the E2M algorithm, an extension of the EM algorithm to belief functions. The presented methodology, E2M decision trees, is applied to a real case: natural rubber quality prediction. The learning data, mainly cultural and climatic, contain many uncertainties, which are modelled by belief functions adapted to those imperfections. After a simple descriptive statistical study of the data, E2M decision trees are built, evaluated and compared to standard decision trees. Taking the data uncertainty into account slightly improves the predictive accuracy and, moreover, highlights the importance of some variables sparsely studied until now.
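The E2M algorithm generalises EM to belief functions; in the degenerate case where the only uncertainty is missing class labels (vacuous belief functions on the missing values), it reduces to ordinary EM. A minimal sketch of that special case, estimating a class proportion from partially labelled data (toy labels, not the rubber data):

```python
def em_class_proportion(labels, n_iter=50):
    """EM estimate of P(class = 1) when some labels are None (missing).

    With vacuous belief functions on the missing labels, the E-step simply
    imputes the current estimate as the expected value of each missing label.
    """
    theta = 0.5  # initial guess
    for _ in range(n_iter):
        # E-step: expected value of each (possibly missing) label
        expected = [theta if y is None else y for y in labels]
        # M-step: maximise the expected complete-data log-likelihood
        theta = sum(expected) / len(expected)
    return theta

# Six observed binary labels and two missing ones
labels = [1, 1, 0, 1, 0, 1, None, None]
theta = em_class_proportion(labels)   # converges to 2/3
```

The fixed point equals the observed-data maximum-likelihood estimate 4/6, as expected when the missingness carries no information; richer belief functions (fuzzy or probabilistic labels) change the E-step rather than this overall scheme.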
Fearer, Todd Matthew. "Evaluating Population-Habitat Relationships of Forest Breeding Birds at Multiple Spatial and Temporal Scales Using Forest Inventory and Analysis Data." Diss., Virginia Tech, 2006. http://hdl.handle.net/10919/29243.
Ph. D.
Hu, Wenbiao. "Applications of Spatio-temporal Analytical Methods in Surveillance of Ross River Virus Disease." Queensland University of Technology, 2005. http://eprints.qut.edu.au/16109/.
Full textBretschneider, Jörg. "Ein wellenbasiertes stochastisches Modell zur Vorhersage der Erdbebenlast." Doctoral thesis, Technische Universität Dresden, 2006. https://tud.qucosa.de/id/qucosa%3A25000.
Strong earthquakes are a potentially high risk for urban centres worldwide, which is confronted, among other means, by methods of aseismic structural design. These rest on both assumptions and thorough knowledge about local seismic ground acceleration; limits are set, on the other hand, by additional costs. Damage balances of recent strong quakes - also in industrialized countries - emphasize the need for further refinement of the concepts and methods of earthquake-resistant structural design. In this work, a new approach to stochastic seismic load modelling is presented, abandoning the usual presupposition of a stationary, one-dimensional stochastic process for ground acceleration. The goal is site- and wave-specific load modelling, using information about physical and geotechnical invariants, which enables transparent and low-cost approaches in aseismic structural design, or at least reduces seismic risk in comparison to common design methods. Those physical and geotechnical invariants are the structure of the seismic wave field according to physical laws as well as the resonance properties of the soil strata at the local site. The proposed load model represents the local wave field as a composition of stochastic evolutionary sub-processes upon time-variant principal axes, which correspond to wave trains with specific load characteristics. Those load characteristics are described in the frequency and time domains as well as in the spatial domain by wave-specific shape functions, whose parameters correlate strongly with seismic and geotechnical quantities. The main contributions of the work are newly developed correlation-based estimation procedures, which serve in the framework of the empirical specification of the model parameters for building practice. The Spectral-Adaptive Principal Correlation Axes (SAPCA) algorithm ensures an optimal covering of the spatial wave trains by transforming the recorded data onto Reference Components.
At the same time - in connection with a correction algorithm for the strike angle of the principal axis - it delivers concise associated patterns in the course of the principal axis, which are in turn used to reliably identify dominance phases for three generalized wave trains. Within those wave dominance phases, the wave-specific parameters of the load model are determined. Additionally, an algorithm is presented to identify Rayleigh waves in single-site acceleration records. The adequacy of the modelling approach and the efficiency of the estimation procedures are verified by means of strong-motion records from the 1994 Northridge earthquake. The proposed non-stationary modelling approach describes more accurately load portions of the strong-motion wave field that are underestimated in conventional stochastic load models. Load portions which were previously left out or modelled as a lump sum are made available for analysis and modelling for the first time. The stochastic model gains physical transparency with respect to the most important load-generating effects, and hence will be - despite higher complexity - easy to handle in engineering practice. The principal axis method will also be useful for seismological analyses in the near field, e.g., for the analysis of rupture processes and topographic site effects.
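The principal-axis idea behind such procedures can be illustrated, for two horizontal components, by the closed-form principal-axis angle of a 2x2 covariance matrix computed over a window of the record. A simplified sketch on a synthetic linearly polarised wave train (this illustrates the principal-axis concept only, not the SAPCA algorithm itself):

```python
import math

def principal_axis_angle(x, y):
    """Azimuth (radians) of the major principal axis of two signal windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # direction of the larger eigenvalue of [[sxx, sxy], [sxy, syy]]
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Synthetic 2 Hz wave train polarised at 30 degrees in the horizontal plane
theta = math.radians(30)
t = [i / 100 for i in range(200)]
amp = [math.sin(2 * math.pi * 2 * ti) for ti in t]
x = [a * math.cos(theta) for a in amp]
y = [a * math.sin(theta) for a in amp]
angle = math.degrees(principal_axis_angle(x, y))   # close to 30
```

Evaluating this angle over a sliding window gives a time-variant axis; abrupt, persistent changes in its course are the kind of pattern exploited to separate the dominance phases of different wave trains.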
Des séismes forts sont un gros risque potentiel pour des centres urbains dans le monde entier, qui est, entre autres, confronté par des méthodes de conception aséismique de bâtiments. Ceci est fondé sur des hypothèses et la connaissance profonde au sujet de l'accélération séismique au sol locale. Limites sont placées, de l'autre côté, par des coûts additionnels. Les dommages des séismes forts récents, aussi dans les pays industrialisés, soulignent la nécessité de raffiner plus loin les concepts et les méthodes de conception aséismique de bâtiments. Dans cette oeuvre, une nouvelle approche à la modélisation stochastique de la charge séismique est présentée, qui renonce la présupposition habituelle d'un processus stationnaire et unidimensionnel pour l'accélération de sol. L'objectif est une modélisation spatiale de charge, spécifique d'ondes et de site, qui, par l'utilisation des informations sur des invariantes physiques, permet une mesure de bâtiment asismique transparente et économique, au moins toutefois réduit le risque par rapport aux méthodes de mesures courantes. De tels invariants séismiques et géotechniques sont la structure du champ des ondes séismiques déterminé par les lois de la physique et les qualités de résonance de la stratification de sol locale. Le modèle de charge proposé décrit le champ des ondes au site comme composition des sous-processes évolutionnaires stochastiques sur les axes principales variables dans le temps, qui correspondent aux trains des ondes qu'ont une caractéristique de charge respectivement spécifique. Cette caractéristique de charge est décrit dans le domaine temporel et de fréquence et aussi bien que spatial par les fonctions de forme spécifique d'ondes dont les paramètres corrèlent fortement à des dimensions séismiques et géotechniques. 
Une priorité d'oeuvre sont des nouvelles procédures d'estimation, pour la spécification empirique des paramètres de modèle pour la pratique de construction, qui se basent sur la corrélation de croix de composante. La procédure adaptative spectrale d'estimation d'axes principals de corrélation (SAPCA) assure la saisie optimale des trains des ondes spatiaux par la transformation des enregistrements sur des composantes de référence. En même temps - en relation avec une procédure de correction d'angle égal d'axe principal - il livre des concises schémas associés de cours d'axes principals, au moyen de ceux peut être identifié fiable des phases de dominance pour trois trains généralisés des ondes. Dans ces phases de dominance, les paramètres du modèle de charge spécifiques pour chaque train des ondes sont déterminés. En outre, un algorithme est indiqué, pour identifier des ondes de Rayleigh dans un enregistrement individuel de l'accélération de sol. La qualification de l'approche de modèle et l'efficience des procédures d'estimation sont vérifiées au moyen d'enregistrements de tremblement fort du séisme á Northridge 1994. Avec l'approche de modèle non-stationnaire présentée, tels des parts de charge du champ des ondes sismiques forts sont décrites plus précisément qui sont sous-estimées dans les modèles de charge stochastiques habituels. Des parts de charge qui ont été supprimées ou modelées forfaitairement jusqu'ici, sont rendues accessibles à l'analyse et à la modélisation pour la première fois. Le modèle stochastique devient physico-transparent concernant les effets les plus importants, produisants une charge sur le bâtiment, et ainsi - malgré la complexité plus élevée - mieux maniable en pratique d'ingénieur. La méthode d'axes principals adaptative spectrale (SAPCA) convient aussi pour des analyses sismologiques dans la proximité d'epicentre, par exemple à l'analyse des processus de rupture et des effets de site topographiques.
Por todo el mundo, los terremotos fuertes son un alto riesgo potencial para los centros urbanos, que está, entre otros, enfrentado por métodos de diseño estructural antisísmico. Estos métodos son basa en asunciones y conocimiento fundamentado sobre la aceleración de tierra sísmica local; los límites son fijados, en el otro lado, por costes adicionales. Balance de los daños de temblores fuertes recientes - también en países industrializados - acentúe la necesidad del refinamiento adicional de conceptos y de métodos de diseño estructural resistente del terremoto. En este trabajo, una nueva aproximación de modelar estocástico de la carga sísmica se presenta, superando la presuposición generalmente de un proceso estocástico unidimensional y estacionario para la aceleración de tierra. La meta avisada es un modelo de la carga específico del sitio y de las ondas que, con la información sobre las invariantes físicas y geotécnicas, permite las aproximaciones transparentes y económicas, en diseño estructural antisísmico; pero por lo menos reduce el riesgo sísmico en la comparación a los métodos usados de diseño. Esos invariantes son la estructura regular del campo de las ondas sísmicas, así como las características de la resonancia de los estratos del suelo en el sitio local. El modelo propuesto de la carga representa el campo local de las ondas sísmicas como composición de los procesos parciales evolutivos estocásticos sobre las hachas principales variables-temporales, que corresponden a los trenes de las ondas con características específicas de la carga. Esas características de la carga son descritas en el dominio de la frecuencia y del tiempo así como en el dominio espacial por las funciones de la forma, que parámetros son especificas por los trenos generalizados de la onda sísmica y correlacionan fuertemente a las entidades sísmicas y geotécnicas. 
La contribución principal de este trabajo son los procedimientos nuevamente desarrollados de la valoración basados en la correlación, que sirven en el contexto de la especificación empírica de los parámetros de modelo para la práctica de construcción. El algoritmo de las Ejes Mayor de la Correlación Espectral-Adaptante (SAPCA) asegura la recogida óptima de los trenes espaciales de la onda transformando los datos registrados sobre componentes de la referencia. En el mismo tiempo - en la conexión con un algoritmo de la corrección para el ángulo del acimut del eje mayor/principal – SAPCA entrega los patrones asociados concisos en el curso del eje principal, que después se utilizan para identificar confiablemente las fases de la dominación para tres trenes generalizados de la onda. Dentro de esas fases de la dominación de la onda, los parámetros específicos de la onda del modelo de la carga se determinan. Además, un algoritmo se presenta para identificar las ondas de Rayleigh en solos mensuras de la aceleración del sitio. La suficiencia del aproximación que modela y la eficacia de los procedimientos de la valoración se verifican por medio de los datos del terremoto catastrófico a Northridge 1994. La aproximación non-estacionaria que modela propuesto describe con más exactitud las porciones de la carga del campo de la onda del terremoto fuerte subestimado en modelos estocásticos convencionales de la carga. Cargue las porciones que se dejan hacia fuera o modelado global hasta ahora se hace disponible para el análisis y modelar para la primera vez. El modelo estocástico gana la transparencia física con respecto a la carga más importante que genera efectos, y por lo tanto será - a pesar de una complejidad más alta - fácil de dirigir en práctica de la ingeniería. El método principal del eje también será útil para los análisis sismológicos en el campo cercano, p. e., para el análisis de los procesos de la ruptura y de los efectos topográficos del sitio.
Сильные землетрясения всемирно являются потенциально высоким риском для ур¬банизированных центров. Для уменшения сейсмического риска развивются методы антисейсмичной структурной конструкции. Эти методы построены на предположениях, которые требуют тщательного эмпирического знания характеристик местного сейсмического ускорения грунта. Предел состоит, с другой стороны, в дополнительных стоимостях строительства. Убытки от недавних сильных землетрясений - также в индустриально развитых странах – подчеркивают потребность более глубокого уточнения прин¬ципиальных схем и методов антисейсмичного строительства. Эта работа представляет новый подход стохастического сейсмического моде¬лирования нагрузки, развивающий обычное предположениe стационарного, одномерного стохастического процессa нa сейсмическое ускорение грунта. Целью будет создание модели нагрузки, которая указана по отдельности для специфических характеристик и сейсмических волн и местного положения, делающей возможным, путём использования информации о физических и геотехничес¬ких инвариантностях, и прозрачных и недорогих подходов в антисейсмичной структурной конструкции, но по крайней мере уменьшающей сейсмический риск, по сравнению с общими методами антисейсмичного строительства. Эти инвариантности являются закономерной структурой волнового поля также, как и свойствами резонан¬са слоёв грунта в месте постройки.. Предложенная модель нагрузки представляет местное волновое поле как составляющая стохастических подпроцессов развития на главных осях зависящих от времени, которые соответствуют волновым пакетам со специфическими характе¬ристиками нагрузки. Те эти характеристики нагрузки описаны в диапазонах частоты и времени также, как трёхмерного объёма функциями формы, параметры которых указаны по отдельности для различных обобщанных волновых пакетов u сильно соотносят от сейсмичес¬ких и геотехнических величин. 
Главны вклад работы – это новые процедуры оценивания, основанные на корреляции, которые служат в рамках эмпирической спецификации модельных параметров для практики строительства. Новый Aлгоритм Спектрально-Приспо¬собительных Главных Oсей Kорреляции (SAPCA) обеспечивает оптимальное заволакивание трёхмерных волновых пакетов преобразованием записанных данных сейсмического ускорения грунта на калибровочные компоненты на этих главных осях. В то же самое время - в связи с алгоритмом коррекции для угла простирания главной оси - SAPCA поставляет сжатые связанные волновые картины в ходе временно-изменчивых глав¬ных осей, которые в свою очередь использованы для надежного определения доминантных фаз для трёх обобщенных волновых пакетов. В этих фазах засилья отдельного волного пакета, определёны волново-специфические параметры модели нагруз¬ки. Дополнительно, показан алгоритм для идентификации и определения волн типа Релея при одиночной регистрации сейсмической ускорении грунта. Адекватность моделированного подходa и эффективность процедур оценивания подтвержены посредством данных сильного землетрясения Northridge 1994. Предложенный нестационарный подход моделировании описывает с большей точностью части нагрузки волного поля сильных землетрясений недооцененных в обычных стохастических моделях нагрузки. Части нагрузки не рассматривающиеся или слишком обобщаемые до сих пор, при новым подходе впервые можно будет учитывать и анализировать. Стохастическая модель приобретает физической прозрачностей по отношению к самым важным влияниям, которые производят нагрузку, и следовательно будет – несмотря на более высокую сложность – легка для того, чтобы применять её в практике инженерных расчётов. Mетод Главных Oсей также будет полезенно для сейсмологических анализов в близком поле, например, для анализа процессов повреждения и топографических влиянии местного положения.
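The time-varying principal axes at the heart of this approach can be illustrated with a toy sketch (not the thesis' SAPCA algorithm: just a sliding-window covariance of two synthetic horizontal acceleration components, with the principal-axis azimuth obtained analytically from the 2x2 covariance matrix; all signal parameters are invented):

```python
import math

def principal_axis_azimuth(x, y):
    """Azimuth (radians) of the major principal axis of the
    covariance of two horizontal components."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cxx = sum((a - mx) ** 2 for a in x) / n
    cyy = sum((b - my) ** 2 for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Closed-form eigenvector direction of the 2x2 covariance matrix
    return 0.5 * math.atan2(2.0 * cxy, cxx - cyy)

def windowed_azimuths(x, y, win, step):
    """Track the time-varying principal axis over sliding windows."""
    return [principal_axis_azimuth(x[i:i + win], y[i:i + win])
            for i in range(0, len(x) - win + 1, step)]

# Synthetic record: a 2 Hz oscillation polarized along 30 degrees
theta = math.radians(30.0)
s = [math.sin(2 * math.pi * 2.0 * 0.01 * k) for k in range(400)]
x = [math.cos(theta) * sk for sk in s]
y = [math.sin(theta) * sk for sk in s]

az = windowed_azimuths(x, y, win=100, step=50)
```

For a perfectly polarized signal, every window recovers the 30-degree polarization azimuth; on real three-component records the azimuth drifts over time, which is the information the principal-axis course patterns exploit.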
Cabrol, Sébastien. "Les crises économiques et financières et les facteurs favorisant leur occurrence." Thesis, Paris 9, 2013. http://www.theses.fr/2013PA090019.
Full textThe aim of this thesis is to analyze, from an empirical point of view, both the different varieties of economic and financial crises (typological analysis) and the characteristics of the contexts that can be associated with a likely occurrence of such events. We therefore analyze both years in which a crisis occurs and years preceding such events (leading-context analysis, forecasting). This study contributes to the empirical literature by focusing exclusively on crises in advanced economies over the last 30 years, by considering several theoretical types of crises, and by taking into account a large number of economic and financial explanatory variables. As part of this research, we also analyze stylized facts related to the 2007/2008 subprime turmoil and, from an epistemological perspective, our ability to foresee crises. Our empirical results are based on binary classification trees built with the CART (Classification And Regression Trees) methodology. This nonparametric, nonlinear statistical technique can handle large data sets and is well suited to identifying threshold effects and complex interactions among variables. Furthermore, this methodology characterizes crises (or pre-crisis contexts) by several distinct sets of independent variables. We identify as leading indicators of economic and financial crises: the variation and volatility of both gold prices and nominal exchange rates, as well as the current account balance (as % of GDP) and the change in the openness ratio. Regarding the typological analysis, we identify two main empirical varieties of crises. First, we highlight « global type » crises characterized by a slowdown in US economic activity (stressing the role and influence of the USA in global economic conditions) and low GDP growth in the countries affected by the turmoil. Second, we find that country-specific high levels of both inflation and exchange-rate volatility can be considered evidence of « idiosyncratic type » crises.
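A single CART split of the kind the thesis relies on can be sketched minimally: the best binary threshold on one candidate indicator is the one minimizing the weighted Gini impurity of the two children. The indicator values and crisis labels below are invented for illustration, not taken from the study:

```python
def gini(labels):
    """Gini impurity of a 0/1 label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """Best binary threshold on one variable, CART-style:
    minimize the weighted Gini impurity of the two children."""
    n = len(labels)
    best = (float("inf"), None)
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[0]:
            best = (weighted, t)
    return best

# Hypothetical indicator: exchange-rate volatility; 1 = crisis year
vol    = [0.2, 0.4, 0.5, 1.1, 1.3, 1.8, 2.2, 2.5]
crisis = [0,   0,   0,   0,   1,   1,   1,   1]
score, threshold = best_split(vol, crisis)
```

Recursing the same rule on each child yields the full binary tree; the thresholds found this way are exactly the "threshold effects" the abstract mentions.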
Jakel, Roland. "Lineare und nichtlineare Analyse hochdynamischer Einschlagvorgänge mit Creo Simulate und Abaqus/Explicit." Universitätsbibliothek Chemnitz, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-171812.
Full textThe presentation describes how to analyze the impact of an idealized fragment into a steel protective panel with different dynamic analysis methods. Two different commercial finite element codes are used for this: a.) Creo Simulate: This code uses the method of modal superposition to analyze the dynamic response of linear dynamic systems; therefore, only modal damping and no contact can be used. The unknown force-vs.-time curve of the impact event cannot be computed, but must be assumed and applied as an external force to the steel protective panel. The more dynamic the impact, the sooner the range of validity of the underlying linear model is exceeded. b.) Abaqus/Explicit: This code uses a direct integration method for an incremental (step-by-step) solution of the underlying differential equation, which does not need a tangential stiffness matrix. In this way, material nonlinearities as well as contact are obtained directly as results of the FEM analysis, and good results can be obtained even for extremely high-dynamic impacts. However, the nonlinear elasto-plastic material behavior with damage initiation and damage evolution must be characterized with considerable effort. The principal difficulties of the material characterization are described.
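The explicit direct-integration idea can be sketched on a single degree of freedom: a mass striking a contact spring, advanced with the central-difference (leapfrog) scheme, where the contact nonlinearity is simply re-evaluated each step instead of assembling a tangential stiffness matrix. All parameters below are made up and unrelated to the panel model in the presentation:

```python
def explicit_impact(m, k, v0, dt, steps):
    """Central-difference time integration of m*u'' = f(u) for a mass
    striking a linear contact spring (force acts only in contact)."""
    u, v = 0.0, v0   # initial penetration and impact velocity
    a = 0.0          # no contact force before penetration
    history = []
    for _ in range(steps):
        # half-step velocity, full-step displacement (leapfrog form)
        v_half = v + 0.5 * dt * a
        u = u + dt * v_half
        f = -k * u if u > 0.0 else 0.0   # contact nonlinearity
        a = f / m
        v = v_half + 0.5 * dt * a
        history.append(u)
    return history

# 1 kg mass hitting a 1e4 N/m spring at 5 m/s; dt well below 2/omega
hist = explicit_impact(m=1.0, k=1.0e4, v0=5.0, dt=1.0e-4, steps=400)
peak = max(hist)   # energy balance predicts v0*sqrt(m/k) = 0.05
```

The time step must stay below the stability limit of the explicit scheme (roughly 2/omega for the stiffest mode), which is why explicit codes take very many small increments.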
von, Wenckstern Michael. "Web applications using the Google Web Toolkit." Master's thesis, Technische Universitaet Bergakademie Freiberg Universitaetsbibliothek "Georgius Agricola", 2013. http://nbn-resolving.de/urn:nbn:de:bsz:105-qucosa-115009.
Full textThis diploma thesis describes the creation of desktop-like applications with the Google Web Toolkit and the conversion of classic Java programs into such applications. The Google Web Toolkit is an open-source development environment that translates Java code into browser-independent and cross-device HTML and JavaScript. Most of the GWT framework is presented, including the Java-to-JavaScript compiler, as well as important security aspects of web pages. To show that even complex graphical user interfaces can be built with the Google Web Toolkit, the well-known board game Agricola is implemented using the model-view-presenter design pattern. To determine the right technology for the next web project, the Google Web Toolkit is compared with JavaServer Faces.
Chen, Pu. "Classification tree models for predicting cancer status." 2009. http://digital.library.duq.edu/u?/etd,109505.
Full textMistry, Pritesh, Daniel Neagu, Paul R. Trundle, and J. D. Vessey. "Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology." 2015. http://hdl.handle.net/10454/7545.
Full textDrug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means for vehicle selection without experimental trial would therefore be of benefit in saving time and money for the industry. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process, extract and build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be most suited to reduce a drug's toxicity. Using data acquired from the National Institutes of Health's (NIH) Developmental Therapeutics Program (DTP), we propose a methodology using an area under a curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug, and we build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80% using random forest models, whilst the decision tree models produce accuracies in the 70% region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
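The AUC criterion used to rank vehicles can be sketched via its Mann-Whitney interpretation: the probability that a randomly chosen positive score outranks a randomly chosen negative one. The scores below are invented for illustration and are not the NIH/DTP data:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    P(random positive > random negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical readings: drug alone vs the same drug with a vehicle
drug_alone   = [20, 35, 40, 55]
with_vehicle = [50, 60, 70, 80, 90]
score = auc(with_vehicle, drug_alone)
```

An AUC near 1 means the two score distributions are well separated; comparing such AUC values across vehicles is one simple way to decide which vehicle yields the most favourable profile before building the classifiers.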
Manwani, Naresh. "Supervised Learning of Piecewise Linear Models." Thesis, 2012. http://hdl.handle.net/2005/3244.
Full textLiu, Hsin-hsien, and 劉欣憲. "A Study of the Application of the Procedure Analysis Method, the Classification Tree Method, and Artificial Neural Network Method to Construct the Authentication Models of the Roadway Accidents." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/15976300103018019953.
Full text逢甲大學
交通工程與管理所
94
Roadway traffic accidents are increasing yearly, and because the parties involved want to protect their rights, the number of traffic-accident cases requiring authentication is increasing as well. However, the Local Traffic Accident Authentication Committees (LTAAC) lack manpower, and the authentication criteria they apply are inconsistent across committees, which delays cases and lowers the quality of authentication. This study uses three methods, the Procedural Authentication Method (PAM), the Classification Tree Method (CTM), and the Artificial Neural Network (ANN), to construct authentication models that predict the responsibilities of the parties in a traffic accident. The study focuses on two-vehicle collisions not involving pedestrians or bicyclists; the data set includes 2,634 cases and 5,268 clients. First, the PAM uses literature review and brainstorming to find the authentication criteria. Second, the CTM uses cross-table analysis to select the major factors as input variables and varies the number of end nodes, producing 30 sub-models for validation. Third, the ANN method likewise uses cross-table analysis to select input variables and varies the number of neurons in the hidden layer, also producing 30 sub-models. Three collision types are considered: car/car, car/motorcycle, and motorcycle/motorcycle. Both the CTM and ANN methods use 80 percent of the cases in the database for training and 20 percent for validation. The study shows that under the existing criteria, the PAM outperforms the CTM and the ANN: the accuracy of the PAM is 74.1%, that of the CTM 71.92%, and that of the ANN 67.17%. However, when the full client data are included, the accuracy of the PAM drops to 62.5%.
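The 80/20 train/validation protocol shared by the CTM and ANN models can be sketched as follows; the responsibility labels are hypothetical, and a majority-class baseline stands in for the actual models:

```python
import random

def split_80_20(records, seed=0):
    """Shuffle and split cases into 80% training / 20% validation."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def majority_class(train_labels):
    """Baseline model: predict the most frequent class in training."""
    return max(set(train_labels), key=train_labels.count)

def accuracy_pct(predicted, labels):
    """Accuracy percentage of a constant prediction on a label list."""
    return 100.0 * sum(1 for l in labels if l == predicted) / len(labels)

# Hypothetical client records: responsibility class per client
labels = ["full"] * 60 + ["partial"] * 25 + ["none"] * 15
train, valid = split_80_20(labels, seed=42)
pred = majority_class(train)
acc = accuracy_pct(pred, valid)
```

Any real model, tree or network, would be fitted on `train` and scored on `valid` in the same way; the baseline accuracy gives the floor the 30 sub-models must beat.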
Chang, Shou-Chih, and 張守智. "Model Trees for Hybrid Data Type Classification." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/92670263836027181618.
Full text國立臺灣科技大學
資訊工程系
92
Classification is a fundamental task in data mining, and many classification learning algorithms have been successfully developed for different purposes. Most of them, however, are only suitable for a specific attribute type, whereas real-world applications often involve hybrid datasets composed of both nominal and numerical attributes. To apply these learning algorithms, some attributes must be transformed into the appropriate data types, a procedure that can change the nature of the datasets. In this thesis, we propose a new approach, model trees, to integrate learning algorithms with different inherent properties and handle such cases. We employ decision trees as the classification framework and incorporate support vector machines into the tree-construction process, replacing the discretization procedure and providing multivariate decisions. Finally, experiments show that the proposed method performs better than other competing methods.
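The core idea, a decision node on a nominal attribute with a separate classifier fitted in each branch, can be sketched minimally. Here a class-mean threshold on the numeric attribute stands in for the thesis' SVM leaf models, and the records are invented:

```python
def fit_leaf(xs, ys):
    """Linear stand-in for an SVM leaf: threshold halfway between
    the class means of the numeric attribute (0/1 labels assumed)."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    t = 0.5 * (mean0 + mean1)
    sign = 1 if mean1 > t else -1   # which side of t is class 1
    return t, sign

def fit_model_tree(data):
    """Root split on the nominal attribute; one leaf model per branch."""
    tree = {}
    for cat in set(d[0] for d in data):
        xs = [d[1] for d in data if d[0] == cat]
        ys = [d[2] for d in data if d[0] == cat]
        tree[cat] = fit_leaf(xs, ys)
    return tree

def predict(tree, cat, x):
    t, sign = tree[cat]
    return 1 if sign * (x - t) > 0 else 0

# Hypothetical hybrid records: (nominal attribute, numeric attribute, class)
data = [("a", 1.0, 0), ("a", 2.0, 0), ("a", 8.0, 1), ("a", 9.0, 1),
        ("b", 1.0, 1), ("b", 2.0, 1), ("b", 8.0, 0), ("b", 9.0, 0)]
tree = fit_model_tree(data)
```

Note that the numeric decision rule differs per branch; no global discretization of the numeric attribute is needed, which is the point of combining the two model families.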
Jhang, Zao-Shih, and 張造時. "Variable Selection of Regression Trees and Node Model of Classification Trees." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/75500215236232826871.
Full textWang, Chien-Jen, and 王建仁. "An Automatic Classification Model for Electronic Commerce Websites Utilizing Decision Tree." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/68134476733931044580.
Full text國立勤益科技大學
企業管理系
97
Due to the rise of Web 2.0 and the increase of Internet users in Taiwan, e-commerce has begun to prevail again in Taiwan after the dot-com crisis. More and more consumers are willing to shop online, resulting in a huge growth of online shopping websites. In practice, administrators of these websites seldom sufficiently analyze the consumer behavior of their members and classify them by only a few criteria. Without accurate classification of members, marketing cannot effectively reach potential consumers; as a result, members' repurchase rates can hardly be improved, which further affects business performance. If administrators of online shopping websites can quickly find core customers in the member database and adopt effective marketing strategies based on their attributes, they can enhance sales performance, increase market share, and multiply the effect of the adopted marketing strategies. The objectives of this study were as follows: 1. To explore the current member management mechanisms and development of e-commerce in Taiwan. 2. To use decision trees to construct an automatic member classification model for e-commerce service providers in Taiwan. 3. To investigate the classification accuracy of the proposed model. Based on member attributes of a simulated member database, such as hours of Internet use, expenditure, and shopping frequency, four e-commerce member classification models were constructed using the decision tree C4.5 algorithm. The models were further modified to enhance the efficiency of automatic classification. These models can assist e-commerce service providers in automatically classifying their members and finding core customer groups more efficiently, so as to set up effective marketing strategies and thus enhance their competitiveness. The research findings included: 1. Through application of decision trees, important attributes could be derived to classify members of e-commerce websites. 2. An automatic member classification model could be constructed using a decision tree. 3. By pruning the decision tree, the complexity of the decision rules could be reduced, although the accuracy of the classification model might decrease.
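C4.5, the algorithm named above, selects split attributes by gain ratio (information gain normalized by split information) rather than raw gain. A minimal sketch of that criterion, using a hypothetical member attribute and a hypothetical "core customer" label:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr_values, labels):
    """C4.5 split criterion: information gain divided by split info."""
    n = len(labels)
    base = entropy(labels)
    cond = 0.0        # conditional entropy after the split
    split_info = 0.0  # entropy of the split itself
    for v in set(attr_values):
        subset = [l for a, l in zip(attr_values, labels) if a == v]
        w = len(subset) / n
        cond += w * entropy(subset)
        split_info -= w * math.log2(w)
    gain = base - cond
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical attribute: shopping frequency vs "core customer" label
freq = ["high", "high", "high", "low", "low", "low", "low", "high"]
core = [1,      1,      1,      0,     0,     0,     1,     0]
gr = gain_ratio(freq, core)
```

The normalization by split information is what keeps C4.5 from favouring many-valued attributes (such as a raw member ID), a bias plain information gain suffers from.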
Lin, Guan-An, and 林冠安. "Sources of Volatility in Stock Returns: Application of Classification and Regression Tree Model." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/f6nb24.
Full text輔仁大學
金融與國際企業學系金融碩士班
104
This paper investigates the relationship between the volatility of TAIEX returns and variables from macroeconomics, the stock market, and investor sentiment. We use the classification and regression tree (CART) method of Breiman et al. (1984) to find the key variables affecting the volatility of stock returns, and then compare regression analyses built from the tree-selected factors with those using all variables in this research. The data cover January 2001 to December 2015 in Taiwan, a total of 180 monthly observations. We fit an autoregressive moving-average model to monthly stock returns and take the residuals as a proxy for volatility. The empirical results have two parts. First, the most important variable selected by the regression tree is the dividend yield, followed by gold returns, and the macroeconomic variables perform best among the three variable groups. Second, the business cycle and the interest rate are selected as the first two key variables by the regression tree, and compared with a traditional linear regression considering all variables, a linear regression built on the CART-selected variables performs better.
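The two-step pipeline, time-series residuals as a volatility proxy followed by a regression-tree split, can be sketched in miniature. Here a least-squares AR(1) fit stands in for the full ARMA model, one CART regression node stands in for the full tree, and all numbers are invented:

```python
def ar1_residuals(r):
    """Least-squares AR(1) fit r[t] = a + b*r[t-1]; return residuals."""
    x, y = r[:-1], r[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def best_regression_split(feature, target):
    """One CART regression node: threshold minimizing the total
    within-child sum of squared errors of the target."""
    def sse(v):
        if not v:
            return 0.0
        m = sum(v) / len(v)
        return sum((t - m) ** 2 for t in v)
    best = (float("inf"), None)
    for t in sorted(set(feature)):
        left = [tg for f, tg in zip(feature, target) if f <= t]
        right = [tg for f, tg in zip(feature, target) if f > t]
        score = sse(left) + sse(right)
        if score < best[0]:
            best = (score, t)
    return best

# Hypothetical monthly returns and a dividend-yield-like covariate
returns = [0.01, -0.02, 0.015, -0.01, 0.03,
           -0.04, 0.05, -0.06, 0.02, -0.03]
proxy = [e * e for e in ar1_residuals(returns)]   # squared residuals
dividend_yield = [3.0, 3.1, 2.9, 1.2, 1.1, 1.0, 1.3, 0.9, 1.2]
sse_min, cut = best_regression_split(dividend_yield, proxy)
```

Grown recursively over all candidate variables, the node with the largest SSE reduction identifies the "most important" variable, which is how the tree ranks dividend yield and gold returns in the study.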