Dissertations / Theses on the topic 'Logistic regression'

To see the other types of publications on this topic, follow the link: Logistic regression.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Logistic regression.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Kazemi, Seyed Mehran. "Relational logistic regression." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/50091.

Full text
Abstract:
Aggregation is a technique for representing conditional probability distributions as an analytic function of parents. Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this thesis, we consider extending logistic regression to directed relational models, where there are objects and relations among them, and we want to model varying populations and interactions among parents. We first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. We compare our relational logistic regression to Markov logic networks and describe their similarities and differences. Finally, we show how other well-known aggregators can be represented using relational logistic regression.
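The idea of a logistic-regression aggregator whose output depends on a varying parent population can be made concrete with a small sketch (not taken from the thesis; the weights and the linear-in-population form are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def aggregate(n_true, n_pop, w0=-2.0, w_true=1.0, w_pop=-0.3):
    """Logistic-regression aggregator for a child whose parents form a
    population of varying size: the child's probability depends on how many
    parents are true (n_true) and on the population size (n_pop).
    The weights are made up for illustration."""
    return sigmoid(w0 + w_true * n_true + w_pop * n_pop)

# The same proportion of true parents gives different probabilities as the
# population grows, illustrating the population-size effect that relational
# logistic regression is designed to control.
for n_pop in (4, 10, 50):
    n_true = n_pop // 2
    print(n_pop, round(aggregate(n_true, n_pop), 3))
```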
Science, Faculty of
Computer Science, Department of
Graduate
APA, Harvard, Vancouver, ISO, and other styles
2

Nargis, Suraiya. "Robust methods in logistic regression." University of Canberra. Information Sciences & Engineering, 2005. http://erl.canberra.edu.au./public/adt-AUC20051111.141200.

Full text
Abstract:
My Masters research aims to deepen our understanding of the behaviour of robust methods in logistic regression. Logistic regression is a special case of Generalized Linear Modelling (GLM), which is a powerful and popular technique for modelling a large variety of data. Robust methods are useful in reducing the effect of outlying values in the response variable on parameter estimates. A literature survey shows that we are still at the beginning of being able to detect extreme observations in logistic regression analyses, to apply robust methods in logistic regression and to present informatively the results of logistic regression analyses. In Chapter 1 I have made a basic introduction to logistic regression, with an example, and to robust methods in general. In Chapters 2 through 4 of the thesis I have described traditional methods and some relatively new methods for presenting results of logistic regression using powerful visualization techniques as well as the concepts of outliers in binomial data. I have used different published data sets for illustration, such as the Prostate Cancer data set, the Damaged Carrots data set and the Recumbent Cow data set. In Chapter 4 I summarize and report on the modern concepts of graphical methods, such as central dimension reduction, and the use of graphics as pioneered by Cook and Weisberg (1999). In Section 4.6 I have then extended the work of Cook and Weisberg to robust logistic regression. In Chapter 5 I have described simulation studies to investigate the effects of outlying observations on logistic regression (robust and non-robust). In Section 5.2 I have come to the conclusion that, in the case of classical or robust multiple logistic regression with no outliers, robust methods do not necessarily provide more reasonable estimates of the parameters for the data that contain no strong outliers. In Section 5.4 I have looked into the cases where outliers are present and have come to the conclusion that either the breakdown method or a sensitivity analysis provides reasonable parameter estimates in that situation. Finally, I have identified areas for further study.
APA, Harvard, Vancouver, ISO, and other styles
3

Rashid, Mamunur. "Inference on Logistic Regression Models." Bowling Green State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1214165101.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Williams, Ulyana P. "On Some Ridge Regression Estimators for Logistic Regression Models." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3667.

Full text
Abstract:
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio have been varied in the design of experiment. Simulation results show that under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the fields of health, physical and social sciences.
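As a rough illustration of the kind of comparison described above, here is a minimal Monte Carlo sketch (not the thesis's design) contrasting the coefficient MSE of maximum likelihood and of one fixed ridge-penalised logistic fit under highly correlated explanatory variables; the true coefficients, the correlation structure and the ridge constant are assumptions, and penalty=None requires scikit-learn 1.2 or later:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p, rho = 200, 4, 0.9
beta = np.array([1.0, -1.0, 0.5, 0.0])        # assumed true coefficients

# Correlated explanatory variables via an equicorrelation structure.
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
mse_ml, mse_ridge = [], []
for _ in range(200):                          # small Monte Carlo study
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta)))
    ml = LogisticRegression(penalty=None, max_iter=1000).fit(X, y)   # ~MLE (scikit-learn >= 1.2)
    ridge = LogisticRegression(penalty="l2", C=0.5, max_iter=1000).fit(X, y)
    mse_ml.append(np.mean((ml.coef_.ravel() - beta) ** 2))
    mse_ridge.append(np.mean((ridge.coef_.ravel() - beta) ** 2))

print("MSE, maximum likelihood:", np.mean(mse_ml))
print("MSE, ridge logistic    :", np.mean(mse_ridge))
```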
APA, Harvard, Vancouver, ISO, and other styles
5

Mak, Carmen. "Polychotomous logistic regression via the Lasso." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape10/PQDD_0004/NQ41227.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Li, Yin. "Application of logistic regression in biostatistics." Thesis, McGill University, 1993. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=68201.

Full text
Abstract:
The primary objective of this paper is a focused introduction to the logistic regression model and its use in methods for modeling the relationship between a dichotomous outcome variable and a set of covariates. The approach we will take is to develop the model from a regression analysis point of view. Also in this paper, an estimator of the common odds ratio in one-to-one matched case-control studies is proposed. The connection between this estimator and the James-Stein estimating procedure is highlighted through the argument of estimating functions. Comparisons are made between this estimator, the conditional maximum likelihood estimator, and the estimator ignoring the matching.
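For readers who want to see the basic model in practice, a minimal sketch of fitting a logistic regression to a dichotomous outcome and reading off odds ratios might look as follows (simulated data; the covariates and coefficients are made up for illustration and are not from the thesis):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data standing in for a dichotomous health outcome and two covariates.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(50, 10, n)
exposure = rng.binomial(1, 0.4, n)
logit = -5 + 0.08 * age + 0.7 * exposure      # assumed true model
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([age, exposure]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.summary())
print("Odds ratios:", np.exp(fit.params[1:]))  # exp(beta) for age and exposure
```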
APA, Harvard, Vancouver, ISO, and other styles
7

Al-Sarraf, Z. J. "Some problems connected with logistic regression." Thesis, Brunel University, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.374301.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Olsén, Johan. "Logistic regression modelling for STHR analysis." Thesis, KTH, Matematisk statistik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-148971.

Full text
Abstract:
Coronary artery heart disease (CAD) is a common condition which can impair the quality of life and lead to cardiac infarctions. Traditional criteria during exercise tests are good but far from perfect. A lot of patients with inconclusive tests are referred to radiological examinations. By finding better evaluation criteria during the exercise test we can save a lot of money and let the patients avoid unnecessary examinations. Computers record large amounts of numerical data during the exercise test. In this retrospective study 267 patients with inconclusive exercise tests and performed radiological examinations were included. The purpose was to use clinical considerations as well as mathematical statistics to find new diagnostic criteria. We created a few new parameters and evaluated them together with previously used parameters. For women we found some interesting univariable results where new parameters discriminated better than those formerly used. However, the number of females with observed CAD was small (14), which made it impossible to obtain strong significance. For men we computed a multivariable model, using logistic regression, which discriminates far better than the traditional parameters for these patients. The area under the ROC curve was 0.90 (95% CI: 0.83-0.97), which is excellent to outstanding discrimination in a group initially included due to their inconclusive results. If the model can be proved to hold for another population it could contribute a lot to the diagnostics of this common medical condition.
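A minimal sketch of the kind of evaluation reported above, fitting a logistic model and bootstrapping a confidence interval for the area under the ROC curve, could look like this (synthetic data standing in for the exercise-test parameters; not the study's actual model or results):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 267                                    # same order of size as the study
X = rng.normal(size=(n, 3))                # stand-ins for exercise-test parameters
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([1.2, -0.8, 0.5]) - 1.5))))

model = LogisticRegression(max_iter=1000).fit(X, y)
score = model.predict_proba(X)[:, 1]
auc = roc_auc_score(y, score)

# Simple nonparametric bootstrap for a 95% confidence interval of the AUC.
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    if len(set(y[idx])) < 2:               # need both classes in the resample
        continue
    boot.append(roc_auc_score(y[idx], score[idx]))
print(f"AUC = {auc:.2f}, 95% CI = ({np.percentile(boot, 2.5):.2f}, "
      f"{np.percentile(boot, 97.5):.2f})")
```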
APA, Harvard, Vancouver, ISO, and other styles
9

Batchelor, John Stephen. "Trauma scoring models using logistic regression." Thesis, University College London (University of London), 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418022.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

MOREIRA, RODRIGO PINTO. "SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2008. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=13437@1.

Full text
Abstract:
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
FUNDAÇÃO DE APOIO À PESQUISA DO ESTADO DO RIO DE JANEIRO
Este trabalho tem como objetivo principal adaptar o modelo STR-Tree, o qual é a combinação de um modelo Smooth Transition Regression com Classification and Regression Tree (CART), a fim de utilizá-lo em Classificação. Para isto algumas alterações foram realizadas em sua forma estrutural e na estimação. Devido ao fato de estarmos fazendo classificação de variáveis dependentes binárias, se faz necessária a utilização das técnicas empregadas em Regressão Logística, dessa forma a estimação dos parâmetros da parte linear passa a ser feita por Máxima Verossimilhança. Assim o modelo, que é paramétrico não-linear e estruturado por árvore de decisão, onde cada nó terminal representa um regime os quais têm seus parâmetros estimados da mesma forma que em uma Regressão Logística, é denominado Smooth Transition Logistic Regression-Tree (STLR-Tree). A inclusão dos regimes, determinada pela divisão dos nós da árvore, é feita baseada em testes do tipo Multiplicadores de Lagrange, que em sua forma para o caso Gaussiano utiliza a Soma dos Quadrados dos Resíduos em suas estatísticas de teste, aqui são substituídas pela Função Desvio (Deviance), que é equivalente para o caso dos modelos não Gaussianos, cuja distribuição da variável dependente pertença à família exponencial. Na aplicação a dados reais selecionou-se dois conjuntos das variáveis explicativas de cada uma das duas bases utilizadas, que resultaram nas melhores taxas de acerto, verificadas através de Tabelas de Classificação (Matrizes de Confusão). Esses conjuntos de variáveis foram usados com outros métodos de classificação existentes, são eles: Generalized Additive Models (GAM), Regressão Logística, Redes Neurais, Análise Discriminante, k-Nearest Neighbor (K-NN) e Classification and Regression Trees (CART).
The main goal of this work is to adapt the STR-Tree model, which is the combination of a Smooth Transition Regression model with Classification and Regression Tree (CART), in order to use it in Classification. Some changes were made in its structural form and in the estimation. Because we are classifying binary dependent variables, it is necessary to use the techniques employed in Logistic Regression, so the estimation of the linear part is made by Maximum Likelihood. Thus the model, which is nonlinear parametric and structured by a decision tree, where each terminal node represents a regime whose parameters are estimated in the same way as in a Logistic Regression, is called Smooth Transition Logistic Regression Tree (STLR-Tree). The inclusion of the regimes, determined by the splitting of the tree's nodes, is based on Lagrange Multiplier tests, which in the Gaussian case use the Residual Sum of Squares in their test statistics; here these are replaced by the Deviance function, which is the equivalent for non-Gaussian models whose dependent variable has a distribution in the exponential family. The model was applied to two datasets chosen from the bibliography and compared with other classification methods, namely Generalized Additive Models (GAM), Logistic Regression, Neural Networks, Discriminant Analysis, k-Nearest Neighbor (k-NN) and Classification and Regression Trees (CART). The Classification Tables (Confusion Matrices) show that STLR-Tree achieved the second-best overall rate of correct classification in three of the four applications shown, in all of them behind only GAM.
APA, Harvard, Vancouver, ISO, and other styles
11

Emfevid, Lovisa, and Hampus Nyquist. "Financial Risk Profiling using Logistic Regression." Thesis, KTH, Matematisk statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229821.

Full text
Abstract:
As automation in the financial service industry continues to advance, online investment advice has emerged as an exciting new field. Vital to the accuracy of such service is the determination of the individual investors’ ability to bear financial risk. To do so, the statistical method of logistic regression is used. The aim of this thesis is to identify factors which are significant in determining a financial risk profile of a retail investor. In other words, the study seeks to map out the relationship between several socioeconomic- and psychometric variables to develop a predictive model able to determine the risk profile. The analysis is based on survey data from respondents living in Sweden. The main findings are that variables such as income, consumption rate, experience of a financial bear market, and various psychometric variables are significant in determining a financial risk profile.
I samband med en ökad automatiseringstrend har digital investeringsrådgivning dykt upp som ett nytt fenomen. Av central betydelse är tjänstens förmåga att bedöma en investerares förmåga till att bära finansiell risk. Logistik regression tillämpas för att bedöma en icke- professionell investerares vilja att bära finansiell risk. Målet med uppsatsen är således att identifiera ett antal faktorer med signifikant förmåga till att bedöma en icke-professionell investerares riskprofil. Med andra ord, så syftar denna uppsats till att studera förmågan hos ett antal socioekonomiska- och psykometriska variabler. För att därigenom utveckla en prediktiv modell som kan skatta en individs finansiella riskprofil. Analysen genomförs med hjälp av en enkätstudie hos respondenter bosatta i Sverige. Den huvudsakliga slutsatsen är att en individs inkomst, konsumtionstakt, tidigare erfarenheter av abnorma marknadsförhållanden, och diverse psykometriska komponenter besitter en betydande förmåga till att avgöra en individs finansiella risktolerans
APA, Harvard, Vancouver, ISO, and other styles
12

Lo, Sau Yee. "Measurement error in logistic regression model /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?MATH%202004%20LO.

Full text
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 82-83). Also available in electronic version. Access restricted to campus users.
APA, Harvard, Vancouver, ISO, and other styles
13

Widman, Linnea. "Regression då data utgörs av urval av ranger." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-60664.

Full text
Abstract:
För alpina skidåkare mäter man prestationer i så kallad FIS-ranking. Vi undersöker några metoder för hur man kan analysera data där responsen består av ranger som dessa. Vid situationer då responsdata utgörs av urval av ranger finns ingen självklar analysmetod. Det vi undersöker är skillnaderna vid användandet av olika regressionsanpassningar så som linjär, logistisk och ordinal logistisk regression för att analysera data av denna typ. Vidare används bootstrap för att bilda konfidensintervall. Det visar sig att för våra datamaterial ger metoderna liknande resultat när det gäller att hitta betydelsefulla förklarande variabler. Man kan därmed utgående från denna undersökning, inte se några skäl till varför man ska använda de mer avancerade modellerna.
Alpine skiers measure their performance in FIS ranking. We will investigate some methods on how to analyze data where response data is based on ranks like this. In situations where response data is based on ranks there is no obvious method of analysis. Here, we examine differences in the use of linear, logistic and ordinal logistic regression to analyze data of this type. Bootstrap is used to make confidence intervals. For our data these methods give similar results when it comes to finding important explanatory variables. Based on this survey we cannot see any reason why one should use the more advanced models.
APA, Harvard, Vancouver, ISO, and other styles
14

Thorleifsson, Alexander. "Stochastic Gradient Descent for Efficient Logistic Regression." Thesis, Stockholms universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-132366.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Webster, Gregg. "Bayesian logistic regression models for credit scoring." Thesis, Rhodes University, 2011. http://hdl.handle.net/10962/d1005538.

Full text
Abstract:
The Bayesian approach to logistic regression modelling for credit scoring is useful when there are data quantity issues. Data quantity issues might occur when a bank is opening in a new location or there is a change in the scoring procedure. Making use of prior information (available from the coefficients estimated on other data sets, or expert knowledge about the coefficients), a Bayesian approach is proposed to improve the credit scoring models. To achieve this, a data set is split into two sets, “old” data and “new” data. Priors are obtained from a model fitted on the “old” data. This model is assumed to be a scoring model used by a financial institution in the current location. The financial institution is then assumed to expand into a new economic location where there is limited data. The priors from the model on the “old” data are then combined in a Bayesian model with the “new” data to obtain a model which represents all the available information. The predictive performance of this Bayesian model is compared to a model which does not make use of any prior information. It is found that the use of relevant prior information improves the predictive performance when the size of the “new” data is small. As the size of the “new” data increases, the importance of including prior information decreases.
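One simple way to combine priors from an "old" model with limited "new" data is maximum a posteriori estimation with a Gaussian prior centred at the old coefficients. The sketch below illustrates that idea only; it is not the thesis's implementation, and the prior means, prior variances and data are assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_posterior(beta, X, y, prior_mean, prior_var):
    """Negative log-posterior for logistic regression with an independent
    Gaussian prior centred at coefficients estimated on the 'old' data."""
    p = expit(X @ beta)
    eps = 1e-12
    loglik = np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    logprior = -0.5 * np.sum((beta - prior_mean) ** 2 / prior_var)
    return -(loglik + logprior)

rng = np.random.default_rng(0)
beta_old = np.array([-1.0, 0.8, -0.5])       # assumed estimates from the "old" data
prior_var = np.array([0.25, 0.25, 0.25])     # assumed prior variances

# A small "new" data set from a similar, but not identical, model.
X_new = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
y_new = rng.binomial(1, expit(X_new @ np.array([-0.8, 1.0, -0.4])))

res = minimize(neg_log_posterior, x0=beta_old,
               args=(X_new, y_new, beta_old, prior_var), method="BFGS")
print("MAP estimate combining prior and new data:", res.x)
```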
APA, Harvard, Vancouver, ISO, and other styles
16

Hallett, David C. "Goodness of fit tests in logistic regression." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/mq45403.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Wang, Jie. "Incorporating survey weights into logistic regression models." Digital WPI, 2013. https://digitalcommons.wpi.edu/etd-theses/267.

Full text
Abstract:
Incorporating survey weights into likelihood-based analysis is a controversial issue because the sampling weights are not simply equal to the reciprocal of selection probabilities but they are adjusted for various characteristics such as age, race, etc. Some adjustments are based on nonresponses as well. This adjustment is accomplished using a combination of probability calculations. When we build a logistic regression model to predict categorical outcomes with survey data, the sampling weights should be considered if the sampling design does not give each individual an equal chance of being selected in the sample. We rescale these weights to sum to an equivalent sample size because the variance is too small with the original weights. These new weights are called the adjusted weights. The old method applies quasi-likelihood maximization to estimate the parameters with the adjusted weights. We develop a new method based on the correct likelihood for logistic regression to include the adjusted weights. In the new method, the adjusted weights are further used to adjust for both covariates and intercepts. We explore the differences and similarities between the quasi-likelihood and the correct likelihood methods. We use both binary and multinomial logistic regression models to estimate parameters and apply the methods to body mass index data from the Third National Health and Nutrition Examination Survey. The results show some similarities and differences between the old and new methods in parameter estimates, standard errors and statistical p-values.
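A minimal sketch of weight rescaling followed by a weighted (quasi-likelihood-style) logistic fit is shown below; it uses statsmodels frequency weights as one simple convention with simulated data and weights, and it does not reproduce the thesis's correct-likelihood adjustment:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * x))))
raw_w = rng.uniform(0.5, 5.0, size=n)          # stand-in adjusted survey weights

# Rescale the weights so that they sum to the sample size before using them,
# as described in the abstract above.
w = raw_w * n / raw_w.sum()

X = sm.add_constant(x)
# Weighted fit; note the reported standard errors do not account for the
# survey design itself, only for the weighting.
fit = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w).fit()
print("coefficients:", fit.params, "standard errors:", fit.bse)
```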
APA, Harvard, Vancouver, ISO, and other styles
18

Raner, Max. "On logistic regression and a medical application." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-420680.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Hu, ChungLynn. "Nonignorable nonresponse in the logistic regression analysis /." The Ohio State University, 1998. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487950153601414.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Salaro, Rossana <1994>. "Multinomial Logistic Regression with High Dimensional Data." Master's Degree Thesis, Università Ca' Foscari Venezia, 2018. http://hdl.handle.net/10579/13814.

Full text
Abstract:
This thesis investigates multinomial logistic regression in the presence of high-dimensional data. Multinomial logistic regression has been widely used to model categorical data in a variety of fields, including health, physical and social sciences. In this thesis we apply to multinomial logistic regression three different kinds of dimensionality reduction techniques, namely ridge regression, lasso and principal components regression. These methods reduce the dimensions of the design matrix used to build the multinomial logistic regression model by selecting those explanatory variables that most affect the response variable. We carry out an extensive simulation study to compare and contrast the three reduction methods. Moreover, we illustrate the multinomial regression model on different case studies that allow us to highlight the benefits and limits of the different approaches.
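The three reduction strategies can be sketched with scikit-learn pipelines on synthetic high-dimensional multiclass data (an illustration only; the penalty strengths, number of components and data generator are assumptions, not the thesis's case studies):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic high-dimensional multiclass data standing in for a real case study.
X, y = make_classification(n_samples=300, n_features=200, n_informative=15,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

models = {
    "ridge": make_pipeline(StandardScaler(),
                           LogisticRegression(penalty="l2", C=0.1, max_iter=5000)),
    "lasso": make_pipeline(StandardScaler(),
                           LogisticRegression(penalty="l1", solver="saga",
                                              C=0.1, max_iter=5000)),
    "pcr":   make_pipeline(StandardScaler(), PCA(n_components=20),
                           LogisticRegression(max_iter=5000)),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:5s} cross-validated accuracy: {acc:.3f}")
```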
APA, Harvard, Vancouver, ISO, and other styles
21

Ishikawa, Noemi Ichihara. "Uso de transformações em modelos de regressão logística." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-05062007-202656/.

Full text
Abstract:
Modelos para dados binários são bastante utilizados em várias situações práticas. Transformações em Análise de Regressão podem ser aplicadas para linearizar ou simplificar o modelo e também para corrigir desvios de suposições. Neste trabalho, descrevemos o uso de transformações nos modelos de regressão logística para dados binários e apresentamos modelos envolvendo parâmetros adicionais de modo a obter um ajuste mais adequado. Posteriormente, analisamos o custo da estimação quando são adicionados parâmetros aos modelos e apresentamos os testes de hipóteses relativos aos parâmetros do modelo de regressão logística de Box-Cox. Finalizando, apresentamos alguns métodos de diagnóstico para avaliar a influência das observações nas estimativas dos parâmetros de transformação da covariável, com aplicação a um conjunto de dados reais.
Binary data models are useful in many practical situations. In Regression Analysis, transformations can be applied to linearize or simplify the model and to correct deviations from its assumptions. In this dissertation, we describe the use of transformations in logistic regression models for binary data and present models involving additional parameters in order to obtain more appropriate fits. We also present the cost of estimation when parameters are added to the models, hypothesis tests for the parameters of the Box-Cox logistic regression model and, finally, diagnostic methods to evaluate the influence of the observations on the estimates of the covariate transformation parameters, with an application to a real data set.
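A toy version of a transformation-augmented logistic model is sketched below: the covariate is Box-Cox transformed and the transformation parameter is chosen by profiling the log-likelihood over a grid (a stand-in for the joint estimation and tests discussed in the thesis; the data and the grid are assumptions):

```python
import numpy as np
import statsmodels.api as sm

def box_cox(x, lam):
    """Box-Cox transform of a positive covariate (log when lambda = 0)."""
    return np.log(x) if lam == 0 else (x ** lam - 1.0) / lam

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=400)                       # positive covariate
y = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 1.5 * np.log(x)))))    # true scale: log

# Profile the log-likelihood of the logistic model over the transformation
# parameter and keep the lambda that fits best (a grid-search stand-in for
# joint maximum likelihood estimation).
best = None
for lam in np.linspace(-1, 2, 31):
    X = sm.add_constant(box_cox(x, lam))
    llf = sm.Logit(y, X).fit(disp=0).llf
    if best is None or llf > best[1]:
        best = (lam, llf)
print("lambda maximising the profile likelihood:", round(best[0], 2))
```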
APA, Harvard, Vancouver, ISO, and other styles
22

Pan, Tianshu. "Using the multivariate multilevel logistic regression model to detect DIF a comparison with HGLM and logistic regression DIF detection methods /." Diss., Connect to online resource - MSU authorized users, 2008.

Find full text
Abstract:
Thesis (PH. D.)--Michigan State University. Measurement and Quantitative Methods, 2008.
Title from PDF t.p. (viewed on Sept. 8, 2009) Includes bibliographical references (p. 85-89). Also issued in print.
APA, Harvard, Vancouver, ISO, and other styles
23

Kim, Hyun Sun. "Topics in ordinal logistic regression and its applications." Diss., Texas A&M University, 2004. http://hdl.handle.net/1969.1/1120.

Full text
Abstract:
Sample size calculation methods for ordinal logistic regression are proposed to test statistical hypotheses. The author was motivated to do this work by the need for statistical analysis of the red imported fire ants data. The proposed methods use the concept of approximation by the moment-generating function. Some correction methods are also suggested. When a prior data set is available, an empirical method is explored. Application of the proposed methodology to the fire ant mating flight data is demonstrated. The proposed sample size and power calculation methods are applied in the hypothesis testing problems. Simulation studies are also conducted to illustrate their performance and to compare them with existing methods.
APA, Harvard, Vancouver, ISO, and other styles
24

Birkenes, Øystein. "A Framework for Speech Recognition using Logistic Regression." Doctoral thesis, Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-1599.

Full text
Abstract:

Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition applications, they have only achieved limited success in speech recognition. Two of the difficulties often encountered include 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones.

In this thesis, we present a framework for automatic speech recognition using logistic regression. We solve the difficulty of variable length speech signals by including a mapping in the logistic regression framework that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly with a set of hidden Markov models (HMMs) for the use in penalized logistic regression (PLR), or implicitly through a sequence kernel to be used with kernel logistic regression (KLR). Unlike previous work that has used HMMs in combination with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion.

Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum up the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.

A two-step approach is used for handling the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework in order to get reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.

APA, Harvard, Vancouver, ISO, and other styles
25

Liu, Ying. "On goodness-of-fit of logistic regression model." Diss., Manhattan, Kan. : Kansas State University, 2007. http://hdl.handle.net/2097/530.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Roberts, Brook R. "Measuring NAVSPASUR sensor performance using logistic regression models." Thesis, Monterey, California. Naval Postgraduate School, 1992. http://hdl.handle.net/10945/23952.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Weng, Yu. "Maximum Likelihood Estimation of Logistic Sinusoidal Regression Models." Thesis, University of North Texas, 2013. https://digital.library.unt.edu/ark:/67531/metadc407796/.

Full text
Abstract:
We consider the problem of maximum likelihood estimation of logistic sinusoidal regression models and develop some asymptotic theory including the consistency and joint rates of convergence for the maximum likelihood estimators. The key techniques build upon a synthesis of the results of Walker and Song and Li for the widely studied sinusoidal regression model and on making a connection to a result of Radchenko. Monte Carlo simulations are also presented to demonstrate the finite-sample performance of the estimators.
APA, Harvard, Vancouver, ISO, and other styles
28

Mo, Lijia. "Examining the reliability of logistic regression estimation software." Diss., Kansas State University, 2010. http://hdl.handle.net/2097/7059.

Full text
Abstract:
Doctor of Philosophy
Department of Agricultural Economics
Allen M. Featherstone
Bryan W. Schurle
The reliability of nine software packages using the maximum likelihood estimator for the logistic regression model was examined using generated benchmark datasets and models. Software packages tested included: SAS (Procs Logistic, Catmod, Genmod, Surveylogistic, Glimmix, and Qlim), Limdep (Logit, Blogit), Stata (Logit, GLM, Binreg), Matlab, Shazam, R, Minitab, Eviews, and SPSS for all available algorithms, none of which have been previously tested. This study expands on the existing literature in this area by examination of Minitab 15 and SPSS 17. The findings indicate that Matlab, R, Eviews, Minitab, Limdep (BFGS), and SPSS provided consistently reliable results for both parameter and standard error estimates across the benchmark datasets. While some packages performed admirably, shortcomings did exist. SAS maximum log-likelihood estimators do not always converge to the optimal solution and stop prematurely depending on starting values, by issuing a "flat" error message. This drawback can be dealt with by rerunning the maximum log-likelihood estimator, using a closer starting point, to see if the convergence criteria are actually satisfied. Although Stata-Binreg provides reliable parameter estimates, there is no way to obtain standard error estimates in Stata-Binreg as of yet. Limdep performed relatively well, but did not converge due to a weakness of the algorithm. The results show that solely trusting the default settings of statistical software packages may lead to non-optimal, biased or erroneous results, which may impact the quality of empirical results obtained by applied economists. Reliability tests indicate severe weaknesses in SAS Procs Glimmix and Genmod. Some software packages fail reliability tests under certain conditions. The findings indicate the need to use multiple software packages to solve econometric models.
APA, Harvard, Vancouver, ISO, and other styles
29

Jun, Shi. "Frequentist Model Averaging For Functional Logistic Regression Model." Thesis, Uppsala universitet, Statistiska institutionen, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-352519.

Full text
Abstract:
Frequentist model averaging as a newly emerging approach provides us a way to overcome the uncertainty caused by traditional model selection in estimation. It acknowledges the contribution of multiple models, instead of making inference and prediction purely based on one single model. Functional logistic regression is also a burgeoning method in studying the relationship between functional covariates and a binary response. In this paper, the frequentist model averaging approach is applied to the functional logistic regression model. A simulation study is implemented to compare its performance with model selection. The analysis shows that when conditional probability is taken as the focus parameter, model averaging is superior to model selection based on BIC. When the focus parameter is the intercept and slopes, model selection performs better.
APA, Harvard, Vancouver, ISO, and other styles
30

Heise, Mark A. "Optimal designs for a bivariate logistic regression model." Diss., Virginia Tech, 1993. http://hdl.handle.net/10919/38538.

Full text
Abstract:
In drug-testing experiments the primary responses of interest are efficacy and toxicity. These can be modeled as a bivariate quantal response using the Gumbel model for bivariate logistic regression. D-optimal and Q-optimal experimental designs are developed for this model. The Q-optimal design minimizes the average asymptotic prediction variance of p(1,0;d), the probability of efficacy without toxicity at dose d, over a desired range of doses. In addition, a new optimality criterion, T-optimality, is developed which minimizes the asymptotic variance of the estimate of the therapeutic index. Most experimenters will be less familiar with the Gumbel bivariate logistic regression model than with the univariate logistic regression models which comprise its marginals. Therefore, the optimal designs based on the Gumbel model are evaluated based on univariate logistic regression D-efficiencies; conversely, designs derived from the univariate logistic regression model are evaluated with respect to the Gumbel optimality criteria. Further practical considerations motivate an exploration of designs providing a maximum compromise between the three Gumbel-based criteria D, Q and T. Finally, 5-point designs which can be generated by fitted equations are proposed as a practical option for experimental use.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
31

Richmond, James Howard. "Bayesian Logistic Regression Models for Software Fault Localization." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1326658577.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Byrne, Evan Michael. "Sparse Multinomial Logistic Regression via Approximate Message Passing." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1437416281.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Shi, Shujing. "Tuning Parameter Selection in L1 Regularized Logistic Regression." VCU Scholars Compass, 2012. http://scholarscompass.vcu.edu/etd/2940.

Full text
Abstract:
Variable selection is an important topic in regression analysis and is intended to select the best subset of predictors. Least absolute shrinkage and selection operator (Lasso) was introduced by Tibshirani in 1996. This method can serve as a tool for variable selection because it shrinks some coefficients to exact zero by a constraint on the sum of absolute values of regression coefficients. For logistic regression, Lasso modifies the traditional parameter estimation method, maximum log likelihood, by adding the L1 norm of the parameters to the negative log likelihood function, so it turns a maximization problem into a minimization one. To solve this problem, we first need to give the value for the parameter of the L1 norm, called the tuning parameter. Since the tuning parameter affects the coefficient estimates and variable selection, we want to find the optimal value for the tuning parameter to get the most accurate coefficient estimation and best subset of predictors in the L1 regularized regression model. There are two popular methods to select the optimal value of the tuning parameter that result in the best subset of predictors, Bayesian information criterion (BIC) and cross validation (CV). The objective of this paper is to evaluate and compare these two methods for selecting the optimal value of the tuning parameter in terms of coefficient estimation accuracy and variable selection through simulation studies.
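The two selection routes can be sketched as follows for an L1-penalised logistic regression: a BIC computed along a grid of penalty values versus scikit-learn's built-in cross-validated search (synthetic data; the grid, the degrees-of-freedom convention and the fold count are assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)
n = len(y)
Cs = np.logspace(-2, 2, 25)          # candidate inverse tuning parameters

# BIC route: fit the lasso-penalised logistic model on a grid of C values and
# compute -2*loglik + df*log(n), with df the number of nonzero coefficients.
best = None
for C in Cs:
    m = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    p = np.clip(m.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    df = np.count_nonzero(m.coef_) + 1          # +1 for the intercept
    bic = -2 * loglik + df * np.log(n)
    if best is None or bic < best[0]:
        best = (bic, C)
print("C selected by BIC:", best[1])

# Cross-validation route, using scikit-learn's built-in path search.
cv = LogisticRegressionCV(Cs=Cs, penalty="l1", solver="liblinear", cv=5).fit(X, y)
print("C selected by 5-fold CV:", cv.C_[0])
```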
APA, Harvard, Vancouver, ISO, and other styles
34

Blahut, Steven Albert. "Latent Class Logistic Regression with Complex Sample Survey Data." College Park, Maryland : University of Maryland, 2004. http://hdl.handle.net/1903/2000.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2004.
Thesis research directed by: Measurement, Statistics and Evaluation. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
APA, Harvard, Vancouver, ISO, and other styles
35

Lund, Anton. "Two-Stage Logistic Regression Models for Improved Credit Scoring." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-160551.

Full text
Abstract:
This thesis has investigated two-stage regularized logistic regressions applied to the credit scoring problem. Credit scoring refers to the practice of estimating the probability that a customer will default if given credit. The data was supplied by Klarna AB, and contains a larger number of observations than many other research papers on credit scoring. In this thesis, a two-stage regression refers to two staged regressions where some kind of information from the first regression is used in the second regression to improve the overall performance. In the best performing models, the first stage was trained on alternative labels: payment status at earlier dates than the conventional one. The predictions were then used as input to, or to segment, the second stage. This gave a Gini increase of approximately 0.01. Using conventional score cutoffs or distance to a decision boundary to segment the population did not improve performance.
Denna uppsats har undersökt tvåstegs regulariserade logistiska regressioner för att estimera credit score hos konsumenter. Credit score är ett mått på kreditvärdighet och mäter sannolikheten att en person inte betalar tillbaka sin kredit. Data kommer från Klarna AB och innehåller fler observationer än mycket annan forskning om kreditvärdighet. Med tvåstegsregressioner menas i denna uppsats en regressionsmodell bestående av två steg där information från det första steget används i det andra steget för att förbättra den totala prestandan. De bäst presterande modellerna använder i det första steget en alternativ förklaringsvariabel, betalningsstatus vid en tidigare tidpunkt än den konventionella, för att segmentera eller som variabel i det andra steget. Detta gav en giniökning på approximativt 0,01. Användandet av enklare segmenteringsmetoder så som score-gränser eller avstånd till en beslutsgräns visade sig inte förbättra prestandan.
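A stripped-down version of the two-stage idea described above (train the first stage on an alternative label and feed its predicted probability into the second stage) might look like this; the data are fully synthetic, so the reported Gini gain only illustrates the mechanics, not the thesis's results:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))
# "early_late" plays the role of the alternative label (earlier payment status),
# "default" the conventional label; both are simulated.
early_late = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([1, -1, 0.5, 0, 0])))))
default = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 2 * early_late + X[:, 3]))))

X_tr, X_te, e_tr, e_te, d_tr, d_te = train_test_split(
    X, early_late, default, test_size=0.3, random_state=0)

# Stage 1: model the alternative label, then use its predicted probability
# as an extra input to stage 2 (default prediction).
stage1 = LogisticRegression(max_iter=1000).fit(X_tr, e_tr)
Z_tr = np.column_stack([X_tr, stage1.predict_proba(X_tr)[:, 1]])
Z_te = np.column_stack([X_te, stage1.predict_proba(X_te)[:, 1]])
stage2 = LogisticRegression(max_iter=1000).fit(Z_tr, d_tr)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, d_tr)
for name, auc in [("one-stage", roc_auc_score(d_te, baseline.predict_proba(X_te)[:, 1])),
                  ("two-stage", roc_auc_score(d_te, stage2.predict_proba(Z_te)[:, 1]))]:
    print(name, "Gini =", round(2 * auc - 1, 3))
```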
APA, Harvard, Vancouver, ISO, and other styles
36

Lin, Shan. "Simultaneous confidence bands for linear and logistic regression models." Thesis, University of Southampton, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443030.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

ROCHA, PEDRO ANTONIO CYRNE DA. "PUBLIC COMPANIES BANKRUPTCY PREDICTION IN BRAZIL WITH LOGISTIC REGRESSION." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2017. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30720@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE SUPORTE À PÓS-GRADUAÇÃO DE INSTS. DE ENSINO
Desde a década de 1930, a tentativa de previsão de falência de empresas chama a atenção dos acadêmicos, e diversas técnicas já foram empregadas para o desenvolvimento de modelos preditivos compostos por variáveis financeiras, tais como análise estatística, modelos teóricos e de inteligência artificial. Posto isso, o referido estudo compõe um modelo de regressão logística para a previsão de falência de empresas de capital aberto no Brasil com um ano de antecedência. Para tal, apresenta uma revisão literária com as principais técnicas usadas na área, para fundamentar a escolha metodológica e as variáveis integrantes do estudo. Ademais, o modelo é testado com uma nova amostra; comparado com resultados obtidos através de outras técnicas e executado com dados anteriores a um ano do momento de falência - de tal forma que sua capacidade preditiva seja atestada.
Since the 1930s, academics have tried to forecast bankruptcy and have applied several techniques to do so, such as statistical, artificial intelligence and theoretical models using financial ratios. Therefore, this study presents a logistic regression model to forecast public companies' bankruptcy in Brazil one year before failure. Hence, it presents a literature review with the main models used so far in order to support its methodological choice and the financial ratios applied. In addition, the model is tested with a new sample, compared with the results of other techniques and executed with data from more than one year before the moment of failure, so that its predictive capacity is attested.
APA, Harvard, Vancouver, ISO, and other styles
38

Wei, Jinglun. "Classification of Bone Cements Using Multinomial Logistic Regression Method." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/520.

Full text
Abstract:
Bone cement surgery is a new technique widely used in the medical field nowadays. In this thesis I analyze 48 bone cement types using their content of 20 elements. My goal is to find a method to classify a newly found bone cement sample into these 48 categories. Here I use the multinomial logistic regression method to see whether it works or not. Due to the lack of observations, I generate enough data by repeatedly adding white noise in proper scales to the original data, and I obtain a data set with over 100 times as many points as the original one. Then I use the purposeful variable selection method to pick the covariates I need, rather than stepwise selection. There are 15 covariates left after the selection, and then I use my new data set to fit such a multinomial logistic regression model. The model does not perform that well in the goodness-of-fit test, but the result is still acceptable, and the diagnostic statistics also indicate a good performance. Combined with clinical experience and prior conditions, this model is helpful in this classification case.
APA, Harvard, Vancouver, ISO, and other styles
39

Badi, Nuri H. Salem. "Mis-specification and goodness-of-fit in logistic regression." Thesis, University of Newcastle upon Tyne, 2014. http://hdl.handle.net/10443/2376.

Full text
Abstract:
The logistic regression model has become a standard model for binary outcomes in many areas of application and is widely used in medical statistics. Much work has been carried out to examine the asymptotic behaviour of the distribution of Maximum Likelihood Estimates (MLE) for the logistic regression model, although the most widely known properties apply only if the assumed model is correct. There has been much work on goodness-of-fit tests to address the last point. The first part of this thesis investigates the behaviour of the asymptotic distribution of the MLE under a form of model mis-specification, namely when covariates from the true model are omitted from the fitted model. When the incorrect model is fitted the maximum likelihood estimates converge to the least false values. In this work, key integrals cannot be evaluated explicitly but we use properties of the skew-Normal distribution and the approximation of the Logit by a suitable Probit function to obtain a good approximation for the least false values. The second part of the thesis investigates the assessment of a particular goodness-of-fit test, namely the information matrix (IM) test, as applied to binary data models. Kuss (2002) claimed that the IM test has reasonable power compared with other statistics. In this part of the thesis we investigate this claim and consider the distribution of the moments of the IM statistic and the asymptotic distribution of the IM test (IMT) statistic. We had difficulty in reproducing the results claimed by Kuss (2002) and considered that this was probably due to the near singularity of the variance of IMT. We define a new form of the IMT statistic, IMTR, which addresses this issue.
APA, Harvard, Vancouver, ISO, and other styles
40

Namburi, Sruthi. "Logistic regression with conjugate gradient descent for document classification." Kansas State University, 2016. http://hdl.handle.net/2097/32658.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
Logistic regression is a model for function estimation that measures the relationship between independent variables and a categorical dependent variable by approximating a conditional probability density function using a logistic function, also known as a sigmoidal function. Multinomial logistic regression is used to predict categorical variables where there can be more than two categories or classes. The most common type of algorithm for optimizing the cost function for this model is gradient descent. In this project, I implemented logistic regression using conjugate gradient descent (CGD). I used the 20 Newsgroups data set collected by Ken Lang. I compared the results with those for existing implementations of gradient descent. The conjugate gradient optimization methodology outperforms existing implementations.
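A compact way to train logistic regression with (nonlinear) conjugate gradient is to hand the negative log-likelihood and its gradient to an off-the-shelf CG optimiser; the sketch below does this with SciPy on simulated data, as an illustration of the optimisation idea rather than the project's 20 Newsgroups implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def nll_and_grad(w, X, y, lam=1e-3):
    """Negative log-likelihood of logistic regression with a small L2 term,
    together with its gradient, as required by conjugate gradient descent."""
    p = expit(X @ w)
    nll = -np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) \
          + 0.5 * lam * w @ w
    grad = X.T @ (p - y) + lam * w
    return nll, grad

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 3))])
y = rng.binomial(1, expit(X @ np.array([-0.5, 1.0, -2.0, 0.5])))

res = minimize(nll_and_grad, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=True, method="CG")       # nonlinear conjugate gradient
print("converged:", res.success, "coefficients:", np.round(res.x, 2))
```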
APA, Harvard, Vancouver, ISO, and other styles
41

Lindroth, Henriksson Amelia, and Simon Koller. "Logistic Regression Analysis of Patent Approval Rate in Sweden." Thesis, KTH, Skolan för teknikvetenskap (SCI), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230143.

Full text
Abstract:
This thesis was conducted to investigate what factors impact the outcome of a patent application for the Swedish market. The method used was logistic regression and the data was extracted from the database of the Swedish Patent and Registration Office, PRV. The analysis in this thesis started with 47 covariates, including the 35 IPO technical fields, resulting in a model consisting of five covariates. The most important covariates were determined to be the number of notices issued by PRV, whether or not a patent attorney was used and the applicant type. The number of notices had a positive impact on the probability of the success of a patent application. Being a company and hiring a patent attorney also increase the chances of the patent being granted. The derived final model showed a high predictive ability and provides insight into significant factors of a successful patent application.
Denna avhandling utfördes för att undersöka vilka faktorer som påverkar utfallen av patentansökningar för den svenska marknaden. Metoden som användes var logistisk regression, och datan är hämtad från Patent- och Registreringsverkets, PRVs, databas. Analysen i avhandlingen utfördes på 47 kovariat, inklusive IPOs 35 teknikområden. Detta resulterade i en modell som består av fem kovariat. De viktigaste kovariaten beräknades vara antalet skick mellan PRV och sökanden, huruvida man nyttjat sig av ett patentombud eller ej samt om sökande var en privatperson eller juridisk person. Antalet skick hade en positiv påverkan på sannolikheten för en godkänd patentansökan. Företag och sökanden som använde sig av ett patentombud hade också högre sannolikhet att få sina patent godkända. Den härledda slutgiltiga modellen visade sig ha hög förutsägningsförmåga och ger en insikt om signifikanta faktorer för en framgångsrik patentansökan.
APA, Harvard, Vancouver, ISO, and other styles
42

SINGH, KEVIN. "Comparing Variable Selection Algorithms On Logistic Regression – A Simulation." Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446090.

Full text
Abstract:
When we try to understand why some schools perform worse than others, whether Covid-19 has struck some demographics harder or whether income correlates with increased happiness, we may turn to regression to better understand how these variables are correlated. To capture the true relationship between variables we may use variable selection methods in order to ensure that the variables which have an actual effect have been included in the model. Choosing the right model for variable selection is vital. Without it there is a risk of including variables which have little to do with the dependent variable or excluding variables that are important. Failing to capture the true effects would paint a picture disconnected from reality and it would also give a false impression of what reality really looks like. To mitigate this risk a simulation study has been conducted to find out what variable selection algorithms to apply in order to make more accurate inference. The different algorithms being tested are stepwise regression, backward elimination and lasso regression. Lasso performed worst when applied to a small sample but performed best when applied to larger samples. Backward elimination and stepwise regression had very similar results.
APA, Harvard, Vancouver, ISO, and other styles
43

Jia, Yan. "Optimal experimental designs for two-variable logistic regression models." Diss., This resource online, 1996. http://scholar.lib.vt.edu/theses/available/etd-06062008-152028/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Brabcová, Hana. "Využití logistické regrese ve výzkumu trhu." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-19234.

Full text
Abstract:
The aim of this work is to assess the real usefulness of logistic regression for market research tasks, respecting the needs of the final users of the research results. The main argument for the final decision is the comparison of its output to the output of an alternative classification method used in practice -- the classification tree method. The topic is divided into three parts. The first part describes the theoretical framework and approaches linked to logistic regression (chapters 2 and 3). The second part analyses the experience with the usage of logistic regression in Czech market research companies (chapter 4), and the topic is closed by applying the method to real data and comparing the output to the classification tree output (chapters 5 and 6).
APA, Harvard, Vancouver, ISO, and other styles
45

Guo, Ruijuan. "Sample comparisons using microarrays: - Application of False Discovery Rate and quadratic logistic regression." Digital WPI, 2008. https://digitalcommons.wpi.edu/etd-theses/28.

Full text
Abstract:
In microarray analysis, people are interested in those features that have different characteristics in diseased samples compared to normal samples. The usual p-value method of selecting significant genes either gives too many false positives or cannot detect all the significant features. The False Discovery Rate (FDR) method controls false positives and at the same time selects significant features. We introduced Benjamini's method and Storey's method to control FDR and applied the two methods to human Meningioma data. We found that Benjamini's method is more conservative and that, after the number of tests exceeds a threshold, an increase in the number of tests will lead to a decrease in the number of significant genes. In the second chapter, we investigate ways to search for interesting gene expressions that cannot be detected by linear models such as the t-test or ANOVA. We propose a novel approach using quadratic logistic regression to detect genes in the Meningioma data that have a non-linear relationship with their phenotypes. By using quadratic logistic regression, we can find genes whose expression correlates with their phenotypes both linearly and quadratically. Whether these genes have clinical significance is a very interesting question, since these genes would most likely be neglected by the traditional linear approach.
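The two ingredients, a per-gene quadratic logistic regression and Benjamini-Hochberg FDR control, can be sketched together as follows (simulated expression data with one assumed linear and one assumed quadratic signal; not the Meningioma data or the thesis's analysis):

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit
from scipy.stats import chi2
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_samples, n_genes = 100, 200
expr = rng.normal(size=(n_samples, n_genes))

# Hypothetical truth: gene 0 acts linearly, gene 1 acts quadratically.
logit = -1.0 + 1.5 * expr[:, 0] + 1.5 * expr[:, 1] ** 2
phenotype = rng.binomial(1, expit(logit))      # 0 = normal, 1 = diseased

null_llf = sm.Logit(phenotype, np.ones((n_samples, 1))).fit(disp=0).llf
pvals = []
for g in range(n_genes):
    x = expr[:, g]
    X = sm.add_constant(np.column_stack([x, x ** 2]))   # quadratic logistic model
    fit = sm.Logit(phenotype, X).fit(disp=0)
    # Likelihood-ratio test of the linear and quadratic terms jointly (2 df).
    pvals.append(chi2.sf(2 * (fit.llf - null_llf), df=2))

# Benjamini-Hochberg control of the false discovery rate at 5%.
reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print("genes declared significant:", np.where(reject)[0])
```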
APA, Harvard, Vancouver, ISO, and other styles
46

Caster, Ola. "Mining the WHO Drug Safety Database Using Lasso Logistic Regression." Thesis, Uppsala University, Department of Mathematics, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-120981.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Ozturk, Olcay. "Bayesian Semiparametric Models For Nonignorable Missing Datamechanisms In Logistic Regression." Master's thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613241/index.pdf.

Full text
Abstract:
In this thesis, Bayesian semiparametric models for the missing data mechanisms of nonignorably missing covariates in logistic regression are developed. In the missing data literature, a fully parametric approach is used to model the nonignorable missing data mechanisms. In that approach, a probit or a logit link of the conditional probability of the covariate being missing is modeled as a linear combination of all variables, including the missing covariate itself. However, nonignorably missing covariates may not be linearly related to the probit (or logit) of this conditional probability. In our study, the relationship between the probit of the probability of the covariate being missing and the missing covariate itself is modeled by using a penalized spline regression based semiparametric approach. An efficient Markov chain Monte Carlo (MCMC) sampling algorithm to estimate the parameters is established. A WinBUGS code is constructed to sample from the full conditional posterior distributions of the parameters by using Gibbs sampling. Monte Carlo simulation experiments under different true missing data mechanisms are applied to compare the bias and efficiency properties of the resulting estimators with the ones from the fully parametric approach. These simulations show that estimators for logistic regression using semiparametric missing data models maintain better bias and efficiency properties than the ones using fully parametric missing data models when the true relationship between the missingness and the missing covariate has a nonlinear form. They are comparable when this relationship has a linear form.
APA, Harvard, Vancouver, ISO, and other styles
48

Kannanthanathu, Amal Francis. "Wavelet Transform and Ensemble Logistic Regression for Driver Drowsiness Detection." Thesis, California State University, Long Beach, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10639615.

Full text
Abstract:

Drowsy driving has become a serious concern over the last few decades. The rise in the number of automobiles as well as the stress and fatigue induced due to lifestyle factors have been major contributors to this problem. Accidents due to drowsy driving have caused innumerable deaths and losses to the state. Therefore, detecting drowsiness accurately and within a short period of time before it impairs the driver has become a major challenge. Previous researchers have found that the Electrocardiogram (ECG/EKG) is an important parameter to detect drowsiness. Incorporating machine learning (ML) algorithms like Logistic Regression (LR) can help in detecting drowsiness accurately to some extent. Accuracy in LR can be increased with a larger data set and more features for a robust machine learning model. However, having a larger dataset and more features increases detection time, which can be fatal if the driver is drowsy. Reducing the dataset size for faster detection causes the problem of overfitting, in which the model performs better on training data than on test data.

In this thesis, we increased the accuracy, reduced detection time, and solved the problem of overfitting using a machine learning model based on Ensemble Logistic Regression (ELR). The ECG signal after filtering was first converted from the time domain to the frequency domain using Wavelet Transform (WT) instead of the traditional Short Term Fourier Transform (STFT). Frequency features were then extracted and an ensemble based logistic regression model was trained to detect drowsiness. The model was then tested on twenty-five male and female subjects who varied between 20 and 60 years of age. The results were compared with traditional methods for accuracy and detection time.

The model outputs the probability of drowsiness. Its accuracy is between 90% and 95% within a detection time of 20 to 30 seconds. A successful implementation of the above system can significantly reduce road accidents due to drowsy driving.

APA, Harvard, Vancouver, ISO, and other styles
49

Thompson, Gavin Kenneth. "Logistic regression for the modeling of low-dose radiation effects." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp03/MQ57164.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Truong, Alfred Kar Yin. "Fast growing and interpretable oblique trees via logistic regression models." Thesis, University of Oxford, 2009. http://ora.ox.ac.uk/objects/uuid:e0de0156-da01-4781-85c5-8213f5004f10.

Full text
Abstract:
The classification tree is an attractive method for classification as the predictions it makes are more transparent than most other classifiers. The most widely accepted approaches to tree-growth use axis-parallel splits to partition continuous attributes. Since the interpretability of a tree diminishes as it grows larger, researchers have sought ways of growing trees with oblique splits as they are better able to partition observations. The focus of this thesis is to grow oblique trees in a fast and deterministic manner and to propose ways of making them more interpretable. Finding good oblique splits is a computationally difficult task. Various authors have proposed ways of doing this by either performing stochastic searches or by solving problems that effectively produce oblique splits at each stage of tree-growth. A new approach to finding such splits is proposed that restricts attention to a small but comprehensive set of splits. Empirical evidence shows that good oblique splits are found in most cases. When observations come from a small number of classes, empirical evidence shows that oblique trees can be grown in a matter of seconds. As interpretability is the main strength of classification trees, it is important for oblique trees that are grown to be interpretable. As the proposed approach to finding oblique splits makes use of logistic regression, well-founded variable selection techniques are introduced to classification trees. This allows concise oblique splits to be found at each stage of tree-growth so that oblique trees that are more interpretable can be directly grown. In addition to this, cost-complexity pruning ideas which were developed for axis-parallel trees have been adapted to make oblique trees more interpretable. A major and practical component of this thesis is in providing the oblique.tree package in R that allows casual users to experiment with oblique trees in a way that was not possible before.
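The core idea of an oblique split found by logistic regression can be sketched in a few lines: fit a (sparse) logistic model on the observations reaching a node and split on which side of the fitted hyperplane each case falls. The sketch below is only an illustration of that concept in Python, not the thesis's tree-growing algorithm; the data, penalty and threshold are assumptions:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

# Toy two-class data standing in for the observations reaching a tree node.
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)

def oblique_split(X_node, y_node):
    """Fit a logistic regression at the node and send each observation left or
    right according to which side of the fitted hyperplane it falls on."""
    # An L1 penalty keeps the split concise, echoing the variable-selection idea.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    clf.fit(X_node, y_node)
    go_left = clf.predict_proba(X_node)[:, 1] < 0.5
    return clf, go_left

clf, go_left = oblique_split(X, y)
print("split coefficients:", clf.coef_.ravel(), "intercept:", clf.intercept_)
print("left child size:", go_left.sum(), "right child size:", (~go_left).sum())
```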
APA, Harvard, Vancouver, ISO, and other styles