Dissertations / Theses on the topic 'Regression analysis'

To see the other types of publications on this topic, follow the link: Regression analysis.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Regression analysis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Sullwald, Wichard. "Grain regression analysis." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/86526.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (MSc)--Stellenbosch University, 2014.
ENGLISH ABSTRACT: Grain regression analysis forms an essential part of solid rocket motor simulation. In this thesis a numerical grain regression analysis module is developed as an alternative to cumbersome and time-consuming analytical methods. The surface regression is performed by the level-set method, a numerical interface advancement scheme. A novel approach to the integration of the surface area and volume of a numerical interface, as defined implicitly in a level-set framework, by means of Monte Carlo integration is proposed. The grain regression module is directly coupled to a quasi-1D internal ballistics solver in an on-line fashion, in order to take into account the effects of spatially varying burn rate distributions. A multi-timescale approach is proposed for the direct coupling of the two solvers.
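The Monte Carlo surface/volume integration described in this abstract can be illustrated with a minimal sketch. This is not the thesis's own code: the spherical level-set function (standing in for a grain geometry), the sample size, and the seed are assumptions for the example.

```python
import math
import random

def mc_volume(phi, bounds, n=200_000, seed=1):
    """Monte Carlo estimate of the volume of the region where phi < 0,
    i.e. the interior of an implicitly defined level-set surface."""
    random.seed(seed)
    (xlo, xhi), (ylo, yhi), (zlo, zhi) = bounds
    box = (xhi - xlo) * (yhi - ylo) * (zhi - zlo)
    hits = sum(
        1
        for _ in range(n)
        if phi(random.uniform(xlo, xhi),
               random.uniform(ylo, yhi),
               random.uniform(zlo, zhi)) < 0
    )
    return box * hits / n

# Signed distance function of a sphere of radius 0.8 (negative inside),
# a hypothetical stand-in for a propellant grain surface.
def sphere(x, y, z):
    return math.sqrt(x * x + y * y + z * z) - 0.8

vol = mc_volume(sphere, [(-1.0, 1.0)] * 3)
exact = 4.0 / 3.0 * math.pi * 0.8 ** 3   # about 2.1447
```

Only point evaluations of the level-set function are needed, which is why this style of integration suits interfaces that are known only implicitly.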
2

Dai, Elin, and Lara Güleryüz. "Factors that influence condominium pricing in Stockholm: A regression analysis." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254235.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis aims to examine which factors are of significance when forecasting the selling price of condominiums in Stockholm city. Through the use of multiple linear regression, response-variable transformation, and a multitude of methods for refining the model fit, a conclusive, out-of-sample-validated model with a confidence level of 95% was obtained. To conduct the statistical methods, the software R was used. This study is limited to the districts of inner city Stockholm with the postal codes 112-118, and the final model can only be applied to this area as the postal codes are included as regressors in the model. The time period in which the selling price was analyzed ran from January 2014 to April 2019; the time value of money over this period has not been taken into account. The final model included the following variables as the ones having an impact on the selling price: floor, living area, monthly fee, construction year, and district of the city.
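The thesis fits its model in R; as a rough, dependency-free illustration of multiple linear regression with a log-transformed response, here is a Python sketch. The coefficients, predictors, and data below are invented for the example, not taken from the thesis.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    c = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (c[r] - sum(A[r][j] * beta[j] for j in range(r + 1, k))) / A[r][r]
    return beta

# Invented listings: log(price) = 13.0 + 0.012*area + 0.03*floor, no noise,
# so the fit should recover the coefficients almost exactly.
random.seed(0)
listings = [(random.uniform(30, 120), random.randint(1, 8)) for _ in range(200)]
X = [[1.0, area, floor] for area, floor in listings]
log_price = [13.0 + 0.012 * area + 0.03 * floor for area, floor in listings]
beta = ols(X, log_price)
```

In practice one would add the remaining regressors (monthly fee, construction year, district dummies) and validate the fit out of sample, as the thesis does.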
3

Zuo, Yanling. "Monotone regression functions." Thesis, University of British Columbia, 1990. http://hdl.handle.net/2429/29457.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In some applications, we require a monotone estimate of a regression function. In others, we want to test whether the regression function is monotone. For solving the first problem, Ramsay's, Kelly and Rice's, as well as point-wise monotone regression functions in a spline space are discussed and their properties developed. Three monotone estimates are defined: least-squares regression splines, smoothing splines and binomial regression splines. The three estimates depend upon a "smoothing parameter": the number and location of knots in regression splines and the usual [formula omitted] in smoothing splines. Two standard techniques for choosing the smoothing parameter, GCV and AIC, are modified for monotone estimation in the normal errors case. To answer the second question, a test statistic is proposed and its null distribution conjectured. Simulations are carried out to check the conjecture. These techniques are applied to two data sets.
Science, Faculty of
Statistics, Department of
Graduate
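The monotonicity constraint discussed in the abstract above can be sketched with the Pool Adjacent Violators algorithm, a simpler stand-in for the thesis's monotone regression splines (not its actual method):

```python
def isotonic_fit(y):
    """Pool Adjacent Violators: the least-squares fit to y subject to the
    fitted values being nondecreasing (monotone regression)."""
    blocks = []  # each block: [mean of pooled values, number pooled]
    for val in y:
        blocks.append([float(val), 1])
        # merge adjacent blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fit = []
    for mean, n in blocks:
        fit.extend([mean] * n)
    return fit

fit = isotonic_fit([1.0, 3.0, 2.0, 4.0])   # -> [1.0, 2.5, 2.5, 4.0]
```

The spline-based estimates in the thesis additionally control smoothness; PAV only enforces the monotone shape.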
4

Ryu, Duchwan. "Regression analysis with longitudinal measurements." Texas A&M University, 2005. http://hdl.handle.net/1969.1/2398.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Bayesian approaches to regression analysis for longitudinal measurements are considered. The history of measurements from a subject may convey characteristics of the subject. Hence, in a regression analysis with longitudinal measurements, the characteristics of each subject can serve as covariates, in addition to other possible covariates. Also, the longitudinal measurements may lead to complicated covariance structures within each subject, and these should be modeled properly. When covariates are some unobservable characteristics of each subject, Bayesian parametric and nonparametric regressions have been considered. Although covariates are not observable directly, by virtue of longitudinal measurements they can be estimated. In this case, the measurement error problem is inevitable. Hence, a classical measurement error model is established. In the Bayesian framework, the regression function as well as all the unobservable covariates and nuisance parameters are estimated. As multiple covariates are involved, a generalized additive model is adopted, and the Bayesian backfitting algorithm is utilized for each component of the additive model. For the binary response, logistic regression is proposed, where the link function is estimated by Bayesian parametric and nonparametric regressions. For the link function, the introduction of latent variables makes the computation fast. In the next part, each subject is assumed to be observed not at prespecified time points. Furthermore, the time of the next measurement from a subject is supposed to depend on the subject's previous measurement history. For these outcome-dependent follow-up times, various modeling options and the associated analyses have been examined to investigate how outcome-dependent follow-up times affect the estimation, within the frameworks of Bayesian parametric and nonparametric regressions.
Correlation structures of outcomes are based on different correlation coefficients for different subjects. First, by assuming a Poisson process for the follow-up times, regression models have been constructed. To interpret the subject-specific random effects, more flexible models are considered by introducing a latent variable for the subject-specific random effect and a survival distribution for the follow-up times. The performance of each model has been evaluated by utilizing Bayesian model assessments.
5

Campbell, Ian. "The geometry of regression analysis." Thesis, University of Ottawa (Canada), 1989. http://hdl.handle.net/10393/5755.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Wiencierz, Andrea. "Regression analysis with imprecise data." Diss., Ludwig-Maximilians-Universität München, 2013. http://nbn-resolving.de/urn:nbn:de:bvb:19-166786.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Statistical methods usually require that the analyzed data are correct and precise observations of the variables of interest. In practice, however, often only incomplete or uncertain information about the quantities of interest is available. The question studied in the present thesis is how a regression analysis can reasonably be performed when the variables are only imprecisely observed. First, different approaches to analyzing imprecisely observed variables that have been proposed in the statistics literature are discussed. Then, a new likelihood-based methodology for regression analysis with imprecise data called Likelihood-based Imprecise Regression is introduced. The corresponding methodological framework is very broad and permits accounting for coarsening errors, in contrast to most alternative approaches to analyzing imprecise data. The methodology suggests considering as the result of a regression analysis the entire set of all regression functions that cannot be excluded in the light of the data, which can be interpreted as a confidence set. In the subsequent chapter, a very general regression method is derived from the likelihood-based methodology. This regression method does not impose restrictive assumptions about the form of the imprecise observations, about the underlying probability distribution, or about the shape of the relationship between the variables. Moreover, an exact algorithm is developed for the special case of simple linear regression with interval data, and selected statistical properties of this regression method are studied. The proposed regression method turns out to be robust in terms of a high breakdown point and to provide very reliable insights in the sense of a set-valued result with a high coverage probability.
In addition, an alternative approach proposed in the literature based on Support Vector Regression is studied in detail and generalized by embedding it into the framework of the formerly introduced likelihood-based methodology. In the end, the discussed regression methods are applied to two practical questions.
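The set-valued result described above can be caricatured for interval-valued responses: a regression line is retained whenever the data cannot exclude it. The sketch below uses the crude criterion that the line passes through every observation interval, which is a deliberate simplification of the thesis's likelihood-based exclusion criterion; the data and the parameter grid are invented.

```python
def passes_all(a, b, intervals):
    """A candidate line y = a + b*x cannot be excluded if it passes
    through every interval observation (x_i, [lo_i, hi_i])."""
    return all(lo <= a + b * x <= hi for x, lo, hi in intervals)

# Invented interval observations of y at three x values.
data = [(0.0, 0.5, 1.5), (1.0, 1.5, 2.5), (2.0, 2.5, 3.5)]

# Crude grid search over (intercept, slope); the set-valued "result" of the
# analysis is the whole collection of lines that survive.
plausible = [(a / 10.0, b / 10.0)
             for a in range(-20, 21)
             for b in range(-20, 21)
             if passes_all(a / 10.0, b / 10.0, data)]
```

The exact algorithm in the thesis characterizes this set without gridding; the grid search only conveys the idea of reporting all undominated regression functions.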
7

Jeffrey, Stephen Glenn. "Quantile regression and frontier analysis." Thesis, University of Warwick, 2012. http://wrap.warwick.ac.uk/47747/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In chapter 3, quantile regression is used to estimate probabilistic frontiers, i.e. frontiers based on the probability of being dominated. The results from the empirical application using an Italian hotel dataset show rejections of a parametric functional form and a location shift effect, large uncertainty of the estimates of the frontier, and wide confidence intervals for the estimates of efficiency. Quantile regression is further developed to estimate thick probabilistic frontiers, i.e. frontiers based on a group of efficient firms. The empirical results show that the differences between the inefficient and efficient firms at lower quantiles of the conditional distribution function stem from the coefficient (85 percent of the total effect) and the residual effects (25 percent), and at higher quantiles from the coefficient (68 percent) and the regressor effects (22 percent). The results from the Monte Carlo simulations in chapter 4 show that under correctly assumed stochastic frontier models, the probabilistic frontiers can have the lowest bias and mean squared error of the efficiency estimates. When outliers or location-scale shift effects are included, the preference shifts further towards the probabilistic frontiers. The nonparametric probabilistic frontiers are nearly always preferable to Data Envelopment Analysis and Free Disposal Hull. In chapter 5, a fixed effects quantile regression estimator is used to estimate a cost frontier and efficiency levels for a panel dataset of English NHS Trusts. Waiting-time elasticities are estimated from -0.14 to 0.17 in the cross-sectional models and -0.008 to 0.03 in the panel models. Cost minimisation ranged from 33 to 60 days in the cross-sectional model and from 37 to 54 days in the panel model. The results show that the effects of the inputs and control variables vary depending on the efficiency of the Trusts. The efficiency estimates reveal very different conclusions depending on the model choice.
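Quantile regression rests on minimizing the Koenker-Bassett check (pinball) loss. A minimal intercept-only sketch (the data and grid are invented for illustration):

```python
def check_loss(u, tau):
    """Koenker-Bassett check (pinball) loss: rho_tau(u) = u*(tau - 1{u<0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_fit(y, tau, grid):
    """Intercept-only regression quantile: the grid value minimizing the
    summed check loss over the sample."""
    return min(grid, key=lambda q: sum(check_loss(v - q, tau) for v in y))

y = [1.0, 2.0, 3.0, 4.0, 10.0]               # one extreme observation
median_fit = quantile_fit(y, 0.5, [v / 10.0 for v in range(0, 101)])
# the tau = 0.5 fit is the sample median (3.0), untouched by the outlier
```

Fitting at several values of tau traces out the conditional distribution, which is what the thesis's probabilistic frontiers exploit.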
8

Ranganai, Edmore. "Aspects of model development using regression quantiles and elemental regressions." Thesis, Stellenbosch : Stellenbosch University, 2007. http://hdl.handle.net/10019.1/18668.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Dissertation (PhD)--University of Stellenbosch, 2007.
ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as data aberrations in the design space. The two major data aberrations in the design space are collinearity and high leverage. Leverage points can also induce or hide collinearity in the design space. Such leverage points are referred to as collinearity influential points. As a consequence, over the years, many diagnostic tools to detect these anomalies as well as alternative procedures to counter them were developed. To counter deviations from the classical Gaussian assumptions many robust procedures have been proposed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be found as solutions to linear programming problems (LPs). The basic optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES) regressions, which consist of subsets of minimum size to estimate the necessary parameters of the model. On the one hand, some ESs correspond to RQs. On the other hand, in the literature it is shown that many OLS statistics (estimators) are related to ES regression statistics (estimators). Therefore there is an inherent relationship amongst the three sets of procedures. The relationship between the ES procedure and the RQ one has been noted almost "casually" in the literature, while the latter has been fairly widely explored. Using these existing relationships between the ES procedure and the OLS one as well as new ones, collinearity, leverage and outlier problems in the RQ scenario were investigated. Also, a lasso procedure was proposed as a variable selection technique in the RQ scenario and some tentative results were given for it. These results are promising. Single case diagnostics were considered as well as their relationships to multiple case ones.
In particular, multiple cases of the minimum size needed to estimate the necessary parameters of the model were considered, corresponding to an RQ (ES). In this way regression diagnostics were developed for both ESs and RQs. The main problems that affect RQs adversely are collinearity and leverage, due to the nature of the computational procedures and the fact that RQs' influence functions are unbounded in the design space but bounded in the response variable. As a consequence, RQs have a high affinity for leverage points and a high exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics were also considered in order to have a more holistic picture. The investigations comprised analytic means as well as simulation. Furthermore, applications were made to artificial computer-generated data sets as well as standard data sets from the literature. These revealed that the ES-based statistics can be used to address problems arising in the RQ scenario with some degree of success. However, due to the interdependence between the different aspects, viz. that between leverage and collinearity and that between leverage and outliers, "solutions" are often dependent on the particular situation. In spite of this complexity, the research did produce some fairly general guidelines that can be fruitfully used in practice.
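The correspondence between regression quantiles and elemental subset regressions can be demonstrated directly for simple linear regression: an optimal basic solution of the RQ linear program interpolates two observations, so scanning all two-point (elemental) fits recovers the RQ. A sketch with invented data:

```python
from itertools import combinations

def rho(u, tau):
    """Koenker-Bassett check loss."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def rq_from_elementals(pts, tau):
    """tau-th regression quantile of a simple linear model, found by
    scanning all elemental (two-point) fits: one of the elemental
    regressions attains the minimal summed check loss."""
    best, best_loss = None, float("inf")
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2:
            continue                     # singular elemental subset
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(rho(yy - (a + b * xx), tau) for xx, yy in pts)
        if loss < best_loss:
            best, best_loss = (a, b), loss
    return best

pts = [(0, 0.1), (1, 1.0), (2, 2.2), (3, 2.9), (4, 8.0)]  # last point is an outlier
a, b = rq_from_elementals(pts, 0.5)
```

The fitted line interpolates two of the observations and, at tau = 0.5, shrugs off the outlying response, illustrating both the ES/RQ correspondence and the boundedness of RQs in the response variable.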
9

Lo, Sau Yee. "Measurement error in logistic regression model /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?MATH%202004%20LO.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 82-83). Also available in electronic version. Access restricted to campus users.
10

Meless, Dejen. "Test Cycle Optimization using Regression Analysis." Thesis, Linköping University, Automatic Control, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-54809.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:

Industrial robots make up an important part of today’s industry and are assigned to a range of different tasks. Needless to say, businesses need to rely on their machinery to function as planned, avoiding stops in production due to machine failures. This is where fault detection methods play a very important part. In this thesis a specific fault detection method based on signal analysis will be considered. When testing a robot for faults, a specific test cycle (trajectory) is executed in order to be able to compare test data from different test occasions. Furthermore, different test cycles yield different measurements to analyse, which may affect the performance of the analysis. The question posed is: can we find an optimal test cycle so that the fault is best revealed in the test data? The goal of this thesis is to, using regression analysis, investigate how the presently executed test cycle in a specific diagnosis method relates to the faults that are monitored (in this case a so-called friction fault) and decide whether a different one should be recommended. The data also include representations of two disturbances.

The results from the regression show that the variation in the test quantities utilised in the diagnosis method is explained by neither the friction fault nor the test cycle; the disturbances had too large an effect on the test quantities. This made it impossible to recommend a different (optimal) test cycle based on the analysis.

11

Lee, Ho-Jin. "Functional data analysis: classification and regression." Texas A&M University, 2004. http://hdl.handle.net/1969.1/2805.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Functional data refer to data which consist of observed functions or curves evaluated at a finite subset of some interval. In this dissertation, we discuss statistical analysis, especially classification and regression, when data are available in functional form. Due to the nature of functional data, one considers function spaces in presenting such data, and each functional observation is viewed as a realization generated by a random mechanism in those spaces. The classification procedure in this dissertation is based on dimension reduction techniques for the spaces. One commonly used method is Functional Principal Component Analysis (Functional PCA), in which an eigendecomposition of the covariance function is employed to find the directions of highest variability of the data in the function space. The reduced space of functions spanned by a few eigenfunctions is thought of as a space containing most of the features of the functional data. We also propose a functional regression model for scalar responses. The infinite dimensionality of the space of predictors causes many problems, one of which is that there are infinitely many solutions. The space of the parameter function is restricted to Sobolev-Hilbert spaces, and the so-called ε-insensitive loss function is utilized. As a robust technique of function estimation, we present a way to find a function that has at most ε deviation from the observed values and at the same time is as smooth as possible.
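The ε-insensitive loss mentioned in the abstract (residuals within ±ε cost nothing, as in support vector regression) in a minimal sketch:

```python
def eps_insensitive(residual, eps):
    """Epsilon-insensitive loss: residuals inside the +/-eps tube cost
    nothing; larger deviations are penalized linearly."""
    return max(0.0, abs(residual) - eps)

losses = [eps_insensitive(r, eps=0.5) for r in (-1.5, -0.25, 0.0, 0.25, 2.5)]
# only the two residuals outside the +/-0.5 tube contribute
```

In the dissertation this loss is minimized together with a smoothness penalty over a Sobolev-Hilbert space; the function above is only the loss term.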
12

Lu, Xuewen. "Semiparametric regression models in survival analysis." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape15/PQDD_0030/NQ27458.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Sulieman, Hana. "Parametric sensitivity analysis in nonlinear regression." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape15/PQDD_0004/NQ27858.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Olsén, Johan. "Logistic regression modelling for STHR analysis." Thesis, KTH, Matematisk statistik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-148971.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Coronary artery disease (CAD) is a common condition which can impair the quality of life and lead to cardiac infarctions. Traditional criteria during exercise tests are good but far from perfect. A lot of patients with inconclusive tests are referred to radiological examinations. By finding better evaluation criteria during the exercise test we can save a lot of money and let the patients avoid unnecessary examinations. Computers record large amounts of numerical data during the exercise test. In this retrospective study 267 patients with inconclusive exercise tests and performed radiological examinations were included. The purpose was to use clinical considerations as well as mathematical statistics to find new diagnostic criteria. We created a few new parameters and evaluated them together with previously used parameters. For women we found some interesting univariable results where new parameters discriminated better than the formerly used ones. However, the number of females with observed CAD was small (14), which made it impossible to obtain strong significance. For men we computed a multivariable model, using logistic regression, which discriminates far better than the traditional parameters for these patients. The area under the ROC curve was 0.90 (95% CI: 0.83-0.97), which is excellent to outstanding discrimination in a group initially included due to their inconclusive results. If the model can be shown to hold for another population, it could contribute a lot to the diagnostics of this common medical condition.
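The reported area under the ROC curve can be computed directly from classifier scores via the Mann-Whitney statistic. A small sketch with invented scores (not the study's data):

```python
def roc_auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a random positive case outscores a random negative
    one (ties count one half)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

auc = roc_auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2, 0.1])   # 11 of 12 pairs won
```

An AUC of 0.5 means no discrimination and 1.0 perfect discrimination, which is why a value of 0.90 in an inconclusive-test group is notable.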
15

Jin, Yi. "Regression Analysis of University Giving Data." Digital WPI, 2007. https://digitalcommons.wpi.edu/etd-theses/1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This project analyzed the giving data of Worcester Polytechnic Institute's alumni and other constituents (parents, friends, neighbors, etc.) from fiscal year 1983 to 2007 using a two-stage modeling approach. Logistic regression analysis was conducted in the first stage to predict the likelihood of giving for each constituent, followed by linear regression method in the second stage which was used to predict the amount of contribution to be expected from each contributor. Box-Cox transformation was performed in the linear regression phase to ensure the assumption underlying the model holds. Due to the nature of the data, multiple imputation was performed on the missing information to validate generalization of the models to a broader population. Concepts from the field of direct and database marketing, like "score" and "lift", were also introduced in this report.
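Two ingredients of the two-stage approach, the Box-Cox transformation of the giving amount and the combination of the logistic and linear stages into an expected contribution ("score"), in a brief sketch; the numeric values are invented:

```python
import math

def box_cox(y, lam):
    """Box-Cox power transform of a positive response y:
    log(y) for lambda = 0, else (y**lambda - 1)/lambda."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def expected_gift(p_give, predicted_amount):
    """Two-stage score: stage-one probability of giving times the
    stage-two predicted contribution amount."""
    return p_give * predicted_amount

score = expected_gift(0.25, 200.0)   # expected contribution of 50.0
```

Ranking constituents by such a score is the usual bridge from the two fitted models to a direct-marketing "lift" analysis.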
16

Li, Yi-Hwei. "Regression analysis of failure time data /." The Ohio State University, 1991. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487694702784082.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Kulich, Michal. "Additive hazards regression with incomplete covariate data /." Thesis, Connect to this title online; UW restricted, 1997. http://hdl.handle.net/1773/9562.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Agard, David B. "Robust inferential procedures applied to regression." Diss., This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-10132005-152518/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Burnham, Alison J. "Multivariate latent variable regression : modelling and estimation /." *McMaster only, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
20

鄧明基 and Ming-kei Tang. "Assessment of influence in multivariate regression." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31219949.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Tang, Ming-kei. "Assessment of influence in multivariate regression /." Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B19853658.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Mitchell, Napoleon. "Outliers and Regression Models." Thesis, University of North Texas, 1992. https://digital.library.unt.edu/ark:/67531/metadc279029/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The mitigation of outliers serves to increase the strength of a relationship between variables. This study defined outliers in three different ways and used five regression procedures to describe the effects of outliers on 50 data sets. This study also examined the relationship among the shape of the distribution, skewness, and outliers.
23

Zhang, Zhigang. "Nonproportional hazards regression models for survival analysis /." free to MU campus, to others for purchase, 2004. http://wwwlib.umi.com/cr/mo/fullcit?p3144473.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Detwiler, Dana. "Microcomputer implementation of robust regression techniques." Master's thesis, This resource online, 1993. http://scholar.lib.vt.edu/theses/available/etd-03302010-020305/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

McGlothlin, Anna E. Stamey James D. Seaman John Weldon. "Logistic regression with misclassified response and covariate measurement error a Bayesian approach /." Waco, Tex. : Baylor University, 2007. http://hdl.handle.net/2104/5101.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Liu, Hai Chan Kung-sik. "Semiparametric regression analysis of zero-inflated data." Iowa City : University of Iowa, 2009. http://ir.uiowa.edu/etd/308.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Ratnasingam, Suthakaran. "Sequential Change-point Detection in Linear Regression and Linear Quantile Regression Models Under High Dimensionality." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu159050606401363.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Li, Lingzhu. "Model checking for general parametric regression models." HKBU Institutional Repository, 2019. https://repository.hkbu.edu.hk/etd_oa/654.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Model checking for regressions has drawn considerable attention in the last three decades. Compared with global smoothing tests, local smoothing tests, which are more sensitive to high-frequency alternatives, can only detect local alternatives distinct from the null model at a much slower rate when the dimension of the predictor is high. When the number of covariates is large, the nonparametric estimations used in local smoothing tests lack efficiency. Corresponding tests then have trouble maintaining the significance level and detecting the alternatives. To tackle the issue, we propose two methods under a high but fixed dimension framework. Further, we investigate a model checking test under divergent dimension, where the numbers of covariates and unknown parameters diverge with the sample size n. The first proposed test is constructed upon a typical kernel-based local smoothing test using a projection method. By employing projection and integration, the resulting test statistic has a closed form that depends only on the residuals and the distances between the sample points. A merit of the developed test is that the distance is easy to implement compared with kernel estimation, especially when the dimension is high. Moreover, the test inherits some features of local smoothing tests owing to its construction. Although it is eventually similar to an Integrated Conditional Moment test in spirit, it leads to a test with a weight function that helps to collect more information from the samples than the Integrated Conditional Moment test. Simulations and real data analysis demonstrate the power of the test. The second test, which is a synthesis of local and global smoothing tests, aims at solving the slow convergence rate caused by nonparametric estimation in local smoothing tests. A significant feature of this approach is that it allows nonparametric estimation-based tests, under the alternatives, to also share the merits of existing empirical process-based tests.
The proposed hybrid test can detect local alternatives at the fastest possible rate like the empirical process-based ones and, simultaneously, retains the sensitivity to high-frequency alternatives of the nonparametric estimation-based ones. This feature is achieved by utilizing an indicative dimension in the field of dimension reduction. As a by-product, we give a systematic study of a residual-related central subspace for model adaptation, showing when alternative models can be indicated and when they cannot. Numerical studies are conducted to verify its application. Since data volumes are increasing, the numbers of predictors and unknown parameters may diverge as the sample size n goes to infinity. Model checking under divergent dimension, however, is almost uncharted in the literature. In this thesis, an adaptive-to-model test is proposed to handle the divergent dimension based on the two previously introduced tests. Theoretical results show that, to obtain the asymptotic normality of the parameter estimator, the number of unknown parameters should be of order o(n^(1/3)). Also, as a spinoff, we demonstrate the asymptotic properties of the estimators of the residual-related central subspace and the central mean subspace under different hypotheses.
29

Kriner, Monika. "Survival Analysis with Multivariate adaptive Regression Splines." Diss., lmu, 2007. http://nbn-resolving.de/urn:nbn:de:bvb:19-73695.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Carvalho, Renato de Souza. "Nonlinear regression application to well test analysis /." Access abstract and link to full text, 1993. http://0-wwwlib.umi.com.library.utulsa.edu/dissertations/fullcit/9416602.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Wiencierz, Andrea [Verfasser]. "Regression analysis with imprecise data / Andrea Wiencierz." München : Verlag Dr. Hut, 2014. http://d-nb.info/1050331575/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Flogvall, Carl, and Stefan Nordenskjöld. "A regression analysis of NHL cap hits." Thesis, KTH, Matematik (Inst.), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-155194.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This report studies whether a multiple linear regression can be used to predict the cap hit of NHL hockey forwards. Data were collected from the 2010-2011, 2011-2012, and 2012-2013 seasons. The chosen variables were common hockey statistics and a few non-hockey-related ones, such as origin and age. The initial model was improved by removing insignificant covariates, detected by BIC tests and p-values. The final model consisted of 291 players and had an adjusted R2 value of 0.7820. Of the covariates, goals, assists and ice time had the biggest impact on a player's cap hit.
33

Bjartmar, Hylta Sanna, and Emma Lundquist. "Pricing Single Malt Whisky : A Regression Analysis." Thesis, KTH, Matematisk statistik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis examines the factors that affect the price of whisky. Multiple regression analysis is used to model the relationship between the identified covariates that are believed to impact the price of whisky. The optimal marketing strategy for whisky producers in the regions Islay and Campbeltown is discussed. This analysis is based on the Marketing Mix. Furthermore, a Porter's five forces analysis, focusing on the regions Campbeltown and Islay, is carried out. Finally, the findings are summarised in a marketing strategy recommendation for producers in the regions Campbeltown and Islay. The result from the regression analysis shows that the covariates alcohol content and region affect price the most. The small regions Islay and Campbeltown, with few distilleries, have a strong positive impact on price, while whisky from unspecified regions in Scotland has a negative impact on price. The alcohol content has a positive, non-linear, impact on price. The thesis concludes that the positive relationship between alcohol content and price is not due to the alcohol taxes in Sweden, but that customers are ready to pay more for a whisky with higher alcohol content. In addition, it concludes that small regions with few distilleries command a higher price for whisky. The origin and tradition of whisky have a significant impact on price and should thus be emphasised in the marketing strategy for these companies.
Denna kandidatuppsats undersöker de faktorer som påverkar priset på whisky. Multipel regressionsanalys används för att modellera förhållandet mellan de identifierade variablerna som tros påverka priset på whisky. Vidare diskuteras den optimala marknadsföringsstrategin för whiskyproducenter i regionerna Islay och Campbeltown. Analysen baseras på en Marknadsmix-analys för whisky i Skottland. Detta följs av Porters femkraftsmodell med fokus på regionerna Islay och Campbeltown. Slutligen sammanfattas resultaten i en rekommendation av marknadsföringsstrategi för producenter i regionerna Islay och Campbeltown. Resultatet från regressionsanalysen visar att kovariaterna alkoholhalt och regioner har störst påverkan på priset. De små regionerna Islay och Campbeltown, med få destillerier, har en stark positiv inverkan på priset. Whisky från ospecificerade regioner i Skottland har däremot en negativ inverkan. Alkoholhalten har en positiv, icke-linjär inverkan på priset. I kandidatuppsatsen dras slutsatsen att det positiva sambandet mellan alkohol och pris ej kan förklaras av Sveriges alkoholskatt, utan att kunder är redo att betala mer för en whisky med högre alkoholhalt. Vidare konstateras att små regioner med få destillerier resulterar i ett högre pris på whisky. Whiskyns ursprung och tradition har en stor inverkan på pris och bör därför betonas i marknadsföringen.
34

SILVA, Ana Hermínia Andrade e. "Essays on data transformation and regression analysis." Universidade Federal de Pernambuco, 2017. https://repositorio.ufpe.br/handle/123456789/24585.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
CAPES
Na presente tese de doutorado, apresentamos estimadores dos parâmetros que indexam as transformações de Manly e Box-Cox, usadas para transformar a variável resposta do modelo de regressão linear, e também testes de hipóteses. A tese é composta por quatro capítulos. No Capítulo 2, desenvolvemos dois testes escore para a transformação de Box-Cox e dois testes escore para a transformação de Manly (Ts e Ts0), para estimar os parâmetros das transformações. A principal desvantagem da transformação de Box-Cox é que ela só pode ser aplicada a dados não negativos. Por outro lado, a transformação de Manly pode ser aplicada a qualquer dado real. Utilizamos simulações de Monte Carlo para avaliarmos os desempenhos dos estimadores e testes propostos. O principal resultado é que o teste Ts teve melhor desempenho que o teste Ts0, tanto em tamanho quanto em poder. No Capítulo 3 apresentamos refinamentos para os testes escore desenvolvidos no Capítulo 2 usando o fast double bootstrap. Seu desempenho foi avaliado via simulações de Monte Carlo. O resultado principal é que o teste fast double bootstrap é superior ao teste bootstrap clássico. No Capítulo 4 propusemos sete estimadores não-paramétricos para estimar os parâmetros que indexam as transformações de Box-Cox e Manly, com base em testes de normalidade. Realizamos simulações de Monte Carlo em três casos. Comparamos os desempenhos dos estimadores não-paramétricos com o do estimador de máxima verosimilhança (EMV). No terceiro caso, pelo menos um estimador não-paramétrico apresenta desempenho superior ao EMV.
In this PhD dissertation we develop estimators of, and tests on, the parameters that index the Manly and Box-Cox transformations, which are used to transform the response variable of the linear regression model. It is composed of four chapters. In Chapter 2 we develop two score tests for the Box-Cox and Manly transformations (Ts and Ts0). The main disadvantage of the Box-Cox transformation is that it can only be applied to positive data. In contrast, the Manly transformation can be applied to any real data. We performed Monte Carlo simulations to evaluate the finite sample performances of the proposed estimators and tests. The results show that the Ts test outperforms the Ts0 test, both in size and in power. In Chapter 3, we present refinements of the score tests developed in Chapter 2 using the fast double bootstrap. We performed Monte Carlo simulations to evaluate the effectiveness of such a bootstrap scheme. The main result is that the fast double bootstrap is superior to the standard bootstrap. In Chapter 4, we propose seven nonparametric estimators of the parameters that index the Box-Cox and Manly transformations, based on normality tests. We performed Monte Carlo simulations in three cases. We compare the performances of the nonparametric estimators with that of the maximum likelihood estimator (MLE). In the third case, at least one nonparametric estimator outperforms the MLE.
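The two transformations compared in this abstract, and maximum likelihood estimation of their index parameter, can be sketched as follows. This is a plain profile-likelihood grid search on synthetic data, simpler than the score tests and nonparametric estimators the dissertation develops; the Box-Cox transform requires a positive response, while the Manly (exponential) transform accepts any real value.

```python
# Hedged sketch: Box-Cox vs Manly transforms and a grid-search MLE for lambda.
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform; requires y > 0."""
    return np.log(y) if lam == 0 else (y ** lam - 1) / lam

def manly(y, lam):
    """Manly (exponential) transform; defined for any real y."""
    return y if lam == 0 else (np.exp(lam * y) - 1) / lam

def profile_loglik(y, X, lam, transform, log_jacobian):
    """Profile log-likelihood of lambda in a normal linear model for z = t(y)."""
    z = transform(y, lam)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    rss = np.sum((z - X @ beta) ** 2)
    n = len(y)
    return -0.5 * n * np.log(rss / n) + log_jacobian(y, lam)

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.exp(1 + 0.5 * X[:, 1] + rng.normal(0, 0.2, n))   # log-normal response

# Box-Cox log-Jacobian is (lam - 1) * sum(log y).
grid = np.linspace(-1, 1, 201)
ll = [profile_loglik(y, X, l, box_cox,
                     lambda y, l: (l - 1) * np.sum(np.log(y))) for l in grid]
lam_bc = grid[int(np.argmax(ll))]   # should sit near zero for log-normal data
```

For the Manly transform the same routine applies with log-Jacobian `lam * np.sum(y)`.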
35

Ah-Kine, Pascal Soon Shien. "Simultaneous confidence bands in linear regression analysis." Thesis, University of Southampton, 2010. https://eprints.soton.ac.uk/167557/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
A simultaneous confidence band provides useful information on the plausible range of an unknown regression model. For a simple linear regression model, the most frequently quoted bands in the statistical literature include the two-segment band, the three-segment band and the hyperbolic band, and for a multiple linear regression model, the most common bands in the statistical literature include the hyperbolic band and the constant width band. The optimality criteria for confidence bands include the Average Width criterion considered by Gafarian (1964) and Naiman (1984) among others, and the Minimum Area Confidence Set (MACS) criterion of Liu and Hayter (2007). A concise review of the construction of two-sided simultaneous confidence bands in simple and multiple linear regressions and their comparison under the two mentioned optimality criteria is provided in the thesis. Two families of confidence bands, the inner-hyperbolic bands and the outer-hyperbolic bands, which include the hyperbolic and three-segment bands as special cases, are introduced for a simple linear regression. Under the MACS criterion, the best confidence band within each family is found by numerical search and compared with the hyperbolic band, the best three-segment band and with each other. The inner-hyperbolic family of confidence bands, which includes the hyperbolic and constant-width bands as special cases, is also constructed for a multiple linear regression model over an ellipsoidal covariate region, and the best band within the family is found by numerical search. For a multiple linear regression model over a rectangular covariate region (i.e. where the predictor variables are constrained to intervals), no method of constructing exact simultaneous confidence bands has been published so far.
A method to construct exact two-sided hyperbolic and constant width bands over a rectangular covariate region, and to compare them, is provided in this thesis for up to three predictor variables. A simulation method similar to the ones used by Liu et al. (2005a) and Liu et al. (2005b) is also provided for the calculation of the average width and the minimum volume of the confidence set when there are more than three predictor variables. The methods used in this thesis are illustrated with numerical examples, and the Matlab programs used are available upon request.
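The hyperbolic band mentioned in this abstract has a classical closed form for simple linear regression over the whole real line: the Working-Hotelling band with Scheffe-type critical constant sqrt(2 F(alpha; 2, n-2)). The sketch below computes it on synthetic data; it illustrates only this textbook band, not the thesis's inner/outer-hyperbolic families or rectangular-region constructions.

```python
# Hedged sketch: hyperbolic (Working-Hotelling) simultaneous 95% band.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)                 # error variance estimate
XtX_inv = np.linalg.inv(X.T @ X)

c = np.sqrt(2 * stats.f.ppf(0.95, 2, n - 2))     # simultaneous critical constant

def band(x0):
    """Lower/upper simultaneous 95% limits for the regression line at x0."""
    v = np.array([1.0, x0])
    half = c * np.sqrt(sigma2 * v @ XtX_inv @ v)  # hyperbolic half-width
    fit = v @ beta
    return fit - half, fit + half

lo, hi = band(5.0)
```

The half-width grows hyperbolically as x0 moves away from the mean of x, which is exactly the shape the abstract's "hyperbolic band" refers to.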
36

Hu, ChungLynn. "Nonignorable nonresponse in the logistic regression analysis /." The Ohio State University, 1998. http://rave.ohiolink.edu/etdc/view?acc_num=osu1487950153601414.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

White, Lisa A. "Predicting hospital admissions with Poisson regression analysis." Thesis, Monterey, Calif. : Naval Postgraduate School, 2009. http://edocs.nps.edu/npspubs/scholarly/theses/2009/Jun/09Jun%5FWhite.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (M.S. in Operations Research)--Naval Postgraduate School, June 2009.
Thesis Advisor(s): Whitaker, Lyn R. "June 2009." Description based on title screen as viewed on July 14, 2009. Author(s) subject terms: Poisson regression, MTF, military treatment facility, hospital admissions. Includes bibliographical references (p. 53-54). Also available in print.
38

Liu, Hai. "Semiparametric regression analysis of zero-inflated data." Diss., University of Iowa, 2009. https://ir.uiowa.edu/etd/308.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Zero-inflated data abound in ecological studies as well as in other scientific and quantitative fields. Nonparametric regression with zero-inflated response may be studied via the zero-inflated generalized additive model (ZIGAM). ZIGAM assumes that the conditional distribution of the response variable belongs to the zero-inflated 1-parameter exponential family which is a probabilistic mixture of the zero atom and the 1-parameter exponential family, where the zero atom accounts for an excess of zeroes in the data. We propose the constrained zero-inflated generalized additive model (COZIGAM) for analyzing zero-inflated data, with the further assumption that the probability of non-zero-inflation is some monotone function of the (non-zero-inflated) exponential family distribution mean. When the latter assumption obtains, the new approach provides a unified framework for modeling zero-inflated data, which is more parsimonious and efficient than the unconstrained ZIGAM. We develop an iterative algorithm for model estimation based on the penalized likelihood approach, and derive formulas for constructing confidence intervals of the maximum penalized likelihood estimator. Some asymptotic properties including the consistency of the regression function estimator and the limiting distribution of the parametric estimator are derived. We also propose a Bayesian model selection criterion for choosing between the unconstrained and the constrained ZIGAMs. We consider several useful extensions of the COZIGAM, including imposing additive-component-specific proportional and partial constraints, and incorporating threshold effects to account for regime shift phenomena. The new methods are illustrated with both simulated data and real applications. An R package COZIGAM has been developed for model fitting and model selection with zero-inflated data.
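The zero-atom mixture at the heart of ZIGAM/COZIGAM can be illustrated with its simplest parametric cousin: a zero-inflated Poisson regression fitted by direct maximum likelihood. The sketch below (synthetic data, no smoothing, no constraint between the two model parts) shows the mixture likelihood in which zeros may come either from the atom or from the Poisson count part.

```python
# Hedged sketch: zero-inflated Poisson regression via direct MLE.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(4)
n = 1000
x = rng.normal(size=n)
p_zero = expit(-1 + 0.0 * x)              # mixing probability of the zero atom
mu = np.exp(0.5 + 0.8 * x)                # Poisson mean of the count part
y = np.where(rng.random(n) < p_zero, 0, rng.poisson(mu))

def negloglik(theta):
    a, b0, b1 = theta                     # a: logit of the zero-inflation prob
    pi = expit(a)
    lam = np.exp(b0 + b1 * x)
    log_pois = -lam + y * np.log(lam) - gammaln(y + 1)
    # zeros can come from the atom or the Poisson; positives only from Poisson
    ll = np.where(y == 0,
                  np.log(pi + (1 - pi) * np.exp(-lam)),
                  np.log(1 - pi) + log_pois)
    return -ll.sum()

fit = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
a_hat, b0_hat, b1_hat = fit.x
```

The COZIGAM constraint would additionally tie the zero-inflation probability to a monotone function of the count-part mean rather than estimating it freely.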
39

Ormerod, John T. Mathematics &amp Statistics Faculty of Science UNSW. "On semiparametric regression and data mining." Awarded by:University of New South Wales. Mathematics & Statistics, 2008. http://handle.unsw.edu.au/1959.4/40913.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Semiparametric regression is playing an increasingly large role in the analysis of datasets exhibiting various complications (Ruppert, Wand & Carroll, 2003). In particular, semiparametric regression plays a prominent role in the area of data mining, where such complications are numerous (Hastie, Tibshirani & Friedman, 2001). In this thesis we develop fast, interpretable methods addressing many of the difficulties associated with data mining applications, including: model selection, missing value analysis, outliers and heteroscedastic noise. We focus on function estimation using penalised splines via mixed model methodology (Wahba 1990; Speed 1991; Ruppert et al. 2003). In dealing with the difficulties associated with data mining applications, many of the models we consider deviate from typical normality assumptions. These models lead to likelihoods involving analytically intractable integrals. Thus, in keeping with the aim of speed, we seek analytic approximations to such integrals, which are typically faster than numeric alternatives. These analytic approximations include not only popular penalised quasi-likelihood (PQL) approximations (Breslow & Clayton, 1993) but also variational approximations. Originating in physics, variational approximations are a relatively new class of approximations (to statistics) which are simple, fast, flexible and effective. They have recently been applied to statistical problems in machine learning, where they are rapidly gaining popularity (Jordan, Ghahramani, Jaakkola & Saul, 1999; Corduneanu & Bishop, 2001; Ueda & Ghahramani, 2002; Bishop & Winn, 2003; Winn & Bishop, 2005). We develop variational approximations to: generalized linear mixed models (GLMMs); Bayesian GLMMs; simple missing values models; and outlier and heteroscedastic noise models, which are, to the best of our knowledge, new. These methods are quite effective and extremely fast, with fitting taking minutes if not seconds on a typical 2008 computer.
We also make a contribution to variational methods themselves. Variational approximations often underestimate the variance of posterior densities in Bayesian models (Humphreys & Titterington, 2000; Consonni & Marin, 2004; Wang & Titterington, 2005). We develop grid-based variational posterior approximations. These approximations combine a sequence of variational posterior approximations, can be extremely accurate and are reasonably fast.
40

Lai, Pik-ying, and 黎碧瑩. "Lp regression under general error distributions." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30287844.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Wang, Xue. "Empirical Bayes block shrinkage for wavelet regression." Thesis, University of Nottingham, 2006. http://eprints.nottingham.ac.uk/13516/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
There has been great interest in recent years in the development of wavelet methods for estimating an unknown function observed in the presence of noise, following the pioneering work of Donoho and Johnstone (1994, 1995) and Donoho et al. (1995). In this thesis, a novel empirical Bayes block (EBB) shrinkage procedure is proposed and the performance of this approach with both independent identically distributed (IID) noise and correlated noise is thoroughly explored. The first part of this thesis develops a Bayesian methodology involving the non-central χ² distribution to simultaneously shrink wavelet coefficients in a block, based on the block sum of squares. A useful (and to the best of our knowledge, new) identity satisfied by the non-central χ² density is exploited. This identity leads to tractable posterior calculations for suitable families of prior distributions. Also, the families of prior distributions we work with are sufficiently flexible to represent various forms of prior knowledge. Furthermore, an efficient method for finding the hyperparameters is implemented and simulations show that this method has a high degree of computational advantage. The second part relaxes the assumption of IID noise considered in the first part of this thesis. A semi-parametric model including a parametric component and a nonparametric component is presented to deal with correlated noise situations. In the parametric component, attention is paid to the covariance structure of the noise. Two distinct parametric methods (maximum likelihood estimation and time series model identification techniques) for estimating the parameters in the covariance matrix are investigated. Both methods have been successfully implemented and are believed to be new additions to smoothing methods.
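Block shrinkage of wavelet coefficients, the operation this abstract builds an empirical Bayes rule for, can be illustrated with a much simpler classical rule: James-Stein-type shrinkage of each block by its sum of squares, with the BlockJS threshold constant λ ≈ 4.50524 of Cai (1999). The sketch below uses a single hand-coded Haar level on synthetic data; it stands in for, and is not, the thesis's EBB posterior rule.

```python
# Hedged sketch: BlockJS-style block shrinkage on one Haar wavelet level.
import numpy as np

def haar_step(v):
    """One level of the orthonormal Haar transform: (approx, detail)."""
    a = (v[0::2] + v[1::2]) / np.sqrt(2)
    d = (v[0::2] - v[1::2]) / np.sqrt(2)
    return a, d

def block_shrink(d, sigma, block_len=8, lam=4.50524):   # lam from BlockJS
    """Shrink each block of detail coefficients by its block sum of squares."""
    out = d.copy()
    for s in range(0, len(d), block_len):
        blk = out[s:s + block_len]
        s2 = np.sum(blk ** 2)
        factor = max(0.0, 1 - lam * len(blk) * sigma ** 2 / s2) if s2 > 0 else 0.0
        out[s:s + block_len] = factor * blk
    return out

rng = np.random.default_rng(5)
n = 1024
t = np.linspace(0, 1, n)
signal = np.sin(4 * np.pi * t)
noisy = signal + rng.normal(0, 0.3, n)

a, d = haar_step(noisy)                       # one resolution level
sigma = np.median(np.abs(d)) / 0.6745         # MAD estimate of the noise level
d_shrunk = block_shrink(d, sigma)
# invert the single Haar step
denoised = np.empty(n)
denoised[0::2] = (a + d_shrunk) / np.sqrt(2)
denoised[1::2] = (a - d_shrunk) / np.sqrt(2)
```

A full implementation would recurse over several resolution levels; the EBB procedure replaces the fixed λ rule with a posterior shrinkage factor driven by the non-central χ² identity.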
42

McClelland, Robyn L. "Regression based variable clustering for data reduction /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/9611.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Zhou, Qi Jessie. "Inferential methods for extreme value regression models /." *McMaster only, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
44

Daud, Isa Bin. "Influence diagnostics in regression with censored data." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/11728.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The work in this thesis is concerned with the development and extension of techniques for the assessment of influence diagnostics in data that include censored observations. Various regression models with censored data are presented and we concentrate on two models which are the accelerated failure time model, where the errors are generated by mixtures of normal distributions,and the Cox proportional hazards model. For the former, both finite discrete and continuous mixtures are considered, and an EM algorithm is used to determine measures of influence for each case. For the Cox proportional hazards model, various approaches to approximating influence curves are investigated. One-step or few-step approximations are developed using an EM algorithm and compared with a Newton-Raphson approach. Cook's measures of local influence are also investigated for the detection of influential cases in the data. The validity of the proportional hazards assumptions is also investigated. The residuals of Schoenfeld are examined for the possibility of being used to detect time dependence of the covariates in the proportional hazards model. Estimates to describe the nature of the time dependency computed from these residuals are presented.
45

Huang, Jian. "Estimation in regression models with interval censoring /." Thesis, Connect to this title online; UW restricted, 1994. http://hdl.handle.net/1773/8950.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Kao, Tzu-Yuan, and 高子瑗. "Effect Regression Analysis." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/05416517583972590707.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Master's thesis
National Chiao Tung University
Institute of Statistics
103 (ROC academic year)
Specification of direct and total effects has been mistreated for years, which misleads statistical inferences about the indirect effect. We prove that detecting the presence of mediation requires only a test of the association between the predictor and the mediator, which explains why Baron and Kenny's (1986) three-step procedure has low power. We also provide theoretical proofs for Hayes's (2009) observation that the total effect can be absent while an indirect effect exists, and for Palmatier et al.'s (2009) observation of a total effect containing no indirect effect. With the regression function formulated in terms of the distributional parameters of the response, predictor and mediator, we can quantify the mediation information in their joint distribution and remove it to specify unambiguous direct and total effects. One important discovery is that mediation affects not only the regression slope parameters but also the regression intercept. So, instead of the limited slope-type effects, we introduce regression-form direct, indirect and total effects, expanding the scope to effect prediction. Statistical inferences for these effect regressions are introduced and evaluated.
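The Hayes (2009) phenomenon discussed in this abstract (no total effect, yet a nonzero indirect effect) is easy to exhibit by simulation: since the total effect decomposes as c = c' + ab, choosing the direct effect c' = -ab cancels the indirect effect ab exactly. All coefficient values below are made up for the demonstration.

```python
# Hedged sketch: zero total effect coexisting with a nonzero indirect effect.
import numpy as np

rng = np.random.default_rng(6)
n = 100000
a, b = 0.8, 0.5          # x -> m and m -> y paths
c_prime = -a * b         # direct effect chosen to cancel the indirect effect

x = rng.normal(size=n)
m = a * x + rng.normal(0, 1, n)                   # mediator model
y = c_prime * x + b * m + rng.normal(0, 1, n)     # outcome model

# Total effect: slope of y regressed on x alone (should be near zero).
c_hat = np.polyfit(x, y, 1)[0]

# Indirect effect: (slope of m on x) times (partial slope of y on m given x).
a_hat = np.polyfit(x, m, 1)[0]
X = np.column_stack([np.ones(n), x, m])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0][2]
indirect = a_hat * b_hat                          # should be near a*b = 0.4
```

Testing only the total effect c would therefore miss the mediation entirely, whereas the x-m association (a) is clearly detectable, in line with the abstract's main claim.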
47

Chang, Teng-Kai, and 張登凱. "Interaction Regression Analysis." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/t9ezhu.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Master's thesis
National Chiao Tung University
Institute of Statistics
105 (ROC academic year)
Classically, researchers verify the presence of interaction, the effect of interdependence, through detection of a product term in the regression function, i.e. through the product (mixed partial) derivative of this function. We first verify the appropriateness of the product-term criterion by introducing a first-order derivative criterion. We then investigate whether the indirect effect in causal inference can interpret the interaction effect. We also study the presence of a switch line that divides the regression function into two parts: one with a synergistic effect and one with an antagonistic effect. Finally, we conduct a data analysis.
48

(8039492), Huyunting Huang Sr. "Regression Principal Analysis." Thesis, 2019.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Principal Component Analysis (PCA) is a widely used dimension reduction method that aims to find a low dimensional subspace of highly correlated data that carries its major information for use in further analysis. Machine learning methods based on PCA are popular in high dimensional data analysis, such as video and image processing. In video processing, Robust PCA (RPCA), a modified version of the traditional PCA, has good properties for separating moving objects from the background, but it may have difficulties doing so when the light intensity of the background varies significantly in time. To overcome these difficulties, a modified PCA method, called Regression PCA (RegPCA), is proposed. The method is developed by combining traditional PCA and regression approaches, and it can easily be combined with RPCA for video processing. We focus the presentation on RegPCA combined with RPCA for video processing and find that it is more reliable than RPCA alone. We use RegPCA to separate a moving object from the background in a color video and obtain a better result than that given by RPCA. In the implementation, we first derive the explanatory variables from the background information. We then process a number of frames of the video and use those as a set of response variables. We remove the impact of the background by regressing the response against the explanatory variables in a regression model. The regression model provides a set of residuals, which can be further analyzed by RPCA. We compare the results of RegPCA combined with RPCA against those of RPCA alone. It is evident that the moving objects can be completely removed from the background using our method, but not with RPCA. Our proposed method provides a new implementation of RPCA under the framework of regression approaches, which can be used to account for the impact of risk factors.
This problem cannot be addressed by the application of RPCA alone.
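The regress-then-decompose idea described in this abstract can be sketched on a toy "video": a synthetic 1-D scene under time-varying illumination, with a bright block standing in for the moving object. Each frame is regressed on background covariates to strip out the illumination, and the residual matrix is then decomposed by SVD (plain PCA here, standing in for the RPCA step).

```python
# Hedged sketch of the RegPCA idea (synthetic 1-D "video", not real footage).
import numpy as np

rng = np.random.default_rng(8)
pixels, frames = 200, 60
background = rng.uniform(0.2, 1.0, pixels)                    # static scene
light = 1 + 0.5 * np.sin(np.linspace(0, 4 * np.pi, frames))   # varying intensity

V = np.outer(background, light)                # pixels x frames video
V[80:90, 30:40] += 2.0                         # a "moving object"
V += rng.normal(0, 0.01, V.shape)              # sensor noise

# Regression step: each frame ~ intercept + background (removes illumination).
X = np.column_stack([np.ones(pixels), background])
coef, *_ = np.linalg.lstsq(X, V, rcond=None)   # one fit per frame column
residual = V - X @ coef

# Decomposition step on the residuals: the leading left singular vector
# concentrates on the object's pixels.
U, s, Vt = np.linalg.svd(residual, full_matrices=False)
object_map = np.abs(U[:, 0])
```

Because the illumination change is linear in the background covariate, it is absorbed by the regression, and only the object (plus noise) survives into the residuals.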
49

"Supervised ridge regression in high dimensional linear regression." 2013. http://library.cuhk.edu.hk/record=b5549319.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
在機器學習領域,我們通常有很多的特徵變量,以確定一些回應變量的行為。例如在基因測試問題,我們有數以萬計的基因用來作為特徵變量,而它們與某些疾病的關係需要被確定。沒有提供具體的知識,最簡單和基本的方法來模擬這種問題會是一個線性的模型。有很多現成的方法來解決線性回歸問題,像傳統的普通最小二乘回歸法,嶺回歸和套索回歸。設 N 為樣本數和,p 為特徵變量數,在普通的情況下,我們通常有足夠的樣本(N> P)。 在這種情況下,普通線性回歸的方法,例如嶺回歸通常會給予合理的對未來的回應變量測值的預測。隨著現代統計學的發展,我們經常會遇到高維問題(N << P),如 DNA 芯片數據的測試問題。在這些類型的高維問題中,確定特徵變量和回應變量之間的關係在沒有任何進一步的假設的情況下是相當困難的。在很多現實問題中,儘管有大量的特徵變量存在,但是完全有可能只有極少數的特徵變量和回應變量有直接關係,而大部分其他的特徵變量都是無效的。 套索和嶺回歸等傳統線性回歸在高維問題中有其局限性。套索回歸在應用於高維問題時,會因為測量噪聲的存在而表現得很糟糕,這將導致非常低的預測準確率。嶺回歸也有其明顯的局限性。它不能夠分開真正的特徵變量和無效的特徵變量。我提出的新方法的目的就是在高維線性回歸中克服以上兩種方法的局限性,從而導致更精確和穩定的預測。想法其實很簡單,與其做一個單一步驟的線性回歸,我們將回歸過程分成兩個步驟。第一步,我们棄那些預測有相關性很小或為零的特徵變量。第二步,我們應該得到一個消減過的特徵變量集,我們將用這個集和回應變量來進行嶺回歸從而得到我們需要的結果。
In the field of statistical learning, we usually have a lot of features to determine the behavior of some response. For example in gene testing problems we have lots of genes as features and their relations with certain disease need to be determined. Without specific knowledge available, the most simple and fundamental way to model this kind of problem would be a linear model. There are many existing method to solve linear regression, like conventional ordinary least squares, ridge regression and LASSO (least absolute shrinkage and selection operator). Let N denote the number of samples and p denote the number of predictors, in ordinary settings where we have enough samples (N > p), ordinary linear regression methods like ridge regression will usually give reasonable predictions for the future values of the response. In the development of modern statistical learning, it's quite often that we meet high dimensional problems (N << p), like documents classification problems and microarray data testing problems. In high-dimensional problems it is generally quite difficult to identify the relationship between the predictors and the response without any further assumptions. Despite the fact that there are many predictors for prediction, most of the predictors are actually spurious in a lot of real problems. A predictor being spurious means that it is not directly related to the response. For example in microarray data testing problems, millions of genes may be available for doing prediction, but only a few hundred genes are actually related to the target disease. Conventional techniques in linear regression like LASSO and ridge regression both have their limitations in high-dimensional problems. The LASSO is one of the "state of the art technique for sparsity recovery, but when applied to high-dimensional problems, LASSO's performance is degraded a lot due to the presence of the measurement noise, which will result in high variance prediction and large prediction error. 
Ridge regression, on the other hand, is more robust to additive measurement noise, but has the obvious limitation of not being able to separate true predictors from spurious ones. As mentioned previously, in many high-dimensional problems a large number of the predictors may be spurious; in these cases ridge regression's inability to separate spurious from true predictors results in poor interpretability of the model as well as poor prediction performance. The new technique that I propose in this thesis aims to overcome the limitations of these two methods, resulting in more accurate and stable prediction in high-dimensional linear regression problems with significant measurement noise. The idea is simple: instead of doing a single-step regression, we divide the regression procedure into two steps. In the first step, we try to identify the seemingly relevant predictors and those that are obviously spurious by computing the univariate correlations between the predictors and the response; we then discard the predictors that have very small or zero correlation with the response. After the first step we have a reduced predictor set. In the second step, we perform a ridge regression of the response on the reduced predictor set, and the result of this ridge regression is our desired output. The thesis is organized as follows. First, I will review the linear regression problem, introduce the ridge and LASSO in detail, and explain more precisely their limitations in high-dimensional problems. Then I will introduce my new method, called supervised ridge regression, show why it should dominate the ridge and LASSO in high-dimensional problems, and present simulation results to strengthen the argument.
Finally, I will conclude with the possible limitations of my method and point out directions for further investigation.
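The two-step procedure described in the abstract can be sketched in a few lines. Everything below (the data sizes, the correlation screen keeping a fixed m = 10 predictors, the ridge penalty lam) is an illustrative assumption for a synthetic example, not the tuning used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic high-dimensional data (N << p): only the first k predictors
# are truly related to the response; the remaining ones are spurious.
N, p, k = 100, 500, 3
X = rng.standard_normal((N, p))
beta = np.zeros(p)
beta[:k] = 2.0
y = X @ beta + rng.standard_normal(N)  # additive measurement noise

# Step 1 (screening): rank predictors by absolute univariate correlation
# with the response and keep only the top m of them.
m = 10
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(-np.abs(corr))[:m]
X_red = X[:, keep]

# Step 2: ridge regression on the reduced predictor set, using the
# closed-form estimate (X'X + lam*I)^(-1) X'y.
lam = 1.0
A = X_red.T @ X_red + lam * np.eye(m)
beta_red = np.linalg.solve(A, X_red.T @ y)

print("kept predictor indices:", sorted(keep.tolist()))
```

With a strong enough signal, the screening step tends to retain the true predictors while discarding almost all spurious ones, so the second-stage ridge fit works on a far smaller, cleaner design matrix.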
Detailed summary in vernacular field only.
Zhu, Xiangchen.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 68-69).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts also in Chinese.
Chapter 1. --- BASICS ABOUT LINEAR REGRESSION --- p.2
Chapter 1.1 --- Introduction --- p.2
Chapter 1.2 --- Linear Regression and Least Squares --- p.2
Chapter 1.2.1 --- Standard Notations --- p.2
Chapter 1.2.2 --- Least Squares and Its Geometric Meaning --- p.4
Chapter 2. --- PENALIZED LINEAR REGRESSION --- p.9
Chapter 2.1 --- Introduction --- p.9
Chapter 2.2 --- Deficiency of the Ordinary Least Squares Estimate --- p.9
Chapter 2.3 --- Ridge Regression --- p.12
Chapter 2.3.1 --- Introduction to Ridge Regression --- p.12
Chapter 2.3.2 --- Expected Prediction Error And Noise Variance Decomposition of Ridge Regression --- p.13
Chapter 2.3.3 --- Shrinkage effects on different principal components by ridge regression --- p.18
Chapter 2.4 --- The LASSO --- p.22
Chapter 2.4.1 --- Introduction to the LASSO --- p.22
Chapter 2.4.2 --- The Variable Selection Ability and Geometry of LASSO --- p.25
Chapter 2.4.3 --- Coordinate Descent Algorithm to solve for the LASSO --- p.28
Chapter 3. --- LINEAR REGRESSION IN HIGH-DIMENSIONAL PROBLEMS --- p.31
Chapter 3.1 --- Introduction --- p.31
Chapter 3.2 --- Spurious Predictors and Model Notations for High-dimensional Linear Regression --- p.32
Chapter 3.3 --- Ridge and LASSO in High-dimensional Linear Regression --- p.34
Chapter 4. --- THE SUPERVISED RIDGE REGRESSION --- p.39
Chapter 4.1 --- Introduction --- p.39
Chapter 4.2 --- Definition of Supervised Ridge Regression --- p.39
Chapter 4.3 --- An Underlying Latent Model --- p.43
Chapter 4.4 --- Ridge LASSO and Supervised Ridge Regression --- p.45
Chapter 4.4.1 --- LASSO vs SRR --- p.45
Chapter 4.4.2 --- Ridge regression vs SRR --- p.46
Chapter 5. --- TESTING AND SIMULATION --- p.49
Chapter 5.1 --- A Simulation Example --- p.49
Chapter 5.2 --- More Experiments --- p.54
Chapter 5.2.1 --- Correlated Spurious and True Predictors --- p.55
Chapter 5.2.2 --- Insufficient Amount of Data Samples --- p.59
Chapter 5.2.3 --- Low Dimensional Problem --- p.62
Chapter 6. --- CONCLUSIONS AND DISCUSSIONS --- p.66
Chapter 6.1 --- Conclusions --- p.66
Chapter 6.2 --- References and Related Works --- p.68
50

Li, Chia-hua, and 李嘉華. "Influence Analysis for ROC Regression." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/34336694525320073168.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Master's thesis
National Chung Cheng University
Graduate Institute of Statistical Science
96
The receiver operating characteristic (ROC) curve is a technique for evaluating screening or diagnostic tests with non-binary test results. ROC regression analysis provides a method for evaluating covariate effects that may influence test accuracy. In ROC regression analysis, if we perturb one case in the data, the ROC regression estimates obtained from the perturbed data may differ considerably from those obtained from the complete data; the character of the estimators may then be determined by this single case while most of the data is essentially ignored. We are therefore interested in studying the influence of unusual observations in ROC regression analysis. Perturbation theory provides a useful tool for sensitivity analysis. In this thesis, we develop single-perturbation influence functions to detect influential points in ROC regression. A simulated data set and a real data set are provided to illustrate the applications of our approach.
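The thesis develops analytic single-perturbation influence functions; a crude numerical analogue of the same idea is case deletion, refitting the model without each observation and measuring how much the estimate moves. The sketch below illustrates this on a toy data set, using ordinary least squares as a stand-in for the ROC regression estimator; the model, the contaminated case, and all parameter choices are illustrative assumptions, not the thesis's method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: binary disease status d, covariate z, and a test result y
# whose separation between groups depends on z (a covariate effect).
n = 60
d = rng.integers(0, 2, n)
z = rng.standard_normal(n)
y = 1.5 * d + 0.5 * d * z + rng.standard_normal(n)
y[0] += 8.0  # contaminate one case to create an unusual observation

def fit(X, y):
    """Least-squares estimate (placeholder for an ROC regression fit)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

X = np.column_stack([np.ones(n), d, d * z])
beta_full = fit(X, y)

# Case-deletion influence: refit without case i and record the change in
# the estimate -- a numerical stand-in for the analytic
# single-perturbation influence function.
influence = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    influence[i] = np.linalg.norm(fit(X[mask], y[mask]) - beta_full)

suspects = np.argsort(-influence)[:3]
print("most influential cases:", suspects)
```

Deleting the contaminated case typically shifts the estimate far more than deleting any routine case, which is exactly the behavior an influence diagnostic is designed to flag.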

To the bibliography