Dissertations / Theses on the topic 'Regression'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Regression.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Jacobs, Mary Christine. "Regression Trees Versus Stepwise Regression." UNF Digital Commons, 1992. http://digitalcommons.unf.edu/etd/145.

Full text
Abstract:
Many methods have been developed to determine the "appropriate" subset of independent variables in a multiple variable problem. Some of the methods are application specific while others have a wide range of uses. This study compares two such methods, Regression Trees and Stepwise Regression. A simulation using a known distribution is used for the comparison. In 699 out of 742 cases the Regression Tree method gave better predictors than the Stepwise Regression procedure.
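As an illustration of the stepwise side of this comparison (not the study's own code or data), a single forward-selection step can be sketched in Python; the synthetic data and variable names are invented for the example:

```python
import random

random.seed(0)

# Synthetic data: y depends on x1 only; x2 is pure noise.
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + random.gauss(0, 0.5) for a in x1]

def sse_simple_fit(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# One forward step of stepwise selection: admit the candidate
# predictor that most reduces the residual sum of squares.
candidates = {"x1": x1, "x2": x2}
chosen = min(candidates, key=lambda k: sse_simple_fit(candidates[k], y))
```

A full stepwise procedure would repeat this step on the residuals, with an entry/exit criterion such as a partial F-test.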
APA, Harvard, Vancouver, ISO, and other styles
2

Ranganai, Edmore. "Aspects of model development using regression quantiles and elemental regressions." Thesis, Stellenbosch : Stellenbosch University, 2007. http://hdl.handle.net/10019.1/18668.

Full text
Abstract:
Dissertation (PhD)--University of Stellenbosch, 2007.
ENGLISH ABSTRACT: It is well known that ordinary least squares (OLS) procedures are sensitive to deviations from the classical Gaussian assumptions (outliers) as well as data aberrations in the design space. The two major data aberrations in the design space are collinearity and high leverage. Leverage points can also induce or hide collinearity in the design space. Such leverage points are referred to as collinearity influential points. As a consequence, over the years, many diagnostic tools to detect these anomalies, as well as alternative procedures to counter them, were developed. To counter deviations from the classical Gaussian assumptions, many robust procedures have been proposed. One such class of procedures is the Koenker and Bassett (1978) Regression Quantiles (RQs), which are natural extensions of order statistics to the linear model. RQs can be found as solutions to linear programming problems (LPs). The basic optimal solutions to these LPs (which are RQs) correspond to elemental subset (ES) regressions, which consist of subsets of minimum size to estimate the necessary parameters of the model. On the one hand, some ESs correspond to RQs. On the other hand, the literature shows that many OLS statistics (estimators) are related to ES regression statistics (estimators). There is therefore an inherent relationship amongst the three sets of procedures. The relationship between the ES procedure and the RQ one has been noted almost "casually" in the literature, while the latter has been fairly widely explored. Using these existing relationships between the ES procedure and the OLS one, as well as new ones, collinearity, leverage and outlier problems in the RQ scenario were investigated. A lasso procedure was also proposed as a variable selection technique in the RQ scenario, and some tentative results were given for it. These results are promising. Single case diagnostics were considered, as well as their relationships to multiple case ones.
In particular, multiple cases of the minimum size needed to estimate the necessary parameters of the model were considered, corresponding to an RQ (ES). In this way, regression diagnostics were developed for both ESs and RQs. The main problems that affect RQs adversely are collinearity and leverage, due to the nature of the computational procedures and the fact that RQs' influence functions are unbounded in the design space but bounded in the response variable. As a consequence, RQs have a high affinity for leverage points and a high exclusion rate of outliers. The influential picture exhibited in the presence of both leverage points and outliers is the net result of these two antagonistic forces. Although RQs are bounded in the response variable (and therefore fairly robust to outliers), outlier diagnostics were also considered in order to obtain a more holistic picture. The investigations comprised analytic means as well as simulation. Furthermore, applications were made to artificial, computer-generated data sets as well as standard data sets from the literature. These revealed that the ES-based statistics can be used, with some degree of success, to address problems arising in the RQ scenario. However, due to the interdependence between the different aspects, viz. that between leverage and collinearity and that between leverage and outliers, "solutions" are often dependent on the particular situation. In spite of this complexity, the research did produce some fairly general guidelines that can be fruitfully used in practice.
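The link between RQs and elemental subset regressions can be illustrated with a toy sketch (not from the thesis): for simple regression, a tau-th RQ minimizing the Koenker-Bassett check loss interpolates an elemental subset of two observations, so for tiny data sets it can be found by enumerating all pairs. The data here are invented:

```python
from itertools import combinations

def check_loss(u, tau):
    # Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau if u >= 0 else tau - 1)

def rq_via_elemental_subsets(x, y, tau):
    """tau-th regression quantile of y = a + b*x: an optimal solution of
    the check-loss LP passes through two observations (an elemental
    subset), so for small n all pairs can be enumerated."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yi - (a + b * xi), tau)
                   for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

x = [0, 1, 2, 3, 4]
y = [0.1, 1.0, 1.9, 3.2, 4.0]
a, b = rq_via_elemental_subsets(x, y, tau=0.5)  # median regression line
```

In practice RQs are computed by LP solvers (e.g. the simplex method), but the basic optimal solutions are exactly such elemental subset fits.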
APA, Harvard, Vancouver, ISO, and other styles
3

McCubbin, Courtney C. "Regressive Play: An Investigation of Regression in the Analytic Container." Thesis, Pacifica Graduate Institute, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13426903.

Full text
Abstract:

This thesis is a heuristic, hermeneutic investigation into regression using the author's experience as a case study. Regressive play and the desire for deeper regression within the analytic container are explored, guided by the question: What is the experience of following one's impulse to regress to more and more primordial states, and what kind of psychological container is needed to facilitate that deepening both inter- and intrapersonally? The author details a history of regression beginning with Sigmund Freud and continuing to psychoanalyst Michael Balint's basic fault, object relations therapist Donald Winnicott's regression to dependence, and Jungian analyst Brian Feldman's psychic skin. The therapeutic role of play is explored. The analyst's response to regression and how it facilitates or hinders the client's ability to regress are presented. This thesis challenges the notion that regression should be discouraged within a psychoanalytic frame, instead suggesting ways the analyst may hold the regression elementally.

APA, Harvard, Vancouver, ISO, and other styles
4

Ishikawa, Noemi Ichihara. "Uso de transformações em modelos de regressão logística." Universidade de São Paulo, 2007. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-05062007-202656/.

Full text
Abstract:
Binary data models are widely used in many practical situations. In Regression Analysis, transformations can be applied to linearize or simplify the model and to correct deviations from its assumptions. In this dissertation, we describe the use of transformations in logistic regression models for binary data and present models involving additional parameters in order to obtain more appropriate fits. We also examine the cost of estimation when parameters are added to the models, present hypothesis tests for the parameters of the Box-Cox logistic regression model and, finally, describe diagnostic methods for evaluating the influence of observations on the estimates of the covariate transformation parameters, with an application to a real data set.
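A minimal sketch of the Box-Cox transform underlying this model family (illustrative only; in the Box-Cox logistic regression model the transformation parameter is estimated jointly with the regression coefficients):

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform of a positive value x; lam = 0 gives log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

# The transformed covariate box_cox(x, lam) then enters the logistic
# model's linear predictor in place of the raw covariate x.
```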
APA, Harvard, Vancouver, ISO, and other styles
5

Schwartz, Amanda Jo. "Adaptive Regression Testing Strategies for Cost-Effective Regression Testing." Diss., North Dakota State University, 2013. https://hdl.handle.net/10365/26926.

Full text
Abstract:
Regression testing is an important but expensive part of the software development life-cycle. Many different techniques have been proposed for reducing the cost of regression testing. To date, much research has been performed comparing regression testing techniques, but very little research has been performed to aid practitioners and researchers in choosing the most cost-effective technique for a particular regression testing session. One recent study investigated this problem and proposed Adaptive Regression Testing (ART) strategies to aid practitioners in choosing the most cost-effective technique for a specific version of a software system. The results of that study showed that the techniques chosen by the ART strategy were more cost-effective than techniques that did not consider system lifetime and testing processes. This work has several limitations, however. First, it only considers one ART strategy; many other strategies could be developed and studied that might be more cost-effective. Second, the ART strategy used the Analytic Hierarchy Process (AHP). The AHP method is sensitive to the weights assigned by the decision maker, and it is very time-consuming because it requires many pairwise comparisons. Pairwise comparisons also limit the scalability of the approach and are often found to be inconsistent. This work proposes three new ART strategies to address these limitations. One strategy utilizing the fuzzy AHP method is proposed to address imprecision in the judgments made by the decision maker. A second strategy utilizing a fuzzy expert system is proposed to reduce the time required of the decision maker, eliminate inconsistencies due to pairwise comparisons, and increase scalability. A third strategy utilizing the Weighted Sum Model is proposed to study the performance of a simple, low-cost strategy. A series of empirical studies is then performed to evaluate the new strategies.
The results of the studies show that the strategies proposed in this work are more cost-effective than the strategy presented in the previous study.
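The Weighted Sum Model mentioned above reduces to a weighted score per alternative, with no pairwise comparisons. A hypothetical sketch (the criteria, scores, and weights are invented, not taken from the dissertation):

```python
# Hypothetical criteria scores (higher is better) for three regression
# testing techniques, and decision-maker weights summing to one; all
# numbers are invented for illustration.
scores = {
    "retest-all":     {"cost": 0.2, "fault_detection": 1.0, "time": 0.1},
    "prioritization": {"cost": 0.7, "fault_detection": 0.8, "time": 0.6},
    "selection":      {"cost": 0.9, "fault_detection": 0.6, "time": 0.9},
}
weights = {"cost": 0.3, "fault_detection": 0.5, "time": 0.2}

def wsm_score(technique):
    # Weighted Sum Model: a plain dot product of weights and scores.
    return sum(weights[c] * scores[technique][c] for c in weights)

best = max(scores, key=wsm_score)
```

The simplicity is the point: unlike AHP, the decision maker supplies one weight per criterion rather than a full matrix of pairwise comparisons.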
National Science Foundation
APA, Harvard, Vancouver, ISO, and other styles
6

Williams, Ulyana P. "On Some Ridge Regression Estimators for Logistic Regression Models." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3667.

Full text
Abstract:
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio have been varied in the design of the experiment. Simulation results show that under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the health, physical, and social sciences.
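A toy illustration of ridge-penalized logistic regression versus the unpenalized maximum likelihood fit (invented data and a plain gradient-descent fit, not the specific estimators studied in the thesis):

```python
import math, random

random.seed(1)
n = 100
x = [random.gauss(0, 1) for _ in range(n)]
# True model: logit P(y=1|x) = 1.5 * x
y = [1 if random.random() < 1 / (1 + math.exp(-1.5 * xi)) else 0 for xi in x]

def fit_logistic(x, y, lam, steps=3000, lr=0.1):
    """Gradient descent on the mean negative log-likelihood plus a
    ridge penalty lam * b**2 on the slope (intercept unpenalized)."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(a + b * xi)))
            ga += (p - yi) / n
            gb += (p - yi) * xi / n
        a -= lr * ga
        b -= lr * (gb + 2 * lam * b)
    return a, b

_, b_mle = fit_logistic(x, y, lam=0.0)     # maximum likelihood fit
_, b_ridge = fit_logistic(x, y, lam=0.5)   # heavily ridge-penalized fit
```

The ridge penalty shrinks the slope toward zero, trading bias for variance; with correlated predictors this is what can push the ridge estimator's MSE below the MLE's.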
APA, Harvard, Vancouver, ISO, and other styles
7

Sánchez, Lozano Enrique. "Continuous regression : a functional regression approach to facial landmark tracking." Thesis, University of Nottingham, 2017. http://eprints.nottingham.ac.uk/43300/.

Full text
Abstract:
Facial Landmark Tracking (Face Tracking) is a key step for many Face Analysis systems, such as Face Recognition, Facial Expression Recognition, or Age and Gender Recognition, among others. The goal of Facial Landmark Tracking is to locate a sparse set of points defining a facial shape in a video sequence. These typically include the mouth, the eyes, the contour, or the nose tip. The state of the art method for Face Tracking builds on Cascaded Regression, in which a set of linear regressors are used in a cascaded fashion, each receiving as input the output of the previous one, subsequently reducing the error with respect to the target locations. Despite its impressive results, Cascaded Regression suffers from several drawbacks, which are basically caused by the theoretical and practical implications of using Linear Regression. Under the context of Face Alignment, Linear Regression is used to predict shape displacements from image features through a linear mapping. This linear mapping is learnt through the typical least-squares problem, in which a set of random perturbations is given. This means that, each time a new regressor is to be trained, Cascaded Regression needs to generate perturbations and apply the sampling again. Moreover, existing solutions are not capable of incorporating incremental learning in real time. It is well-known that person-specific models perform better than generic ones, and thus the possibility of personalising generic models whilst tracking is ongoing is a desired property, yet to be addressed. This thesis proposes Continuous Regression, a Functional Regression solution to the least-squares problem, resulting in the first real-time incremental face tracker. Briefly speaking, Continuous Regression approximates the samples by an estimation based on a first-order Taylor expansion yielding a closed-form solution for the infinite set of shape displacements. 
This way, it is possible to model the space of shape displacements as a continuum, without the need of using complex bases. Further, this thesis introduces a novel measure that allows Continuous Regression to be extended to spaces of correlated variables. This novel solution is incorporated into the Cascaded Regression framework, and its computational benefits for training under different configurations are shown. Then, it presents an approach for incremental learning within Cascaded Regression, and shows its complexity allows for real-time implementation. To the best of my knowledge, this is the first incremental face tracker that is shown to operate in real-time. The tracker is tested in an extensive benchmark, attaining state of the art results, thanks to the incremental learning capabilities.
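The cascade idea can be caricatured in one dimension (purely illustrative; a real cascaded-regression tracker maps image features to shape updates): each stage fits a least-squares regressor from a noisy "feature" to the remaining displacement, and the training error shrinks stage by stage:

```python
import random

random.seed(2)
n = 200
targets = [random.uniform(-1, 1) for _ in range(n)]   # true "shapes"
est = [0.0] * n                                       # initial estimates

def feature(t, s):
    # Noisy observation of the remaining displacement, standing in for
    # the image features a real tracker would extract at estimate s.
    return (t - s) + random.gauss(0, 0.1)

errors = []
for stage in range(3):
    f = [feature(t, s) for t, s in zip(targets, est)]
    d = [t - s for t, s in zip(targets, est)]          # true displacements
    # Least-squares slope through the origin mapping feature -> displacement
    w = sum(fi * di for fi, di in zip(f, d)) / sum(fi * fi for fi in f)
    est = [s + w * fi for s, fi in zip(est, f)]
    errors.append(sum((t - s) ** 2 for t, s in zip(targets, est)) / n)
```

Note that each stage is trained on sampled perturbations of the current estimates, which is exactly the repeated-sampling cost that Continuous Regression's closed-form solution avoids.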
APA, Harvard, Vancouver, ISO, and other styles
8

Kazemi, Seyed Mehran. "Relational logistic regression." Thesis, University of British Columbia, 2014. http://hdl.handle.net/2429/50091.

Full text
Abstract:
Aggregation is a technique for representing conditional probability distributions as an analytic function of parents. Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this thesis, we consider extending logistic regression to directed relational models, where there are objects and relations among them, and we want to model varying populations and interactions among parents. We first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which, unlike the traditional logistic regression, can represent arbitrary linear (in population size) decision thresholds. We then examine representing interactions among the parents of a child node and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. We compare our relational logistic regression to Markov logic networks and present their analogies and differences. Finally, we show how other well-known aggregators can be represented using relational logistic regression.
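A minimal sketch of the kind of aggregator described here (the weights are invented): a relational logistic regression whose decision threshold on the number of true parent instances grows linearly with the population size:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def rlr_prob(n_true, n_pop, w0=-2.0, w_true=1.0, w_pop=-0.5):
    """Relational logistic regression with one parametrized parent: the
    child's probability is a sigmoid of a linear function of both the
    number of true parent instances and the population size, so the
    decision threshold n_true > 2 + 0.5 * n_pop grows linearly with the
    population, which a fixed-arity logistic regression cannot express."""
    return sigmoid(w0 + w_true * n_true + w_pop * n_pop)
```

With these invented weights, 8 true parents make the child more likely true than false in a population of 10, but not in a population of 20.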
Faculty of Science
Department of Computer Science
Graduate
APA, Harvard, Vancouver, ISO, and other styles
9

Zuo, Yanling. "Monotone regression functions." Thesis, University of British Columbia, 1990. http://hdl.handle.net/2429/29457.

Full text
Abstract:
In some applications, we require a monotone estimate of a regression function. In others, we want to test whether the regression function is monotone. For the first problem, Ramsay's, Kelly and Rice's, as well as point-wise monotone regression functions in a spline space are discussed and their properties developed. Three monotone estimates are defined: least-squares regression splines, smoothing splines and binomial regression splines. The three estimates depend upon a "smoothing parameter": the number and location of knots in regression splines and the usual [formula omitted] in smoothing splines. Two standard techniques for choosing the smoothing parameter, GCV and AIC, are modified for monotone estimation, for the normal errors case. For the second problem, a test statistic is proposed and its null distribution conjectured. Simulations are carried out to check the conjecture. These techniques are applied to two data sets.
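Monotone least-squares fitting is classically computed with the Pool Adjacent Violators algorithm; a compact sketch (illustrative background, not the spline-based estimates developed in the thesis):

```python
def pav(y):
    """Pool Adjacent Violators: least-squares monotone (non-decreasing)
    fit to a sequence, the basic building block of monotone regression."""
    blocks = [[v, 1] for v in y]          # each block holds [sum, count]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            # adjacent violation: pool the two blocks into one average
            blocks[i][0] += blocks[i + 1][0]
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            if i > 0:
                i -= 1    # pooling may create a new violation upstream
        else:
            i += 1
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit
```

For example, `pav([1, 3, 2, 4])` pools the violating pair 3, 2 into their mean 2.5, producing a non-decreasing fit.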
Faculty of Science
Department of Statistics
Graduate
APA, Harvard, Vancouver, ISO, and other styles
10

Sullwald, Wichard. "Grain regression analysis." Thesis, Stellenbosch : Stellenbosch University, 2014. http://hdl.handle.net/10019.1/86526.

Full text
Abstract:
Thesis (MSc)--Stellenbosch University, 2014.
ENGLISH ABSTRACT: Grain regression analysis forms an essential part of solid rocket motor simulation. In this thesis a numerical grain regression analysis module is developed as an alternative to cumbersome and time-consuming analytical methods. The surface regression is performed by the level-set method, a numerical interface advancement scheme. A novel approach is proposed to the integration of the surface area and volume of a numerical interface, as defined implicitly in a level-set framework, by means of Monte Carlo integration. The grain regression module is directly coupled to a quasi-1D internal ballistics solver in an on-line fashion, in order to take into account the effects of spatially varying burn rate distributions. A multi-timescale approach is proposed for the direct coupling of the two solvers.
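Monte Carlo integration of a level-set-defined volume can be sketched as follows (a sphere stands in for a real grain geometry; all names and numbers are illustrative, not from the thesis):

```python
import math, random

random.seed(3)

def phi(x, y, z, r=1.0):
    """Level-set function: the burning surface is phi == 0 and the
    propellant occupies phi < 0; here a sphere stands in for a grain."""
    return math.sqrt(x * x + y * y + z * z) - r

def mc_volume(n=200000, half=1.2):
    """Monte Carlo estimate of the volume of {phi < 0} inside the
    bounding box [-half, half]^3: volume of the box times the fraction
    of uniform sample points falling inside the interface."""
    box = (2 * half) ** 3
    hits = 0
    for _ in range(n):
        p = (random.uniform(-half, half),
             random.uniform(-half, half),
             random.uniform(-half, half))
        if phi(*p) < 0:
            hits += 1
    return box * hits / n

vol = mc_volume()   # exact value is 4*pi/3 for the unit sphere
```

The attraction of this estimator is that it needs only point evaluations of the level-set function, never an explicit parametrization of the interface.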
APA, Harvard, Vancouver, ISO, and other styles
11

Bai, Xue. "Robust linear regression." Kansas State University, 2012. http://hdl.handle.net/2097/14977.

Full text
Abstract:
Master of Science
Department of Statistics
Weixin Yao
In practice, when applying a statistical method it often occurs that some observations deviate from the usual model assumptions. Least-squares (LS) estimators are very sensitive to outliers. Even one single atypical value may have a large effect on the regression parameter estimates. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we review various robust regression methods including: M-estimate, LMS estimate, LTS estimate, S-estimate, [tau]-estimate, MM-estimate, GM-estimate, and REWLS estimate. Finally, we compare these robust estimates based on their robustness and efficiency through a simulation study. A real data set application is also provided to compare the robust estimates with traditional least squares estimator.
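The contrast between least squares and an M-estimate can be shown with a one-dimensional toy: a Huber location estimate computed by iteratively reweighted least squares, on invented data with one gross outlier:

```python
def huber_location(data, k=1.345, iters=50):
    """Huber M-estimate of location via iteratively reweighted least
    squares: points within k of the current estimate get weight 1,
    points further out are downweighted."""
    m = sorted(data)[len(data) // 2]   # start from a median value
    for _ in range(iters):
        w = [1.0 if abs(x - m) <= k else k / abs(x - m) for x in data]
        m = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return m

data = [9.8, 10.1, 10.0, 9.9, 10.2, 100.0]   # one gross outlier
ls_mean = sum(data) / len(data)              # least-squares estimate
robust = huber_location(data)
```

The single outlier drags the least-squares estimate far from the bulk of the data, while the M-estimate stays near 10; regression M-estimates apply the same reweighting to residuals.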
APA, Harvard, Vancouver, ISO, and other styles
12

Guo, Mengmeng. "Generalized quantile regression." Doctoral thesis, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, 2012. http://dx.doi.org/10.18452/16569.

Full text
Abstract:
Generalized quantile regressions, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We denote $v_n(x)$ as the kernel smoothing estimator of the expectile curves. We prove the strong uniform consistency rate of $v_{n}(x)$ under general conditions. Moreover, using strong approximations of the empirical process and extreme value theory, we consider the asymptotic maximal deviation $\sup_{0 \leqslant x \leqslant 1}|v_n(x)-v(x)|$. According to the asymptotic theory, we construct simultaneous confidence bands around the estimated expectile function. We develop a functional data analysis approach to jointly estimate a family of generalized quantile regressions. Our approach assumes that the generalized quantiles share some common features that can be summarized by a small number of principal component functions. The principal components are modeled as spline functions and are estimated by minimizing a penalized asymmetric loss measure. An iteratively reweighted least squares algorithm is developed for computation. While separate estimation of individual generalized quantile regressions usually suffers from large variability due to lack of sufficient data, our joint estimation approach significantly improves the estimation efficiency by borrowing strength across data sets, which is demonstrated in a simulation study. The proposed method is applied to data from 150 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations.
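The asymmetric-loss idea behind expectiles can be shown in a scalar sketch (invented data, not the thesis's spline estimator): a sample tau-expectile minimizes asymmetrically weighted squared loss and can be computed by the same iteratively reweighted least squares device:

```python
def expectile(data, tau, iters=100):
    """Sample tau-expectile: minimizer of the asymmetrically weighted
    squared loss sum w_i * (x_i - m)**2 with w_i = tau above m and
    1 - tau below, computed by iteratively reweighted least squares;
    tau = 0.5 recovers the ordinary mean."""
    m = sum(data) / len(data)
    for _ in range(iters):
        w = [tau if x > m else 1 - tau for x in data]
        m = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return m

d = [1, 2, 3, 4]
```

Raising tau moves the expectile toward the upper tail, which is what makes expectile curves useful for tail behaviour such as temperature volatility.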
APA, Harvard, Vancouver, ISO, and other styles
13

Gündüz, Necla. "D-optimal designs for weighted linear regression and binary regression models." Thesis, University of Glasgow, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.301629.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Song, Dogyoon. "Blind regression : nonparametric regression for latent variable models via collaborative filtering." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105958.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 77-81).
Recommender systems are tools that provide suggestions for items that are most likely to be of interest to a particular user; they are central to various decision-making processes and have become ubiquitous. We introduce blind regression, a framework motivated by matrix completion for recommender systems: given m users, n items, and a subset of user-item ratings, the goal is to predict the unobserved ratings given the data, i.e., to complete the partially observed matrix. We posit that user u and movie i have features x1(u) and x2(i) respectively, and their corresponding rating y(u, i) is a noisy measurement of f(x1(u), x2(i)) for some unknown function f. In contrast to classical regression, the features x = (x1(u), x2(i)) are not observed (latent), making it challenging to apply standard regression methods. We suggest a two-step procedure to overcome this challenge: 1) estimate distances between latent variables, and then 2) apply nonparametric regression. Applying this framework to matrix completion, we provide a prediction algorithm that is consistent for all Lipschitz functions. In fact, the analysis naturally leads to a variant of collaborative filtering, shedding insight into the widespread success of collaborative filtering. Assuming each entry is revealed independently with probability p = max(m^(-1+[delta]), n^(-1/2+[delta])) for [delta] > 0, we prove that the expected fraction of our estimates with error greater than [epsilon] is less than [gamma]^2/[epsilon]^2, plus a polynomially decaying term, where [gamma]^2 is the variance of the noise. Experiments with the MovieLens and Netflix datasets suggest that our algorithm provides principled improvements over basic collaborative filtering and is competitive with matrix factorization methods. The algorithm and analysis naturally extend to higher-order tensor completion by simply flattening the tensor into a matrix.
We show that our simple and principled approach is competitive with state-of-the-art tensor completion algorithms when applied to image inpainting data. Lastly, we conclude this thesis by proposing various related directions for future research.
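The two-step procedure (estimate latent distances, then apply nonparametric regression) can be caricatured on a toy ratings matrix; this nearest-neighbour variant of collaborative filtering is an invented simplification, not the thesis's estimator:

```python
# Toy ratings matrix (rows = users, columns = items); None = unobserved.
R = [
    [5, 4, 1, None],
    [5, 4, 1, 2],
    [1, 2, 5, 4],
]

def user_distance(u, v):
    """Step 1: estimate the latent distance between users u and v from
    the entries both have rated (mean squared difference)."""
    diffs = [(a - b) ** 2 for a, b in zip(R[u], R[v])
             if a is not None and b is not None]
    return sum(diffs) / len(diffs)

def predict(u, i):
    """Step 2: nonparametric regression -- predict R[u][i] from the
    nearest user (in the step-1 distance) who has rated item i."""
    others = [v for v in range(len(R)) if v != u and R[v][i] is not None]
    nearest = min(others, key=lambda v: user_distance(u, v))
    return R[nearest][i]

pred = predict(0, 3)
```

User 0's ratings agree exactly with user 1 on the co-observed items, so user 1's rating of item 3 is borrowed as the prediction.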
by Dogyoon Song.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
15

Rodrigues, Cátia Sofia Martins. "Quais os fatores que determinam o rendimento dos indivíduos em Portugal? - Regressão de Quantis." Master's thesis, Instituto Superior de Economia e Gestão, 2021. http://hdl.handle.net/10400.5/23425.

Full text
Abstract:
Bologna Master's degree in Quantitative Methods for Economic and Managerial Decision-Making
Although a significant decrease in income inequality has been observed over the years, the topic is still under study, mainly from an econometric standpoint, where the main objective is to identify and understand the principal factors behind the observed inequalities. This project therefore studies the factors that determine the income of individuals residing in Portugal, adopting a quantile regression approach, since groups of individuals with different income levels may behave differently. Data from the Instituto Nacional de Estatística (INE) were used to build the estimated model. The variable under study is the annual income of residents in Portugal in 2019, and the model has eight regressors that characterize not only the individual, including age, sex and marital status, but also the employing institution, including variables such as its size and number of working hours, among others. From the estimation results it can be concluded that certain factors, namely gender, level of education and region of residence, account for a significant difference in the annual income of residents in Portugal. However, this difference is not uniform across all groups of individuals and behaves differently when groups with lower, middle or higher incomes are compared. This nonlinear behavior also illustrates the advantage of the quantile regression method over the most common econometric method, linear regression, whose objective is to estimate the effect of the explanatory variables on the mean of the dependent variable.
The database was built using the software SQL Developer and the analysis was conducted in Stata.
Despite the fact that, over the years, there has been a significant decrease in income inequality, this issue is still a subject under study, mainly in an econometric approach, with the aim of studying and understanding the factors behind those inequalities. The main focus of this project is to identify and study the factors that determine the income of individuals living in Portugal, adopting a quantile regression approach, since individuals with different wages may have different behaviors. For this purpose, a regression model was created using data from Statistics Portugal. The variable under study is the annual income of residents in Portugal in 2019, and the model has several regressors that characterize not only the individual, such as their age, sex or marital status, but also the company, such as its size and number of working hours. With the development of this project and taking into account the estimation results, it is possible to conclude that there are factors, namely the individual's gender, level of education and region where they live, responsible for the significant difference in the value of the annual income of residents in Portugal. However, these differences are not uniform for all groups of individuals, since there is a different behavior when comparing groups of individuals with lower, medium or high income. This nonlinear behavior also made it possible to understand the advantage of using quantile regression over the most common econometric method, linear regression, whose objective is to estimate the effect of different explanatory variables on the average values of the dependent variable. The database used was built using SQL Developer and the analysis was conducted with the software Stata.
info:eu-repo/semantics/publishedVersion
APA, Harvard, Vancouver, ISO, and other styles
16

Li, Ying. "A Comparison Study of Principle Component Regression, Partial Least Square Regression and Ridge Regression with Application to FTIR Data." Thesis, Uppsala University, Department of Statistics, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-127983.

Full text
Abstract:

The least squares estimator may fail when the number of explanatory variables is large relative to the sample size, or when the variables are nearly collinear. In such situations, principal component regression (PCR), partial least squares regression (PLS) and ridge regression (RR) are often proposed and are widely used in practical data analysis, especially in chemometrics. They provide biased coefficient estimators with smaller variance than that of the least squares estimator. In this thesis, a brief literature review of PCR, PLS and RR is given from a theoretical perspective. Moreover, a data set is used in order to examine their performance on prediction. The conclusion is that for prediction PCR, PLS and RR provide similar results; substantial verification would be required for any claim that one of the three biased regression methods is superior.

APA, Harvard, Vancouver, ISO, and other styles
17

Galarza, Morales Christian Eduardo 1988. "Quantile regression for mixed-effects models = Regressão quantílica para modelos de efeitos mistos." [s.n.], 2015. http://repositorio.unicamp.br/jspui/handle/REPOSIP/306681.

Full text
Abstract:
Advisor: Víctor Hugo Lachos Dávila
Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica
Resumo (translated): Longitudinal data are frequently analyzed using normal mixed-effects models. Moreover, traditional estimation methods are based on regression at the mean of the assumed distribution, which leads to non-robust parameter estimation when the error distribution is not normal. Compared with the conventional mean-regression approach, quantile regression (QR) can characterize the entire conditional distribution of the response variable and is more robust in the presence of outliers and misspecification of the error distribution. This thesis develops a likelihood-based approach for analyzing QR models for correlated continuous longitudinal data via the asymmetric Laplace distribution (ALD). Exploiting the convenient hierarchical representation of the ALD, our classical approach follows the stochastic approximation of the EM algorithm (SAEM) to derive exact maximum likelihood (ML) estimates of the fixed effects and variance components in linear and nonlinear mixed-effects models. We evaluate the finite-sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to four real data sets. The proposed SAEM algorithms are implemented in the R packages qrLMM() and qrNLMM() respectively
Abstract: Longitudinal data are frequently analyzed using normal mixed effects models. Moreover, the traditional estimation methods are based on mean regression, which leads to non-robust parameter estimation for non-normal error distributions. Compared to the conventional mean regression approach, quantile regression (QR) can characterize the entire conditional distribution of the outcome variable and is more robust to the presence of outliers and misspecification of the error distribution. This thesis develops a likelihood-based approach to analyzing QR models for correlated continuous longitudinal data via the asymmetric Laplace distribution (ALD). Exploiting the nice hierarchical representation of the ALD, our classical approach follows the stochastic Approximation of the EM (SAEM) algorithm for deriving exact maximum likelihood (ML) estimates of the fixed-effects and variance components in linear and nonlinear mixed effects models. We evaluate the finite sample performance of the algorithm and the asymptotic properties of the ML estimates through empirical experiments and applications to four real life datasets. The proposed SAEMs algorithms are implemented in the R packages qrLMM() and qrNLMM() respectively
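The link between the check function and the asymmetric Laplace likelihood that the thesis exploits can be illustrated outside the mixed-effects setting. The sketch below is a stand-in, not the qrLMM/SAEM implementation: it fits a plain linear quantile regression by directly minimizing the Koenker-Bassett check loss, which is equivalent to maximizing an ALD likelihood.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    # Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0))

def quantile_fit(X, y, tau):
    """Fit a linear model for the tau-th conditional quantile by
    minimizing the summed check loss, starting from the OLS solution."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta0 = np.linalg.lstsq(X1, y, rcond=None)[0]   # OLS starting values
    res = minimize(lambda b: check_loss(y - X1 @ b, tau).sum(),
                   beta0, method="Nelder-Mead")
    return res.x
```

Calling `quantile_fit(x, y, 0.5)` gives median regression; other values of `tau` trace out the rest of the conditional distribution, which is what distinguishes QR from mean regression.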
Master's
Statistics
Master in Statistics
APA, Harvard, Vancouver, ISO, and other styles
18

Salanti, Georgia. "The Isotonic Regression Framework." Diss., lmu, 2003. http://nbn-resolving.de/urn:nbn:de:bvb:19-9665.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Nordvall, Andreas. "Agile regression system testing." Thesis, KTH, Data- och elektroteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102934.

Full text
Abstract:
This report describes the work on automating the testing of nodes at CCS (Common Control System) at Ericsson. The goal was to configure the nodes with the latest build every three hours and run the tests, fully automatically and without user input, using the existing configuration tool CICC (Core Integration node Control Center) for configuration. Before work started, fault reports were analyzed, and creating a use case for testing restarts was expected to reduce some faults. The first step was to automate the configuration tool CICC. The continuous integration tool Jenkins was used to schedule the testing, but Jenkins cannot by itself run CICC or interpret the result, so a wrapper layer was implemented. When the wrapper finishes, it stores the results of the configuration run in an XML (eXtensible Markup Language) file, which Jenkins reads. Results can then be seen in Jenkins through a web interface; if there were any failures during configuration or testing, the failed step will have an error message. The project shows that automation is possible. Automating the testing reduces the time needed to correct errors, because they are more likely to be found early in the process. Before implementing this project in production some improvements should be made; the most significant is making the configuration and testing of each node run in parallel, in order to make the time limit for configuration and testing less of an issue.
This report describes the work of automating the testing of nodes at CCS at Ericsson. The goal was to configure the nodes every three hours with binaries compiled from the latest source code and then test them. This should happen fully automatically without user assistance, and the configuration should use the existing configuration tool CICC. Before the work began, fault reports were to be analyzed to see whether there was anything to gain from the automation. The task was solved by first examining the fault reports and concluding that there was room for improvement, mainly regarding restarts. After that, CICC, which had previously been run through a GUI, was automated. The test tool Jenkins was used to schedule configuration and testing. Jenkins uses a so-called wrapper script that runs CICC and the test cases. The wrapper script also handles errors and then writes the result of the run to an XML file read by Jenkins. The test results can then be viewed in Jenkins via a web interface, which shows the outcome of the wrapper-script run and the tests; if anything failed, error messages give the reason, and failed tests are also shown. The project shows that with automatic testing that runs more often, more faults can be found earlier and therefore fixed faster. Before the work is used in production, improvements should be made, for example running the configuration and testing of different nodes in parallel in the wrapper script, in order to meet the time limit when there are several nodes.
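The wrapper-to-Jenkins handoff described above can be sketched as a function that renders results in JUnit-style XML, a format Jenkins' test-report plugins parse. The element names and test names below are illustrative; the actual schema the CICC wrapper used is not documented in the report.

```python
import xml.etree.ElementTree as ET

def junit_xml(results):
    """Render results as JUnit-style XML. `results` maps a test-case
    name to None (pass) or an error message (fail)."""
    failures = sum(msg is not None for msg in results.values())
    suite = ET.Element("testsuite", name="node-configuration",
                       tests=str(len(results)), failures=str(failures))
    for name, msg in results.items():
        case = ET.SubElement(suite, "testcase", name=name)
        if msg is not None:
            # A <failure> child with a message is what Jenkins displays
            # next to the failed step.
            ET.SubElement(case, "failure", message=msg)
    return ET.tostring(suite, encoding="unicode")

# Hypothetical run: one step passed, one timed out.
xml_report = junit_xml({"configure-node-a": None,
                        "restart-node-a": "timeout after 300 s"})
```

In the report's setup the wrapper would write this string to a file that the scheduled Jenkins job collects after each three-hour run.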
APA, Harvard, Vancouver, ISO, and other styles
20

Pedroso, Estevam de Souza Camila. "Switching nonparametric regression models." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/45130.

Full text
Abstract:
In this thesis, we propose a methodology to analyze data arising from a curve that, over its domain, switches among J states. We consider a sequence of response variables, where each response y depends on a covariate x according to an unobserved state z, also called a hidden or latent state. The states form a stochastic process and their possible values are j=1,...,J. If z equals j the expected response of y is one of J unknown smooth functions evaluated at x. We call this model a switching nonparametric regression model. In a Bayesian switching nonparametric regression model the uncertainty about the functions is formulated by modeling the functions as realizations of stochastic processes. In a frequentist switching nonparametric regression model the functions are merely assumed to be smooth. We consider two different data structures: one with N replicates and the other with one single realization. For the hidden states, we consider those that are independent and identically distributed and those that follow a Markov structure. We develop an EM algorithm to estimate the parameters of the latent state process and the functions corresponding to the J states. Standard errors for the parameter estimates of the state process are also obtained. We investigate the frequentist properties of the proposed estimates via simulation studies. Two different applications of the proposed methodology are presented. In the first application we analyze the well-known motorcycle data in an innovative way: treating the data as coming from J>1 simulated accident runs with unobserved run labels. In the second application we analyze daytime power usage on business days in a building treating each day as a replicate and modeling power usage as arising from two functions, one function giving power usage when the cooling system of the building is off, the other function giving power usage when the cooling system is on.
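The E-step/M-step alternation at the heart of the proposed methodology can be sketched in a deliberately simplified form: two states with iid hidden labels and linear rather than smooth nonparametric state functions (the thesis also treats Markov-dependent states and replicates). The function and data below are illustrative, not the thesis's estimator.

```python
import numpy as np
from scipy.stats import norm

def em_switching(x, y, n_iter=50):
    """EM for a two-state switching regression with iid hidden labels:
    y = a_j + b_j * x + noise, state j in {0, 1}."""
    X1 = np.column_stack([np.ones_like(x), x])
    hi = y > np.median(y)                      # crude split to break symmetry
    coef = np.array([np.linalg.lstsq(X1[~hi], y[~hi], rcond=None)[0],
                     np.linalg.lstsq(X1[hi], y[hi], rcond=None)[0]])
    pi, sigma = 0.5, y.std()
    for _ in range(n_iter):
        # E-step: posterior probability that each point is in state 1.
        d0 = norm.pdf(y, X1 @ coef[0], sigma)
        d1 = norm.pdf(y, X1 @ coef[1], sigma)
        w = pi * d1 / ((1 - pi) * d0 + pi * d1)
        # M-step: weighted least squares per state; update pi and sigma.
        for j, wj in enumerate([1 - w, w]):
            Xw = X1 * wj[:, None]
            coef[j] = np.linalg.solve(Xw.T @ X1, Xw.T @ y)
        pi = w.mean()
        r2 = (1 - w) * (y - X1 @ coef[0])**2 + w * (y - X1 @ coef[1])**2
        sigma = np.sqrt(r2.mean())
    return coef, pi, sigma
```

Replacing each weighted least-squares fit with a weighted smoother gives the nonparametric version the thesis studies, with the same alternation.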
APA, Harvard, Vancouver, ISO, and other styles
21

Yu, Keming. "Smooth regression quantile estimation." Thesis, Open University, 1996. http://oro.open.ac.uk/57655/.

Full text
Abstract:
In this thesis, attention is mainly focused on local linear kernel regression quantile estimation. Different estimators within this class are proposed, developed asymptotically and applied to real problems, including algorithm design and the selection of smoothing parameters. Chapter 2 studies two estimators: first, a single-kernel estimator based on the "check function", for which a bandwidth selection rule is proposed based on its asymptotic MSE; second, a recursive double-kernel estimator which extends Fan et al.'s (1996) density estimator, with two algorithms given for bandwidth selection. In Chapter 3, a comparison is carried out of local constant fitting and local linear fitting using the MSEs of the estimates as a criterion. Chapter 4 gives a theoretical summary and a simulation study of local linear kernel estimation of the conditional distribution function; this has a special interest in itself as well as being related to regression quantiles. In Chapter 5, a kernel version of the LMS method (Cole and Green, 1992) is considered; the proposed method, which is still a semi-parametric one, is based on a general local linear kernel approximation of the log-likelihood model. Chapter 6 proposes a two-step method of smoothing regression quantiles called BPK, based on the idea of combining the k-NN method with Healy et al.'s (1988) partition rule; correlated regression models are also involved. In Chapter 7, methods of regression quantile estimation are compared for different underlying models and design densities in a simulation study, with the ISE criterion at interior and boundary points used as a basis for these comparisons. Three methods are recommended for quantile regression in practice: the double-kernel method, the LMS method and the box partition kernel method (BPK). In Chapter 8, attention turns to a novel idea of a local polynomial roughness penalty regression model, for which a purely theoretical framework is considered.
APA, Harvard, Vancouver, ISO, and other styles
22

Cribari-Neto, Francisco, and Achim Zeileis. "Beta Regression in R." Department of Statistics and Mathematics x, WU Vienna University of Economics and Business, 2009. http://epub.wu.ac.at/726/1/document.pdf.

Full text
Abstract:
The class of beta regression models is commonly used by practitioners to model variables that assume values in the standard unit interval (0, 1). It is based on the assumption that the dependent variable is beta-distributed and that its mean is related to a set of regressors through a linear predictor with unknown coefficients and a link function. The model also includes a precision parameter which may be constant or depend on a (potentially different) set of regressors through a link function as well. This approach naturally incorporates features such as heteroskedasticity or skewness which are commonly observed in data taking values in the standard unit interval, such as rates or proportions. This paper describes the betareg package which provides the class of beta regressions in the R system for statistical computing. The underlying theory is briefly outlined, the implementation discussed and illustrated in various replication exercises.
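The basic model that betareg fits can be reproduced in a standalone sketch. The Python code below illustrates the likelihood being maximized, not the betareg implementation itself, and it omits the variable-dispersion part (the second link function for the precision): it estimates a logit-link beta regression with constant precision phi by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def betareg_fit(X, y):
    """ML fit of a beta regression: logit link for the mean, constant
    precision phi (parameterized on the log scale). y must lie strictly
    in (0, 1)."""
    X1 = np.column_stack([np.ones(len(y)), X])

    def negloglik(theta):
        beta, phi = theta[:-1], np.exp(theta[-1])
        mu = expit(X1 @ beta)
        a, b = mu * phi, (1.0 - mu) * phi
        # Beta log-density: log G(phi) - log G(a) - log G(b)
        #                   + (a-1) log y + (b-1) log(1-y)
        return -np.sum(gammaln(phi) - gammaln(a) - gammaln(b)
                       + (a - 1) * np.log(y) + (b - 1) * np.log1p(-y))

    theta0 = np.zeros(X1.shape[1] + 1)
    res = minimize(negloglik, theta0, method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])
```

Letting phi depend on its own regressors through a log link recovers the variable-dispersion extension the paper describes.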
Series: Research Report Series / Department of Statistics and Mathematics
APA, Harvard, Vancouver, ISO, and other styles
23

Bailey, Jacob. "Illinois basis regression models." Thesis, Kansas State University, 2014. http://hdl.handle.net/2097/17396.

Full text
Abstract:
Master of Agribusiness
Department of Agricultural Economics
Sean Fox
The commodity markets have seen a great deal of volatility over the past decade, which, for those involved, has created many challenges and opportunities. Some of those challenges and opportunities are related to the behavior of the basis, the difference between the local cash price of grain and its price in the futures market. This thesis examines factors impacting basis for corn and soybeans at an Illinois River barge terminal, at inland grain terminals in central Illinois, and in the Decatur processing market. Factors used to explain basis behavior include the price level of the futures market, the price spread in the futures market, transportation cost, local demand conditions, and seasonal patterns. Using weekly data on basis from 2000 to 2013, regression models indicate that nearby corn futures, the futures spread, an inverted market, days until expiration, heating oil futures, and some months are significant drivers of corn basis. In the regression models for inland terminals and processors, nearby corn futures do not appear to have significant effects. Using the same parameters for soybean basis, nearby soybean futures, the futures spread, an inverted market, heating oil, and some months are significant drivers, but days until expiration does not appear to have a significant effect.
APA, Harvard, Vancouver, ISO, and other styles
24

Nottingham, Quinton J. "Model-robust quantal regression." Diss., Virginia Tech, 1995. http://hdl.handle.net/10919/40225.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Mitchell, Napoleon. "Outliers and Regression Models." Thesis, University of North Texas, 1992. https://digital.library.unt.edu/ark:/67531/metadc279029/.

Full text
Abstract:
The mitigation of outliers serves to increase the strength of a relationship between variables. This study defined outliers in three different ways and used five regression procedures to describe the effects of outliers on 50 data sets. This study also examined the relationship among the shape of the distribution, skewness, and outliers.
APA, Harvard, Vancouver, ISO, and other styles
26

Robinson, Timothy J. "Dual Model Robust Regression." Diss., Virginia Tech, 1997. http://hdl.handle.net/10919/11244.

Full text
Abstract:
In typical normal theory regression, the assumption of homogeneity of variances is often not appropriate. Instead of treating the variances as a nuisance and transforming away the heterogeneity, the structure of the variances may be of interest and it is desirable to model the variances. Aitkin (1987) proposes a parametric dual model in which a log linear dependence of the variances on a set of explanatory variables is assumed. Aitkin's parametric approach is an iterative one providing estimates for the parameters in the mean and variance models through joint maximum likelihood. Estimation of the mean and variance parameters are interrelated, as the responses in the variance model are the squared residuals from the fit to the mean model. When one or both of the models (the mean or variance model) are misspecified, parametric dual modeling can lead to faulty inferences. An alternative to parametric dual modeling is to let the data completely determine the form of the true underlying mean and variance functions (nonparametric dual modeling). However, nonparametric techniques often result in estimates which are characterized by high variability and they ignore important knowledge that the user may have regarding the process. Mays and Birch (1996) have demonstrated an effective semiparametric method in the one regressor, single-model regression setting which is a "hybrid" of parametric and nonparametric fits. Using their techniques, we develop a dual modeling approach which is robust to misspecification in either or both of the two models. Examples will be presented to illustrate the new technique, termed here as Dual Model Robust Regression.
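The parametric half of the dual-modeling iteration described above can be sketched as follows. This is an illustrative reading of Aitkin-style joint estimation, not the thesis's semiparametric estimator: a weighted least-squares fit of the mean model alternates with a log-linear variance model fitted to the log squared residuals, whose fitted values supply the weights for the next mean fit. Note that the intercept of the variance model is biased by E[log chi-square(1)], roughly -1.27, though its slopes are not.

```python
import numpy as np

def dual_fit(X, y, n_iter=20):
    """Iterative dual modeling: (1) WLS for the mean model, then
    (2) OLS of log squared residuals for the log-linear variance model,
    whose fitted variances give the next round of weights."""
    X1 = np.column_stack([np.ones(len(y)), X])
    w = np.ones(len(y))
    for _ in range(n_iter):
        Xw = X1 * w[:, None]
        beta = np.linalg.solve(Xw.T @ X1, Xw.T @ y)        # (1) mean fit
        logr2 = np.log((y - X1 @ beta) ** 2 + 1e-12)
        gamma = np.linalg.lstsq(X1, logr2, rcond=None)[0]  # (2) variance fit
        w = np.exp(-X1 @ gamma)                            # 1 / fitted variance
    return beta, gamma
```

Replacing either fit with a kernel smoother, and blending the two as in Mays and Birch, would give the semiparametric hybrid the thesis develops.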
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
27

Ehlers, Lathan. "REGRESSION TOWARD THE MEAN." OpenSIUC, 2017. https://opensiuc.lib.siu.edu/theses/2093.

Full text
Abstract:
AN ABSTRACT OF THE THESIS OF Lathan Ehlers, for the Master of Fine Arts degree in Creative Writing, presented on 03 April 2017, at Southern Illinois University Carbondale. TITLE: REGRESSION TOWARD THE MEAN MAJOR PROFESSOR: Judy Jordan The poems in Regression Toward the Mean strive to determine a way to navigate the self when memory cannot be trusted. Understanding that memory is fallible, suggestible, and likely inaccurate, the speaker in these poems attempts to find objective answers to questions surrounding his past by blending scientific theory with the personal narrative. Science and poetry can both be viewed as quests for truth, but the speaker finds not this elusive truth, but more ambiguity. It is in this ambiguity, however, that the speaker also finds agency—a way to move beyond his past and create a future for himself.
APA, Harvard, Vancouver, ISO, and other styles
28

Rizzolo, Gregory. "The critique of regression." Thesis, University of Essex, 2018. http://repository.essex.ac.uk/21875/.

Full text
Abstract:
When we dream, Freud (1900) maintained, we slip backwards from a world of conscious action to an unconscious realm of infantile memory and desire. The residues of our waking life meet there with repressed primitive wishes capable of animating a dream. The idea of regression, with all of its intrigue, would shape a century of theory building. It would also become one of the thorniest, if recently neglected, areas of inquiry. The history of the concept attests to two interwoven but distinct traditions. One tradition emphasizes the defensive, or evasive, function of regression. The other calls attention to potential non-defensive, restorative functions. Both traditions rely problematically on what Hartmann (1965) termed the genetic fallacy: the reduction of later forms to their original precursors. The genetic fallacy, in turn, supports a morality of maturity whereby unwanted aspects of human experience, which we recognize to be universal, are nonetheless attributed uniquely to children or to images of the child within. I shall argue, contrary to the theory of regression, that the person is inextricably nested in the present field of lifespan development. What were formerly considered regressions are better described as shifts, or transformations, within the field. The pathologies of regression are best seen, not as the result of regressive arrest/fixation, but as adaptations to cyclical lifespan problems. I articulate the theoretical propositions behind this reframe and explore its application in two case histories, one of a defensive regression, one of restorative regression, in the recent literature.
APA, Harvard, Vancouver, ISO, and other styles
29

Hirsch, Daniel, and Tim Steinholtz. "Tidsserie regression på finansmarknaden." Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-255797.

Full text
Abstract:
In this report, we study the performance of two machine learning algorithms when implemented for price predictions on the Swedish electricity market. The goal of the project is to evaluate whether these algorithms can be used as a tool for investments. The algorithms are Kernel Ridge Regression (KRR) and Support Vector Regression (SVR). Both KRR and SVR use the kernel trick to efficiently find non-linear dependencies in the volatile market. Both methods are used with an offline approach. For Kernel Ridge Regression, an online approach using Stochastic Gradient Descent (SGD) was also implemented to reduce the computational cost. Both algorithms are applied to the Swedish electricity market for the year 2017, using the programming environment Matlab. To evaluate the performance of the algorithms, the mean absolute percentage error (MAPE), the root mean squared error (RMSE) and the mean absolute error (MAE) were calculated. The conclusions of the project are that both methods show potential for use in financial time series prediction, but the presented implementations need some refinement. Examples of possible ways to refine the results obtained in the project are discussed, along with ideas for future implementations.
In this report, we study the performance of two machine learning algorithms when implemented for price predictions on the Swedish electricity market. The goal of this project is to evaluate if these algorithms can be used as a tool for investments. The algorithms are Kernel Ridge Regression (KRR), and Support Vector Regression (SVR). Both KRR and SVR use the kernel trick to efficiently find non-linear dependencies in the volatile market. The methods are both used with an offline approach. For the Kernel Ridge Regression, an online approach using Stochastic Gradient Descent (SGD) to reduce the computational cost was also implemented. Both algorithms are applied to the Swedish electricity market for the year 2017, using the programming environment Matlab. To evaluate the performance of the algorithms the mean absolute percentage error (MAPE), the root mean squared error (RMSE), and the mean absolute error (MAE) were calculated. The conclusions of this project are that both methods show potential for being used in financial time series predictions. The presented implementations, however, are in need of some refinements. Examples of possible ways to refine the results obtained in this project are discussed, with ideas of future implementations.
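A minimal offline version of the two compared methods can be sketched with scikit-learn. The thesis itself uses Matlab and real Swedish electricity prices for 2017; the series, lag structure and hyperparameters below are invented stand-ins.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for an hourly price series: daily cycle plus noise.
rng = np.random.default_rng(1)
t = np.arange(500)
price = 30 + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, 500)

lags = 3                                   # predict the next price from 3 lags
X = np.column_stack([price[i:len(price) - lags + i] for i in range(lags)])
y = price[lags:]
split = int(0.8 * len(y))                  # chronological split, no shuffling
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

models = {"KRR": KernelRidge(kernel="rbf", alpha=1.0, gamma=0.01),
          "SVR": SVR(kernel="rbf", C=10.0, gamma=0.01, epsilon=0.1)}
mae = {name: mean_absolute_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
       for name, m in models.items()}
```

Both models use the RBF kernel, so the comparison isolates the estimators themselves, which is the spirit of the thesis's MAPE/RMSE/MAE evaluation.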
APA, Harvard, Vancouver, ISO, and other styles
30

Chen, Kun. "Regularized multivariate stochastic regression." Diss., University of Iowa, 2011. https://ir.uiowa.edu/etd/1209.

Full text
Abstract:
In many high dimensional problems, the dependence structure among the variables can be quite complex. An appropriate use of the regularization techniques coupled with other classical statistical methods can often improve estimation and prediction accuracy and facilitate model interpretation, by seeking a parsimonious model representation that involves only the subset of relevant variables. We propose two regularized stochastic regression approaches, for efficiently estimating certain sparse dependence structure in the data. We first consider a multivariate regression setting, in which the large number of responses and predictors may be associated through only a few channels/pathways and each of these associations may only involve a few responses and predictors. We propose a regularized reduced-rank regression approach, in which the model estimation and rank determination are conducted simultaneously and the resulting regularized estimator of the coefficient matrix admits a sparse singular value decomposition (SVD). Secondly, we consider model selection of subset autoregressive moving-average (ARMA) modelling, for which automatic selection methods do not directly apply because the innovation process is latent. We propose to identify the optimal subset ARMA model by fitting a penalized regression, e.g. adaptive Lasso, of the time series on its lags and the lags of the residuals from a long autoregression fitted to the time-series data, where the residuals serve as proxies for the innovations. Computation algorithms and regularization parameter selection methods for both proposed approaches are developed, and their properties are explored both theoretically and by simulation. Under mild regularity conditions, the proposed methods are shown to be selection consistent, asymptotically normal and enjoy the oracle properties. We apply the proposed approaches to several applications across disciplines including cancer genetics, ecology and macroeconomics.
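The adaptive lasso step can be sketched via the standard column-rescaling reduction to an ordinary lasso: weight each column by the absolute value of an initial OLS estimate (raised to a power), run a plain lasso, and scale the coefficients back, so that columns with small initial estimates are penalized more heavily. The code below shows this on generic regressors rather than on the lagged series and long-AR residual proxies the thesis uses; the penalty level is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, alpha=0.05, gamma=1.0):
    """Adaptive lasso via rescaling: an ordinary lasso on the
    OLS-weighted columns, with coefficients mapped back afterwards."""
    w = np.abs(LinearRegression().fit(X, y).coef_) ** gamma
    lasso = Lasso(alpha=alpha).fit(X * w, y)
    return lasso.coef_ * w, lasso.intercept_
```

In the subset ARMA application, `X` would hold the lags of the series together with the lagged residuals of a long autoregression, and the nonzero pattern of the returned coefficients identifies the selected AR and MA lags.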
APA, Harvard, Vancouver, ISO, and other styles
31

Zhang, Y. "Quantification of prediction uncertainty for principal components regression and partial least squares regression." Thesis, University College London (University of London), 2014. http://discovery.ucl.ac.uk/1433990/.

Full text
Abstract:
Principal components regression (PCR) and partial least squares regression (PLS) are widely used in multivariate calibration in the fields of chemometrics, econometrics, social science and so forth, serving as alternative solutions to the problems which arise in ordinary least squares regression when the explanatory variables are collinear, or when there are hundreds of explanatory variables and a relatively small sample size. Both PCR and PLS tackle the problems by constructing lower dimensional factors based on the explanatory variables. The extra step of factor construction means that the standard prediction uncertainty theory of ordinary least squares regression is not directly applicable to the two reduced dimension methods. In the thesis, we start by reviewing the ordinary least squares regression prediction uncertainty theory, and then investigate how the theory performs when it is extended to PCR and PLS, aiming at potentially better approaches. The first main contribution of the thesis is to clarify the quantification of prediction uncertainty for PLS. We rephrase existing methods with consistent mathematical notation in the hope of giving clear guidance to practitioners. The second main contribution is to develop a new linearisation method for PLS. After establishing the theory, simulation and real data studies have been employed to understand and compare the new method with several commonly used methods. From the studies of simulations and a real dataset, we investigate the properties of simple approaches based on ordinary least squares theory, approaches using resampling of the data, and local linearisation approaches including a classical method and our improved new method. In practice, it is advisable to use the ordinary least squares type prediction variance with the estimated regression error variance from the tuning set, in both PCR and PLS.
APA, Harvard, Vancouver, ISO, and other styles
32

Dai, Elin, and Lara Güleryüz. "Factors that influence condominium pricing in Stockholm: A regression analysis." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254235.

Full text
Abstract:
This thesis examines which factors are significant when forecasting the selling price of condominiums in Stockholm city. Through the use of multiple linear regression, response-variable transformation, and a multitude of methods for refining the model fit, a conclusive, out-of-sample validated model with a confidence level of 95% was obtained. The software R was used to carry out the statistical methods. This study is limited to the districts of inner-city Stockholm with the postal codes 112-118, and the final model can only be applied to this area, as the postal codes are included as regressors in the model. The selling prices analyzed span January 2014 to April 2019; the time value of money over this period has not been taken into account. The final model included the following variables as having an impact on the selling price: floor, living area, monthly fee, construction year, and district of the city.
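A hedonic-style model of the kind the thesis describes (log-transformed selling price regressed on floor, living area, monthly fee, construction year, and a district dummy) can be sketched as follows. Everything here is invented for illustration: the data are simulated and the coefficients are not estimates from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical listings with the regressors the thesis retained.
n = 200
floor = rng.integers(1, 8, size=n).astype(float)
area = rng.uniform(25, 120, size=n)              # living area in m2
fee = 30.0 * area + rng.normal(0, 300, size=n)   # monthly fee, correlated with area
year = rng.integers(1900, 2019, size=n).astype(float)
district = rng.integers(0, 2, size=n).astype(float)  # e.g. postal area 112 vs 118

# Simulated log selling price (a response transformation like the thesis's);
# all coefficients below are made up.
logprice = (13.0 + 0.02 * floor + 0.012 * area - 0.0001 * fee
            + 0.001 * (year - 1950) + 0.15 * district
            + rng.normal(0, 0.05, size=n))

# Fit by ordinary least squares and compute R^2.
D = np.column_stack([np.ones(n), floor, area, fee, year - 1950, district])
coef, *_ = np.linalg.lstsq(D, logprice, rcond=None)
fitted = D @ coef
r2 = 1 - ((logprice - fitted) ** 2).sum() / ((logprice - logprice.mean()) ** 2).sum()
```

Because predictions are made on the log scale, forecasts of the price itself require exponentiating the fitted values.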
APA, Harvard, Vancouver, ISO, and other styles
33

Souza, Saul de Azevêdo. "Modelagem da obesidade adulta nas nações: uma análise via modelos de regressão beta e quantílica." Universidade Federal da Paraíba, 2017. http://tede.biblioteca.ufpb.br:8080/handle/tede/9065.

Full text
Abstract:
Previous issue date: 2017-02-20
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
In this dissertation the beta regression model with variable dispersion and the quantile regression model are discussed. An introduction motivates their use in epidemiological studies, with emphasis on the problem of obesity. The application of these methods considered a real data set, obtained from public information sources, on adult obesity in the nations in the year 2014. The descriptive analysis of the data showed that 50% of the nations present proportions of obese adults greater than 0.20. In addition, the map of obesity by nation showed that the highest concentration of countries with the lowest obesity values is found in Asia and Africa, while the highest concentrations of obese adults are found in the Americas and Europe. From the box-plot analysis, a possible difference in the proportions of obese adults between the Americas and Europe on the one hand and Africa and Asia on the other was observed. After fitting the beta and quantile regression models, it was verified that the covariates average alcohol consumption in liters per person, percentage of insufficient physical activity, and percentage of the population living in urban areas have a positive effect on the response variable; that is, individually these covariates tend to increase obesity values across countries when the other covariates remain constant. In addition, life expectancy in years presented a positive and significant effect only in the beta regression model with variable dispersion. Finally, analysing the prediction error measures, the estimates from the beta regression were more accurate in terms of both mean squared error and total percentage error.
Therefore, for predicting adult obesity values across nations in 2014, the beta regression model with variable dispersion proved more suitable.
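Beta regression models a response constrained to (0, 1), such as the proportion of obese adults, with a logit link on the mean. A minimal maximum-likelihood sketch is below; unlike the thesis's model it keeps the precision constant rather than variable, and the data and coefficients are invented.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

rng = np.random.default_rng(2)

# Hypothetical data: obesity proportion as a function of one standardised
# covariate (e.g. alcohol consumption); coefficients are made up.
n = 400
x = rng.normal(size=n)
mu = expit(-1.2 + 0.5 * x)          # logit mean link, as in beta regression
phi = 30.0                          # precision parameter (constant here)
y = rng.beta(mu * phi, (1 - mu) * phi)

def negloglik(theta):
    """Negative log-likelihood of Beta(mu*phi, (1-mu)*phi)."""
    b0, b1, logphi = theta
    m = expit(b0 + b1 * x)
    ph = np.exp(logphi)
    a, b = m * ph, (1 - m) * ph
    return -np.sum(gammaln(ph) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log1p(-y))

fit = minimize(negloglik, x0=np.array([0.0, 0.0, np.log(10.0)]), method="BFGS")
b0_hat, b1_hat, phi_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
```

A variable-dispersion version, as in the thesis, would add a second linear predictor with a log link for `phi`.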
APA, Harvard, Vancouver, ISO, and other styles
34

Peraça, Maria da Graça Teixeira. "Modelos para estimativa do grau de saturação do concreto mediante variáveis ambientais que influenciam na sua variação." reponame:Repositório Institucional da FURG, 2009. http://repositorio.furg.br/handle/1/3436.

Full text
Abstract:
Master's dissertation - Universidade Federal do Rio Grande, Programa de Pós-Graduação em Engenharia Oceânica, Escola de Engenharia, 2009.
Previous issue date: 2009
In engineering, it is fundamental to estimate the life-cycle of built structures, which in this study means the period of time required for chlorides to reach the concrete reinforcement. One of the coefficients that affect the life-cycle of concrete is the diffusion coefficient, which is directly influenced by the saturation degree (SD) of the concrete. Recent studies have led to the development of a measurement method for the SD. Although this method is efficient, it still wastes time and money. The objective of this study is to reduce these costs by calculating a good approximation to the SD value with mathematical models that predict it from the environmental variables that affect its variation. The variables analysed in the study are: atmospheric pressure, dry air temperature, maximum temperature, minimum temperature, internal evaporation rate (Pichê), precipitation rate, relative humidity, insolation, visibility, cloudiness and external evaporation rate. All of them were statistically analysed and compared with measurements of the SD obtained during four years of weekly assessments for different families of concrete. These analyses measured the relationships among the data and verified that the variables most influential on the SD are the maximum temperature and the relative humidity. After verifying this result, statistical models were developed to calculate, from the environmental data provided by the meteorological database and without waste of time and money, the approximate averages of the SD for each season in the south region of Brazil, thus providing a better estimate of the life-cycle of concrete structures.
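The screening step the abstract describes (compare each environmental variable against measured SD, keep the most correlated ones, then fit a model) can be sketched as follows. The weather data and effect sizes are simulated for illustration and are not the study's measurements.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical weekly records for a few of the environmental variables
# the study compared against the measured saturation degree (SD).
n = 208   # roughly four years of weekly measurements
max_temp = rng.uniform(10, 35, size=n)
humidity = rng.uniform(50, 95, size=n)
pressure = rng.normal(1013, 5, size=n)
insolation = rng.uniform(0, 12, size=n)

# Simulated SD driven mainly by maximum temperature and relative humidity,
# mirroring the variables the study found most influential.
sd = (0.55 - 0.004 * (max_temp - 20) + 0.003 * (humidity - 70)
      + rng.normal(0, 0.01, size=n))

# Rank variables by absolute correlation with SD, keep the top two.
variables = {"max_temp": max_temp, "humidity": humidity,
             "pressure": pressure, "insolation": insolation}
corrs = {name: abs(np.corrcoef(v, sd)[0, 1]) for name, v in variables.items()}
top2 = sorted(corrs, key=corrs.get, reverse=True)[:2]

# Regress SD on the two most correlated variables.
D = np.column_stack([np.ones(n)] + [variables[name] for name in top2])
coef, *_ = np.linalg.lstsq(D, sd, rcond=None)
```

Seasonal averages of SD, as in the study, would follow by grouping the fitted values by season before averaging.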
APA, Harvard, Vancouver, ISO, and other styles
35

Ryu, Duchwan. "Regression analysis with longitudinal measurements." Texas A&M University, 2005. http://hdl.handle.net/1969.1/2398.

Full text
Abstract:
Bayesian approaches to regression analysis for longitudinal measurements are considered. The history of measurements from a subject may convey characteristics of the subject; hence, in a regression analysis with longitudinal measurements, the characteristics of each subject can serve as covariates, in addition to other possible covariates. The longitudinal measurements may also lead to complicated covariance structures within each subject, which should be modeled properly. When covariates are unobservable characteristics of each subject, Bayesian parametric and nonparametric regressions have been considered. Although the covariates are not directly observable, they can be estimated by virtue of the longitudinal measurements. In this case, the measurement error problem is inevitable; hence, a classical measurement error model is established. In the Bayesian framework, the regression function as well as all the unobservable covariates and nuisance parameters are estimated. As multiple covariates are involved, a generalized additive model is adopted, and the Bayesian backfitting algorithm is utilized for each component of the additive model. For the binary response, logistic regression has been proposed, where the link function is estimated by Bayesian parametric and nonparametric regression; introducing latent variables for the link function makes the computation fast. In the next part, each subject is assumed not to be observed at prespecified time points. Furthermore, the time of the next measurement from a subject is supposed to depend on the subject's previous measurement history. For these outcome-dependent follow-up times, various modeling options and the associated analyses have been examined to investigate how outcome-dependent follow-up times affect the estimation, within the frameworks of Bayesian parametric and nonparametric regression.
Correlation structures of outcomes are based on different correlation coefficients for different subjects. First, by assuming a Poisson process for the follow-up times, regression models have been constructed. To interpret the subject-specific random effects, more flexible models are considered by introducing a latent variable for the subject-specific random effect and a survival distribution for the follow-up times. The performance of each model has been evaluated using Bayesian model assessments.
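The backfitting idea used for the additive model above cycles through the components, smoothing the partial residuals against each covariate in turn. A minimal classical (non-Bayesian) version is sketched below on a toy additive model; the thesis's Bayesian backfitting replaces the smoother with posterior draws, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy additive model y = f1(x1) + f2(x2) + noise.
n = 300
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.uniform(-2, 2, size=n)
y = np.sin(x1) + 0.5 * x2 ** 2 + rng.normal(0, 0.1, size=n)

def kernel_smooth(x, r, h=0.3):
    """Nadaraya-Watson smoother of residuals r against x."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

f1 = np.zeros(n)
f2 = np.zeros(n)
alpha = y.mean()
for _ in range(20):                      # backfitting iterations
    f1 = kernel_smooth(x1, y - alpha - f2)
    f1 -= f1.mean()                      # centre for identifiability
    f2 = kernel_smooth(x2, y - alpha - f1)
    f2 -= f2.mean()

resid = y - alpha - f1 - f2
```

Centring each component after its update keeps the intercept identifiable, a standard device in backfitting.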
APA, Harvard, Vancouver, ISO, and other styles
36

Campbell, Ian. "The geometry of regression analysis." Thesis, University of Ottawa (Canada), 1989. http://hdl.handle.net/10393/5755.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Chen, Hong Rui, and 陳弘叡. "Nonparametric Principal Components Regression Compared with Forward Regression." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/90471961665828545025.

Full text
Abstract:
Master's thesis
National Chengchi University
Department of Statistics
104
In a general linear regression model, when the sample size $n$ is greater than the number of variables $p$, it is common to use the least squares method to estimate the parameters in the regression model. When $n
APA, Harvard, Vancouver, ISO, and other styles
38

"Supervised ridge regression in high dimensional linear regression." 2013. http://library.cuhk.edu.hk/record=b5549319.

Full text
Abstract:
In the field of statistical learning, we usually have many features with which to determine the behavior of some response. For example, in gene testing problems we have many genes as features whose relations with certain diseases need to be determined. Without specific knowledge available, the simplest and most fundamental way to model this kind of problem is a linear model. There are many existing methods for solving linear regression, such as conventional ordinary least squares, ridge regression and the LASSO (least absolute shrinkage and selection operator). Let N denote the number of samples and p the number of predictors. In ordinary settings where we have enough samples (N > p), linear regression methods such as ridge regression will usually give reasonable predictions for future values of the response. In modern statistical learning, we quite often meet high-dimensional problems (N << p), such as document classification and microarray data testing problems. In high-dimensional problems it is generally difficult to identify the relationship between the predictors and the response without further assumptions. Despite the large number of predictors available for prediction, most of them are actually spurious in many real problems; a spurious predictor is one that is not directly related to the response. For example, in microarray data testing problems, millions of genes may be available for prediction, but only a few hundred genes are actually related to the target disease. Conventional techniques in linear regression such as the LASSO and ridge regression both have limitations in high-dimensional problems. The LASSO is a state-of-the-art technique for sparsity recovery, but when applied to high-dimensional problems its performance is degraded considerably by the presence of measurement noise, resulting in high-variance predictions and large prediction error.
Ridge regression, on the other hand, is more robust to additive measurement noise, but has the obvious limitation of not being able to separate true predictors from spurious ones. Since in many high-dimensional problems a large number of the predictors can be spurious, ridge regression's inability to separate spurious from true predictors results in poor interpretability of the model as well as poor prediction performance. The new technique that I propose in this thesis aims to overcome the limitations of these two methods, resulting in more accurate and stable prediction in high-dimensional linear regression problems with significant measurement noise. The idea is simple: instead of doing a single-step regression, we divide the regression procedure into two steps. In the first step we identify the seemingly relevant predictors and those that are obviously spurious by calculating the univariate correlations between the predictors and the response, and we discard the predictors that have very small or zero correlation with the response. After the first step we obtain a reduced predictor set. In the second step we perform a ridge regression of the response on the reduced predictor set; the result of this ridge regression is our desired output. The thesis is organized as follows. I start with a literature review of the linear regression problem, introduce the ridge and LASSO in detail, and explain more precisely their limitations in high-dimensional problems. I then introduce my new method, called supervised ridge regression, show the reasons why it should dominate the ridge and LASSO in high-dimensional problems, and present simulation results to strengthen the argument.
Finally, I conclude with the possible limitations of my method and point out directions for further investigation.
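The two-step procedure the abstract describes — screen by univariate correlation, then ridge on the survivors — is easy to sketch. The simulated design, the screening threshold of 0.25, and the ridge penalty below are my own choices, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)

# High-dimensional setting (N << p): only the first 5 of 1000 predictors
# are true; the rest are spurious.
n, p, k = 60, 1000, 5
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:k] = 3.0
y = X @ beta + rng.normal(0, 1.0, size=n)

# Step 1: screen predictors by absolute univariate correlation with y.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.where(corr > 0.25)[0]

# Step 2: ridge regression on the reduced predictor set.
lam = 1.0
Xk = X[:, keep]
coef = np.linalg.solve(Xk.T @ Xk + lam * np.eye(len(keep)), Xk.T @ y)

# Most true predictors should survive the screen; some spurious ones
# slip through, and the ridge step is what keeps them in check.
recovered = np.intersect1d(keep, np.arange(k))
```

In practice both the threshold and the penalty would be tuned, e.g. by cross-validation, rather than fixed as here.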
Zhu, Xiangchen.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 68-69).
Abstracts also in Chinese.
Chapter 1. BASICS ABOUT LINEAR REGRESSION
Chapter 1.1 Introduction
Chapter 1.2 Linear Regression and Least Squares
Chapter 1.2.1 Standard Notations
Chapter 1.2.2 Least Squares and Its Geometric Meaning
Chapter 2. PENALIZED LINEAR REGRESSION
Chapter 2.1 Introduction
Chapter 2.2 Deficiency of the Ordinary Least Squares Estimate
Chapter 2.3 Ridge Regression
Chapter 2.3.1 Introduction to Ridge Regression
Chapter 2.3.2 Expected Prediction Error and Noise Variance Decomposition of Ridge Regression
Chapter 2.3.3 Shrinkage Effects on Different Principal Components by Ridge Regression
Chapter 2.4 The LASSO
Chapter 2.4.1 Introduction to the LASSO
Chapter 2.4.2 The Variable Selection Ability and Geometry of LASSO
Chapter 2.4.3 Coordinate Descent Algorithm to Solve for the LASSO
Chapter 3. LINEAR REGRESSION IN HIGH-DIMENSIONAL PROBLEMS
Chapter 3.1 Introduction
Chapter 3.2 Spurious Predictors and Model Notations for High-dimensional Linear Regression
Chapter 3.3 Ridge and LASSO in High-dimensional Linear Regression
Chapter 4. THE SUPERVISED RIDGE REGRESSION
Chapter 4.1 Introduction
Chapter 4.2 Definition of Supervised Ridge Regression
Chapter 4.3 An Underlying Latent Model
Chapter 4.4 Ridge, LASSO and Supervised Ridge Regression
Chapter 4.4.1 LASSO vs SRR
Chapter 4.4.2 Ridge Regression vs SRR
Chapter 5. TESTING AND SIMULATION
Chapter 5.1 A Simulation Example
Chapter 5.2 More Experiments
Chapter 5.2.1 Correlated Spurious and True Predictors
Chapter 5.2.2 Insufficient Amount of Data Samples
Chapter 5.2.3 Low Dimensional Problem
Chapter 6. CONCLUSIONS AND DISCUSSIONS
Chapter 6.1 Conclusions
Chapter 6.2 References and Related Works
APA, Harvard, Vancouver, ISO, and other styles
39

Yao, Ruji. "Regression trees." 1994. http://catalog.hathitrust.org/api/volumes/oclc/31260152.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Liu, Su-Yun, and 劉素韻. "Robust Regression." Thesis, 2002. http://ndltd.ncl.edu.tw/handle/10070115676093643599.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Huang, Shui-mei, and 黃秀梅. "quantile regression." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/59580896304481039057.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

WEN, YU-TING, and 溫俞婷. "Interaction Regression." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/32531082138760802618.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Statistics
103
The insertion of product terms into an analytical model to test for the presence of an interaction effect, which is very common in the economic, social and health sciences, has two disadvantages. First, it has long been criticized because the existence of an interaction so defined is model dependent (Greenland (2009) and Mauderly and Samet (2009)). Second, this classical measure of the interaction effect shares the weakness of common effect identification (Baron and Kenny (1986)): it measures the influence of the explanatory variables only on the regression function's slope parameters, ignoring their impact on its intercept parameter. In this research we initiate, in a regression set-up, a systematic definition and derivation of the interaction effect on the regression function. The parametric interaction regression parameters are presented, and their parametric maximum likelihood estimators are introduced and verified with simulation studies. A data analysis is also presented.
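For reference, the classical product-term model the abstract critiques looks like the sketch below: the "interaction effect" is simply the slope on the product of two covariates. The data and coefficients are invented; the thesis's point is that this quantity is model dependent (e.g. it can vanish under a transformation of the response).

```python
import numpy as np

rng = np.random.default_rng(6)

# Classical interaction model: y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error.
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + 0.6 * x1 * x2 + rng.normal(0, 0.2, size=n)

# Fit by least squares with the product term in the design matrix.
D = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
b3 = coef[3]    # the conventional "interaction effect" is this slope
```

Note that `b3` says nothing about how x1 and x2 jointly shift the intercept, which is the gap the thesis's systematic definition is meant to close.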
APA, Harvard, Vancouver, ISO, and other styles
43

Mimno, David. "Topic regression." 2011. https://scholarworks.umass.edu/dissertations/AAI3498404.

Full text
Abstract:
Text documents are generally accompanied by non-textual information, such as authors, dates, publication sources, and, increasingly, automatically recognized named entities. Work in text analysis has often involved predicting these non-text values based on text data for tasks such as document classification and author identification. This thesis considers the opposite problem: predicting the textual content of documents based on non-text data. In this work I study several regression-based methods for estimating the influence of specific metadata elements in determining the content of text documents. Such topic regression methods allow users of document collections to test hypotheses about the underlying environments that produced those documents.
APA, Harvard, Vancouver, ISO, and other styles
44

Mimno, David. "Topic Regression." 2012. https://scholarworks.umass.edu/open_access_dissertations/520.

Full text
Abstract:
Text documents are generally accompanied by non-textual information, such as authors, dates, publication sources, and, increasingly, automatically recognized named entities. Work in text analysis has often involved predicting these non-text values based on text data for tasks such as document classification and author identification. This thesis considers the opposite problem: predicting the textual content of documents based on non-text data. In this work I study several regression-based methods for estimating the influence of specific metadata elements in determining the content of text documents. Such topic regression methods allow users of document collections to test hypotheses about the underlying environments that produced those documents.
APA, Harvard, Vancouver, ISO, and other styles
45

Wu, Jia-Han, and 吳佳翰. "A ridge regresssion method for improving the semiparametric regression with sparse data." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/49889603780198526742.

Full text
Abstract:
Master's thesis
Tamkang University
Master's Program, Department of Statistics
102
In nonparametric regression analysis, the local linear estimator (LLE) enjoys both small asymptotic bias and small asymptotic variance. However, Seifert and Gasser (1996) pointed out that in finite sample situations, when the design points are sparse or very close to each other, the LLE has unbounded conditional variance, and the estimated curve accordingly has a rough appearance. To address this problem, Seifert and Gasser (1996) combined local linear smoothing with ridge regression to construct the local linear ridge regression estimator (LLRRE). This thesis uses the local linear ridge regression method of Seifert and Gasser (1996) to improve the estimation of the semiparametric regression model, which comprises both parametric and nonparametric regression components. A cross-validation method is used to select the optimal bandwidth and ridge parameters. According to the simulation results, when the LLE and the LLRRE each use their respective cross-validated parameters, the LLRRE's nonparametric regression function estimates have significantly smaller sample mean integrated squared error than the LLE's, and its coefficient estimates for the parametric regression component have significantly smaller mean squared errors.
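The ridge stabilisation of the local linear fit can be sketched in a few lines: at each evaluation point, a kernel-weighted least squares problem is solved with a penalty on the local slope only, which keeps the 2x2 local system well conditioned when few design points carry weight. The bandwidth and ridge parameter are fixed here for simplicity, whereas the thesis selects both by cross-validation; the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)

# Sparse design: few, unevenly spaced points, where the plain local
# linear estimator can have a nearly singular local system.
x = np.sort(rng.uniform(0, 1, size=15))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=15)

def llrre(x0, h=0.08, lam=1e-3):
    """Local linear fit at x0 with a ridge penalty on the local slope."""
    d = x - x0
    w = np.exp(-0.5 * (d / h) ** 2)                    # Gaussian kernel weights
    D = np.column_stack([np.ones_like(d), d])
    A = D.T @ (w[:, None] * D) + np.diag([0.0, lam])   # penalise slope only
    b = D.T @ (w * y)
    return np.linalg.solve(A, b)[0]                    # intercept = fit at x0

grid = np.linspace(0, 1, 50)
fit = np.array([llrre(g) for g in grid])
```

With `lam = 0` the function reduces to the plain LLE, whose fitted curve can spike wildly in the gaps between design points.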
APA, Harvard, Vancouver, ISO, and other styles
46

Tai, Yun Chiang, and 戴允強. "New Algorithms for Monotone Nonparametric Regression and Monotone Quantile Regression." Thesis, 2000. http://ndltd.ncl.edu.tw/handle/57201176596846154809.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Statistics
88
A monotone nonparametric regression model is considered, and a constrained weighted least squares solution is proposed for estimating monotone smooth functions from noisy data. The estimate obtained guarantees the monotonicity requirement. An efficient algorithm for computing the proposed solution is developed based on Lemke's algorithm for solving linear complementarity problems. The leave-one-out cross-validation method is adopted for bandwidth selection. In addition, we propose a monotone nonparametric quantile regression method for interval estimation of the mean function. An iterative algorithm is developed for computing the quantile estimates. The proposed methods are demonstrated on simulated numerical examples and a real example. The results indicate that the proposed methods are quite promising.
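The unsmoothed version of the monotone least squares problem has a classical solution, the pool-adjacent-violators algorithm (PAVA), sketched below. This is a simpler relative of the thesis's approach, which solves the smoothed, constrained weighted least squares problem via Lemke's algorithm for linear complementarity problems rather than via PAVA.

```python
import numpy as np

def pava(y, w=None):
    """Pool-adjacent-violators: monotone increasing least squares fit."""
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    vals = list(map(float, y))
    wts = list(w)
    idx = [[i] for i in range(n)]           # which points each block covers
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:           # violation: pool the two blocks
            tot = wts[i] + wts[i + 1]
            vals[i] = (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / tot
            wts[i] = tot
            idx[i] += idx[i + 1]
            del vals[i + 1], wts[i + 1], idx[i + 1]
            i = max(i - 1, 0)               # pooling may create a new violation
        else:
            i += 1
    out = np.empty(n)
    for v, block in zip(vals, idx):
        out[block] = v
    return out

fit = pava(np.array([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))
```

Each pooled block is replaced by its weighted mean, so the output is the closest nondecreasing sequence to the input in weighted squared error.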
APA, Harvard, Vancouver, ISO, and other styles
47

Cai, Deng. "Spectral Regression : a regression framework for efficient regularized subspace learning /." 2009. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3362740.

Full text
Abstract:
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2009.
Source: Dissertation Abstracts International, Volume: 70-06, Section: B, page: 3593. Adviser: Jiawei Han. Includes bibliographical references (leaves 93-99). Available on microfilm from ProQuest Information and Learning.
APA, Harvard, Vancouver, ISO, and other styles
48

Sayre, Kent. "Regression testing experiments." Thesis, 1999. http://hdl.handle.net/1957/33192.

Full text
Abstract:
Software maintenance is an expensive part of the software lifecycle: estimates put its cost at up to two-thirds of the entire cost of software. Regression testing, which tests software after it has been modified to help assess and increase its reliability, is responsible for a large part of this cost. Thus, making regression testing more efficient and effective is worthwhile. This thesis performs two experiments with regression testing techniques. The first experiment involves two regression test selection techniques, Dejavu and Pythia. These techniques select a subset of tests from the original test suite to be rerun instead of the entire original test suite in an attempt to save valuable testing time. The experiment investigates the cost and benefit tradeoffs between these techniques. The data indicate that Dejavu can occasionally select smaller test suites than Pythia while Pythia often is more efficient at figuring out which test cases to select than Dejavu. The second experiment involves the investigation of program spectra as a tool to enhance regression testing. Program spectra characterize a program's behavior. The experiment investigates the applicability of program spectra to the detection of faults in modified software. The data indicate that certain types of spectra identify faults on a consistent basis. The data also reveal cost-benefit tradeoffs among spectra types.
Graduation date: 2000
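Coverage-based regression test selection, the family Dejavu belongs to, can be sketched with plain set intersection: rerun only the tests whose covered entities overlap the modified code. The test names, coverage sets, and changed lines below are invented for illustration; the actual Dejavu and Pythia techniques operate on finer-grained program representations.

```python
# Map each test to the source lines it covers (hypothetical data).
coverage = {
    "test_login":    {"auth.c:10", "auth.c:25", "util.c:7"},
    "test_logout":   {"auth.c:40", "util.c:7"},
    "test_report":   {"report.c:5", "util.c:30"},
    "test_password": {"auth.c:25", "auth.c:26"},
}

# Lines touched by the modification under test.
changed_lines = {"auth.c:25", "auth.c:26"}

# Select only the tests whose coverage intersects the change.
selected = sorted(t for t, cov in coverage.items() if cov & changed_lines)
```

The cost/benefit trade-off the thesis measures is between the effort of computing such coverage-based selections and the test executions they save.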
APA, Harvard, Vancouver, ISO, and other styles
49

林貞佑. "Regression Mode Interval." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/98044396995239762216.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Lo, Yi, and 羅驛. "Weighted Quantile Regression." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/31421059248782021412.

Full text
APA, Harvard, Vancouver, ISO, and other styles