Dissertations / Theses: 'Linear regression'

1

Bai, Xue. "Robust linear regression." Kansas State University, 2012. http://hdl.handle.net/2097/14977.

Abstract:

Master of Science
Department of Statistics
Weixin Yao
In practice, when applying a statistical method it often occurs that some observations deviate from the usual model assumptions. Least-squares (LS) estimators are very sensitive to outliers. Even a single atypical value may have a large effect on the regression parameter estimates. The goal of robust regression is to develop methods that are resistant to the possibility that one or several unknown outliers may occur anywhere in the data. In this paper, we review various robust regression methods, including the M-estimate, LMS estimate, LTS estimate, S-estimate, τ-estimate, MM-estimate, GM-estimate, and REWLS estimate. Finally, we compare these robust estimates based on their robustness and efficiency through a simulation study. A real data set application is also provided to compare the robust estimates with the traditional least-squares estimator.
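
The sensitivity of least squares to a single outlier, and the way an M-estimate resists it, can be illustrated with a small sketch. It is not taken from the thesis: it assumes a Huber loss with the conventional tuning constant 1.345, a scale estimated from the median absolute residual, and fitting by iteratively reweighted least squares. The function names and the toy data are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, c=1.345, iters=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from ordinary LS
    for _ in range(iters):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: normalized median absolute residual,
        # floored to avoid a zero scale when most residuals vanish.
        scale = max(median(abs(ri) for ri in r) / 0.6745, 1e-12)
        # Huber weights: 1 for small residuals, shrinking for large ones.
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Toy data (hypothetical): points on y = 1 + 2x with one gross outlier.
x = list(range(10))
y = [1.0 + 2.0 * xi for xi in x]
y[9] = 100.0

ls = weighted_line_fit(x, y, [1.0] * len(x))  # pulled far from slope 2
huber = huber_line_fit(x, y)                  # stays close to (1, 2)
print("LS:", ls, "Huber:", huber)
```

On this toy data the ordinary LS slope is dragged well above 2 by the one bad point, while the Huber fit returns to the underlying line. M-estimates of this kind are known to be less effective against outliers in the predictors, which is one reason the other estimators on the list exist.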

2

Hernandez, Erika Lyn. "Parameter Estimation in Linear-Linear Segmented Regression." Diss., 2010. http://contentdm.lib.byu.edu/ETD/image/etd3551.pdf.

3

Ollikainen, Kati. "PARAMETER ESTIMATION IN LINEAR REGRESSION." Doctoral diss., University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4138.

Abstract:

Today, increasing amounts of data are available for analysis and, oftentimes, for resource allocation. One method of analysis is linear regression, which uses least-squares estimation to estimate a model's parameters. This research investigated, from a user's perspective, the ability of linear regression to estimate the parameters' confidence intervals at the usual 95% level for medium-sized data sets. A controlled environment using simulation with known data characteristics (clean data, bias and/or multicollinearity present) was used to show that underlying problems exist with confidence intervals not including the true parameter (even though the variable was selected). The Elder/Pregibon rule was used for variable selection. A comparison of the bootstrap percentile and BCa confidence intervals was made, as well as an investigation of adjustments to the usual 95% confidence intervals based on the Bonferroni and Scheffé multiple comparison principles. The results show that linear regression has problems capturing the true parameters in the confidence intervals for the sample sizes considered, the bootstrap intervals perform no better than linear regression, and the Scheffé intervals are too wide for any application considered. The Bonferroni adjustment is recommended for larger sample sizes and when the t-value for a selected variable is about 3.35 or higher. For smaller sample sizes, all methods show problems with type II errors resulting from confidence intervals being too wide.
Ph.D.
Department of Industrial Engineering and Management Systems
Engineering and Computer Science
Industrial Engineering and Management Systems

4

Chen, Xinyu. "Inference in Constrained Linear Regression." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/405.

Abstract:

Regression analysis constitutes an important part of statistical inference and has applications in many areas. In some applications, we strongly believe that the regression function changes monotonically with some or all of the predictor variables in a region of interest. Deriving analyses under such constraints is an enormous task. In this work, the restricted prediction interval for the mean of the regression function is constructed when two predictors are present. I use a modified likelihood ratio test (LRT) to construct prediction intervals.

5

Waterman, Megan Janet Tuttle. "Linear Mixed Model Robust Regression." Diss., Virginia Tech, 2002. http://hdl.handle.net/10919/27708.

Abstract:

Mixed models are powerful tools for the analysis of clustered data and many extensions of the classical linear mixed model with normally distributed response have been established. As with all parametric models, correctness of the assumed model is critical for the validity of the ensuing inference. Model robust regression techniques predict mean response as a convex combination of a parametric and a nonparametric model fit to the data. It is a semiparametric method by which incompletely or incorrectly specified parametric models can be improved through adding an appropriate amount of a nonparametric fit. We apply this idea of model robustness in the framework of the linear mixed model. The mixed model robust regression (MMRR) predictions we propose are convex combinations of predictions obtained from a standard normal-theory linear mixed model, which serves as the parametric model component, and a locally weighted maximum likelihood fit which serves as the nonparametric component. An application of this technique with real data is provided.
Ph. D.

6

Ratnasingam, Suthakaran. "Sequential Change-point Detection in Linear Regression and Linear Quantile Regression Models Under High Dimensionality." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu159050606401363.

7

Rettes, Julio Alberto Sibaja. "Robust algorithms for linear regression and locally linear embedding." Repositório Institucional da UFC, 2017. http://www.repositorio.ufc.br/handle/riufc/22445.

Abstract:

RETTES, Julio Alberto Sibaja. Robust algorithms for linear regression and locally linear embedding. 2017. 105 pp. Dissertation (Master's in Computer Science), Universidade Federal do Ceará, Fortaleza, 2017.
Nowadays a very large quantity of data flows through our digital society, and there is growing interest in converting this data into valuable and useful information. Machine learning plays an essential role in the transformation of data into knowledge. However, the probability that the data contain outliers is too high to ignore the importance of robust algorithms. To that end, various models of outliers are studied. In this work, several robust estimators within the generalized linear model framework for regression are discussed and analyzed: namely, the M-estimator, the S-estimator, the MM-estimator, RANSAC and the Theil-Sen estimator. This choice is motivated by the need to examine algorithms with different working principles. In particular, the M-, S- and MM-estimators are based on a modification of the least-squares criterion, whereas RANSAC is based on finding the smallest subset of points that guarantees a predefined model accuracy. The Theil-Sen estimator, on the other hand, uses the median of least-squares models as its estimate. The performance of the estimators under a wide range of experimental conditions is compared and analyzed. In addition to the linear regression problem, the dimensionality reduction problem is considered. More specifically, locally linear embedding (LLE), principal component analysis and some robust variants of them are treated. To give some robustness to the LLE algorithm, the RALLE algorithm is proposed. Its main idea is to use neighborhoods of different sizes to construct the weights of the points; to achieve this, RAPCA is executed in each set of neighbors and the risky points are discarded from the corresponding neighborhood. The performance of LLE, RLLE and RALLE on several datasets is evaluated.
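
The median-based idea behind the Theil-Sen estimator mentioned in the abstract can be sketched for the simplest case of one predictor. The version below is the classical form, which takes the median of the slopes of all lines through pairs of points; it is an assumption that this matches the variant studied in the dissertation, and the function name and toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Classical Theil-Sen fit of y = a + b*x; returns (a, b)."""
    # Each pair of points with distinct x defines an exact two-point line;
    # the slope estimate is the median of these pairwise slopes.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    b = median(slopes)
    # The intercept is taken as the median of the residual offsets y - b*x.
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b


# Toy data (hypothetical): points on y = 1 + 2x with two gross outliers.
x = list(range(10))
y = [1.0 + 2.0 * xi for xi in x]
y[2] = 50.0
y[7] = -30.0

intercept, slope = theil_sen(x, y)
print(intercept, slope)
```

Because more than half of the pairwise slopes come from pairs of uncontaminated points, the two outliers do not move the median, and the fit recovers the underlying line.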

8

Peraça, Maria da Graça Teixeira. "Modelos para estimativa do grau de saturação do concreto mediante variáveis ambientais que influenciam na sua variação." Repositório Institucional da FURG, 2009. http://repositorio.furg.br/handle/1/3436.

Abstract:

Dissertation (Master's), Universidade Federal do Rio Grande, Graduate Program in Ocean Engineering, School of Engineering, 2009.
In engineering, it is fundamental to estimate the service life of built structures, which in this study means the time required for chloride ions to reach the concrete reinforcement. One of the coefficients that affect the service life of concrete is the diffusion coefficient, which is directly influenced by the saturation degree (SD) of the concrete. Recent studies have led to the development of a measurement method for the SD. Although this method is efficient, using it still costs considerable time and money. The objective of this study is to reduce these costs by calculating a good approximation of the SD with mathematical models that estimate its value from the environmental variables that influence its variation. The variables analysed in the study are: atmospheric pressure, dry air temperature, maximum temperature, minimum temperature, internal evaporation rate (Pichê), precipitation rate, relative humidity, insolation, visibility, cloudiness and external evaporation rate. All of them were statistically analysed and compared with measurements of SD obtained during four years of weekly measurements for different families of concrete. These analyses make it possible to measure the relationship among these data and to verify that the most influential factors affecting the SD are the maximum temperature and the relative humidity. After this result was verified, statistical models were developed to calculate, from the environmental data provided by the meteorological database and without wasting time and money, the approximate average SD for each season in the southern region of Brazil, thus providing a better estimate of the service life of concrete structures.

9

Bocci, Cynthia Jacqueline. "Linear regression with spatially correlated data." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape10/PQDD_0012/NQ52271.pdf.

10

Mahmood, Nozad. "Sparse Ridge Fusion For Linear Regression." Master's thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5986.

Abstract:

For linear regression, the traditional technique deals with the case where the number of observations n is greater than the number of predictor variables p (n > p). In the case n […]
M.S.
Masters
Statistics
Sciences
Statistical Computing

11

Cao, Chendi. "Linear regression with Laplace measurement error." Kansas State University, 2016. http://hdl.handle.net/2097/32719.

Abstract:

Master of Science
Statistics
Weixing Song
In this report, an improved estimation procedure for the regression parameter in simple linear regression models with Laplace measurement error is proposed. The estimation procedure is made feasible by a Tweedie-type equality established for E(X|Z), where Z = X + U, X and U are independent, and U follows a Laplace distribution. When the density function of X is unknown, a kernel estimator for E(X|Z) is constructed in the estimation procedure. A leave-one-out cross-validation bandwidth selection method is designed. The finite-sample performance of the proposed estimation procedure is evaluated by simulation studies. A comparison study is also conducted to show the superiority of the proposed estimation procedure over some existing estimation methods.

12

Gündüz, Necla. "D-optimal designs for weighted linear regression and binary regression models." Thesis, University of Glasgow, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.301629.

13

Rodrigues, Cátia Sofia Martins. "Quais os fatores que determinam o rendimento dos indivíduos em Portugal? - Regressão de Quantis." Master's thesis, Instituto Superior de Economia e Gestão, 2021. http://hdl.handle.net/10400.5/23425.

Abstract:

Bologna Master's in Quantitative Methods for Economic and Business Decision-Making
Although income inequality has decreased significantly over the years, the issue is still a subject of study, mainly from an econometric perspective, with the aim of identifying and understanding the factors behind those inequalities. The main focus of this project is to study the factors that determine the income of individuals living in Portugal, adopting a quantile regression approach, since groups of individuals with different incomes may behave differently. For this purpose, a regression model was built using data from Statistics Portugal (INE). The variable under study is the annual income of residents in Portugal in 2019, and the model has eight regressors that characterize not only the individual, such as age, sex or marital status, but also the employer, such as its size and the number of working hours. Taking into account the estimation results, it is possible to conclude that there are factors, namely the individual's gender, level of education and region of residence, responsible for significant differences in the annual income of residents in Portugal. However, these differences are not uniform across all groups of individuals, and behave differently when groups with lower, medium or higher incomes are compared. This nonlinear behavior also made it possible to understand the advantage of using quantile regression over the most common econometric method, linear regression, whose objective is to estimate the effect of different explanatory variables on the average values of the dependent variable. The database was built using SQL Developer and the analysis was conducted with Stata.

14

Edlund, Ove. "Solution of linear programming and non-linear regression problems using linear M-estimation methods." Luleå, 1999. http://epubl.luth.se/1402-1544/1999/17/index.html.

15

Bullas, J. M. David. "K-nearest neighbours with weighted linear regression." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ34340.pdf.

16

Hamzah, Nor Aishah. "Robust regression estimation in generalized linear models." Thesis, University of Bristol, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294372.

17

Ah-Kine, Pascal Soon Shien. "Simultaneous confidence bands in linear regression analysis." Thesis, University of Southampton, 2010. https://eprints.soton.ac.uk/167557/.

Abstract:

A simultaneous confidence band provides useful information on the plausible range of an unknown regression model. For a simple linear regression model, the most frequently quoted bands in the statistical literature include the two-segment band, the three-segment band and the hyperbolic band, and for a multiple linear regression model, the most common bands in the statistical literature include the hyperbolic band and the constant-width band. The optimality criteria for confidence bands include the Average Width criterion considered by Gafarian (1964) and Naiman (1984) among others, and the Minimum Area Confidence Set (MACS) criterion of Liu and Hayter (2007). A concise review of the construction of two-sided simultaneous confidence bands in simple and multiple linear regressions and their comparison under the two mentioned optimality criteria is provided in the thesis. Two families of confidence bands, the inner-hyperbolic bands and the outer-hyperbolic bands, which include the hyperbolic and three-segment bands as special cases, are introduced for a simple linear regression. Under the MACS criterion, the best confidence band within each family is found by numerical search and compared with the hyperbolic band, the best three-segment band and with each other. The inner-hyperbolic family of confidence bands, which includes the hyperbolic and constant-width bands as special cases, is also constructed for a multiple linear regression model over an ellipsoidal covariate region, and the best band within the family is found by numerical search. For a multiple linear regression model over a rectangular covariate region (i.e. the predictor variables are constrained in intervals), no method of constructing exact simultaneous confidence bands has been published so far.
A method to construct exact two-sided hyperbolic and constant width bands over a rectangular covariate region and compare between them is provided in this thesis when there are up to three predictor variables. A simulation method similar to the ones used by Liu et al. (2005a) and Liu et al. (2005b) is also provided for the calculation of the average width and the minimum volume of confidence set when there are more than three predictor variables. The methods used in this thesis are illustrated with numerical examples and the Matlab programs used are available upon request.

18

Essomba, Rene Franck. "An investigation into Functional Linear Regression Modeling." Master's thesis, University of Cape Town, 2015. http://hdl.handle.net/11427/15591.

Abstract:

Functional data analysis, commonly known as "FDA", refers to the analysis of information on curves or functions. Key aspects of FDA include the choice of smoothing techniques, data reduction, model evaluation, functional linear modeling and forecasting methods. FDA is applicable in numerous fields such as Bioscience, Geology, Psychology, Sports Science, Econometrics, Meteorology, etc. The main objective of this dissertation is to focus more specifically on Functional Linear Regression Modelling (FLRM), which is an extension of Multivariate Linear Regression Modeling. The problem of constructing a Functional Linear Regression model with functional predictors and a functional response variable is considered in great detail. Discretely observed data for each variable involved in the modelling are expressed as smooth functions using the Fourier basis, the B-spline basis and the Gaussian basis. The Functional Linear Regression Model is estimated by the Least Squares method, the Maximum Likelihood method and, more thoroughly, by the Penalized Maximum Likelihood method. A central issue when modelling Functional Regression models is the choice of a suitable model criterion as well as the number of basis functions and an appropriate smoothing parameter. Four different types of model criteria are reviewed: the Generalized Cross-Validation, the Generalized Information Criterion, the modified Akaike Information Criterion and the Generalized Bayesian Information Criterion. Each of these methods is applied to a dataset and contrasted based on its respective results.

19

Gormley, Nolan D. "Knotilus: A Differentiable Piecewise Linear Regression Framework." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1617222994436272.

20

Khogasteh, Sam, and Edvin Wiorek. "Predicting Influencer Actual Reach Using Linear Regression." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299339.

Abstract:

The influencer marketing industry has seen tremendous growth in recent years, yet the effectiveness of this form of marketing is still largely unexplored. This report aims to explore how various performance measures are linked to the reach of social media pages, using the linear regression model. Three different data sets were collected manually or by web scraping. By splitting these data sets into training and test data, we examined the degree to which the linear regression model can predict the actual reach, the page views and the weekly growth of an influencer. We concluded that there is a statistically significant correlation between multiple performance metrics of a social media page and the actual reach or the page views of that account. This study is, however, limited by its narrow data set and time frame, warranting future research to further establish the degree of this correlation. The results of this study can benefit companies in their process of selecting influencers to collaborate with, as well as in determining the expected return on investment for a particular collaboration. This can in turn lead to a more efficient, authentic and transparent marketplace, and to consumers being less exposed to advertisement from misleading and malicious influencers.

21

Mirzayeva, Hijran. "Nonsmooth optimization algorithms for clusterwise linear regression." Thesis, University of Ballarat, 2013. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/41975.

Abstract:

Data mining is about solving problems by analyzing data present in databases. Supervised and unsupervised data classification (clustering) are among the most important techniques in data mining. Regression analysis is the process of fitting a function (often linear) to the data to discover how one or more variables vary as a function of another. The aim of clusterwise regression is to combine both of these techniques to discover trends within data when more than one trend is likely to exist. Clusterwise regression has applications, for instance, in market segmentation, where it allows one to gather information on customer behaviors for several unknown groups of customers. There exist different methods for solving clusterwise linear regression problems. In spite of that, the development of efficient algorithms for solving clusterwise linear regression problems is still an important research topic. In this thesis our aim is to develop new algorithms for solving clusterwise linear regression problems in large data sets based on incremental and nonsmooth optimization approaches. Three new methods for solving clusterwise linear regression problems are developed and numerically tested on publicly available data sets for regression analysis. The first method is a new algorithm for solving clusterwise linear regression problems based on their nonsmooth nonconvex formulation. This is an incremental algorithm. The second method is a nonsmooth optimization algorithm for solving clusterwise linear regression problems. Nonsmooth optimization techniques are proposed for use instead of the Späth algorithm to solve the optimization problems at each iteration of the incremental algorithm. The discrete gradient method is used to solve the nonsmooth optimization problems at each iteration of the incremental algorithm. This approach allows one to reduce the CPU time and the number of regression problems solved in comparison with the first incremental algorithm.
The third algorithm is based on an incremental approach and on smoothing techniques for solving clusterwise linear regression problems. The use of smoothing techniques allows one to apply powerful methods of smooth nonlinear programming to solve clusterwise linear regression problems. Numerical results are presented for all three algorithms using small to large data sets. The new algorithms are also compared with the multi-start Späth algorithm for clusterwise linear regression.
Doctor of Philosophy
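For readers unfamiliar with the problem, the following is a minimal sketch of clusterwise linear regression using a simple alternating heuristic (fit one least-squares line per cluster, then reassign each point to the line that fits it best). It is not the incremental, nonsmooth or smoothing method of the thesis, and it is only loosely in the spirit of the Späth algorithm; the function name `alternating_clr`, its parameters and the balanced random initialisation are assumptions made for illustration.

```python
import numpy as np

def alternating_clr(X, y, k, n_iter=50, seed=0):
    """Illustrative clusterwise linear regression (hypothetical helper).

    Alternates per-cluster ordinary least squares with reassignment of
    each point to the cluster whose line gives the smallest squared error.
    Returns (coefficients, labels, total squared error).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([X, np.ones(n)])   # design matrix with intercept
    p = A.shape[1]
    labels = rng.permutation(n) % k        # balanced random start (assumption)
    coefs = np.zeros((k, p))
    for _ in range(n_iter):
        # Step 1: refit each cluster that has enough points.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= p:
                coefs[j], *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        # Step 2: reassign every point to its best-fitting line.
        sq_err = (y[:, None] - A @ coefs.T) ** 2
        new_labels = sq_err.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    sq_err = (y[:, None] - A @ coefs.T) ** 2
    total = float(sq_err[np.arange(n), labels].sum())
    return coefs, labels, total

# Toy data drawn from two different linear trends.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = np.where(np.arange(40) % 2 == 0, 2 * x + 1, -x + 5)
coefs, labels, total = alternating_clr(x[:, None], y, k=2)
print(coefs, total)
```

Like any such heuristic, this can stop at a local minimum that depends on the starting partition, which is one motivation for multi-start and incremental schemes of the kind the abstract describes.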

APA, Harvard, Vancouver, ISO, and other styles

22

Smith, David McCulloch. "Regression using QR decomposition methods." Thesis, University of Kent, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.303532.

Full text

APA, Harvard, Vancouver, ISO, and other styles

23

Möls, Märt. "Linear mixed models with equivalent predictors /." Online version, 2004. http://dspace.utlib.ee/dspace/bitstream/10062/1339/5/Mols.pdf.

Full text

APA, Harvard, Vancouver, ISO, and other styles

24

Forslund, Gustaf, and David Åkesson. "Predicting share price by using Multiple Linear Regression." Thesis, KTH, Farkost och flyg, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-140645.

Full text

Abstract:

The aim of the project was to design a multiple linear regression model and use it to predict the closing share price for 44 companies listed on the OMX Stockholm stock exchange's Large Cap list. The model is intended to be used as a day trading guideline, i.e. today's information is used to predict tomorrow's closing price. The regression was done in Microsoft Excel 2010 [18] using its built-in function LINEST. The LINEST function uses the dependent variable y and all the covariates x to calculate the β-value belonging to each covariate. Several multiple linear regression models were created and tested, but only seven models were better than chance, i.e. predicted the right direction more than 50% of the time. To determine the most suitable of the remaining seven models, Akaike's Information Criterion (AIC) was applied. The covariates used in the final model were: Dow Jones closing price, Shanghai opening price, conjuncture, oil price, share's opening price, share's highest price, share's lowest price, lending rate, reports, positive/negative insider trading, payday, positive/negative price target, number of completed transactions during one day, OMX Stockholm closing price, TCW index, increasing closing price three days in a row and decreasing closing price three days in a row. The maximum average deviation between the predicted closing price and the real closing price of all 44 shares predicted was 6.60%. In predicting the correct direction (increase or decrease) of the 44 shares, an average of 61.72% was achieved during the period 2012-02-22 to 2013-02-20. If 50,000 SEK had been invested in each company, i.e. a total investment of 2.2 million SEK, the total yield using the regression model during the year 2012-02-22 to 2013-02-20 would have been 259,639 SEK (11.80%), compared with 184,171 SEK (8.37%) if the shares had been held without trading during the same period.
Of the 44 companies analysed, 31 (70.45%) were profitable when using the regression model during the year, compared with 30 (68.18%) if the shares had been held without selling during the same period. In relative terms, the yield from the model was 40.98% higher than the yield from holding the shares for the year.
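The percentages quoted in this abstract can be checked from the amounts it reports. The short sketch below only redoes that arithmetic; the variable names and the helper `pct` are illustrative and do not come from the thesis.

```python
# Figures quoted in the abstract (SEK); names are illustrative only.
n_companies = 44
stake_per_company = 50_000
total_invested = n_companies * stake_per_company   # 2.2 million SEK

model_profit = 259_639   # yield when trading with the regression model
hold_profit = 184_171    # yield when holding the shares all year

def pct(fraction):
    """Express a fraction as a percentage rounded to two decimals."""
    return round(100 * fraction, 2)

model_yield = pct(model_profit / total_invested)       # yield on the investment
hold_yield = pct(hold_profit / total_invested)
relative_gain = pct(model_profit / hold_profit - 1)    # model versus holding
profitable_model = pct(31 / n_companies)
profitable_hold = pct(30 / n_companies)

print(model_yield, hold_yield, relative_gain, profitable_model, profitable_hold)
```

Note that the last figure in the abstract is a relative difference between the two yields (259,639 versus 184,171 SEK), not a difference in percentage points between 11.80% and 8.37%.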

APA, Harvard, Vancouver, ISO, and other styles

25

Aldahmani, Saeed. "High-dimensional linear regression problems via graphical models." Thesis, University of Essex, 2017. http://repository.essex.ac.uk/19207/.

Full text

Abstract:

This thesis introduces a new method for solving the linear regression problem where the number of observations n is smaller than the number of variables (predictors) v. In contrast to existing methods such as ridge regression, Lasso and Lars, the proposed method uses the idea of graphical models and provides unbiased parameter estimates under certain conditions. In addition, the new method provides a detailed graphical conditional correlation structure for the predictors, whereby the real causal relationship between predictors can be identified. Furthermore, the proposed method is extended to form a hybridisation with the idea of ridge regression to improve efficiency in terms of computation and model selection. In the extended method, less important variables are regularised by a ridge type penalty, and a search for models in the space is made for important covariates. This significantly reduces computational cost while giving unbiased estimates for the important variables as well as increasing the efficiency of model selection. Moreover, the extended method is used in dealing with the issue of portfolio selection within the Markowitz mean-variance framework, with n < v. Various simulations and real data analyses were conducted for comparison between the two novel methods and the aforementioned existing methods. Our experiments indicate that the new methods outperform all the other methods when n

APA, Harvard, Vancouver, ISO, and other styles

26

Saleem, Aban, and Jacob Blomgren. "Modelling Pupils’ Grades with Multiple Linear Regression Model." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-275672.

Full text

Abstract:

This thesis draws on mathematical statistics and industrial economics and management to analyze the final grades of pupils in year 9, the final year of elementary school in Sweden. The purpose was to find out which variables had a statistically significant impact on pupils' final grades, so that municipalities and schools could better understand which variables are important when trying to improve average school results. A multiple regression model was fitted to data obtained from the database of Skolverket in order to examine which variables were statistically significant. The final regression model, obtained through an iterative model reduction procedure, showed that mostly structural covariates, such as the academic background of pupils, the percentage of female pupils and the percentage of pupils with Swedish background, had a statistically significant impact on pupils' academic performance. The adjusted R² of the final model was 0.5289. The model was discussed with reference to previous research. In addition, the Balanced Scorecard, the strategic performance management framework introduced by Robert S. Kaplan and David P. Norton, was used to discuss relevant key performance indicators for achieving the strategic objectives of schools.

APA, Harvard, Vancouver, ISO, and other styles

27

Brodbeck, William Joseph. "The Effect of Readability on Simple Linear Regression." Bowling Green State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1591867761661656.

Full text

APA, Harvard, Vancouver, ISO, and other styles

28

Bunea, Florentina. "A model selection approach to partially linear regression /." Thesis, Connect to this title online; UW restricted, 2000. http://hdl.handle.net/1773/8971.

Full text

APA, Harvard, Vancouver, ISO, and other styles

29

Mahmood, Arshad. "Rainfall prediction in Australia : Clusterwise linear regression approach." Thesis, Federation University Australia, 2017. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/159251.

Full text

Abstract:

Accurate rainfall prediction is a challenging task because of the complex physical processes involved. This complexity is compounded in Australia as the climate can be highly variable. Accurate rainfall prediction is immensely beneficial for making informed policy, planning and management decisions, and can assist with the most sustainable operation of water resource systems. Short-term prediction of rainfall is provided by meteorological services; however, the intermediate to long-term prediction of rainfall remains challenging and contains much uncertainty. Many prediction approaches have been proposed in the literature, including statistical and computational intelligence approaches. However, finding a method to model the complex physical process of rainfall, especially in Australia where the climate is highly variable, is still a major challenge. The aims of this study are to: (a) develop an optimization based clusterwise linear regression method, (b) develop new prediction methods based on clusterwise linear regression, (c) assess the influence of geographic regions on the performance of prediction models in predicting monthly and weekly rainfall in Australia, (d) determine the combined influence of meteorological variables on rainfall prediction in Australia, and (e) carry out a comparative analysis of new and existing prediction techniques using Australian rainfall data. In this study, rainfall data with five input meteorological variables from 24 geographically diverse weather stations in Australia, over the period January 1970 to December 2014, have been taken from the Scientific Information for Land Owners (SILO). We also consider the climate zones when selecting weather stations, because Australia experiences a variety of climates due to its size. The data was divided into training and testing periods for evaluation purposes.
In this study, optimization based clusterwise linear regression is modified and new prediction methods are developed for rainfall prediction. The proposed method is applied to predict monthly and weekly rainfall. The prediction performance of the clusterwise linear regression method was evaluated by comparing observed and predicted rainfall values using the performance measures: root mean squared error, the mean absolute error, the mean absolute scaled error and the Nash-Sutcliffe coefficient of efficiency. The proposed method is also compared with the clusterwise linear regression based on the maximum likelihood estimation, linear support vector machines for regression, support vector machines for regression with radial basis kernel function, multiple linear regression, artificial neural networks with and without hidden layer and k-nearest neighbours methods using computational results. Initially, to determine the appropriate input variables to be used in the investigation, we assessed all combinations of meteorological variables. The results confirm that single meteorological variables alone are unable to predict rainfall accurately. The prediction performance of all selected models was improved by adding the input variables in most locations. To assess the influence of geographic regions on the performance of prediction models and to compare the prediction performance of models, we trained models with the best combination of input variables and predicted monthly and weekly rainfall over the test periods. The results of this analysis confirm that the prediction performance of all selected models varied considerably with geographic regions for both weekly and monthly rainfall predictions. It is found that models have the lowest prediction error in the desert climate zone and highest in subtropical and tropical zones.
The results also demonstrate that the proposed algorithm is capable of finding the patterns and trends of the observations for monthly and weekly rainfall predictions in all geographic regions. In desert, tropical and subtropical climate zones, the proposed method outperforms other methods in most locations for both monthly and weekly rainfall predictions. In temperate and grassland zones the prediction performance of the proposed model is better in some locations while in the remaining locations it is slightly lower than the other models.
Doctor of Philosophy

APA, Harvard, Vancouver, ISO, and other styles

30

Sardy, Sylvain. "A Comparison of Two Linear Nonparametric Regression Techniques." DigitalCommons@USU, 1992. https://digitalcommons.usu.edu/etd/7123.

Full text

Abstract:

This thesis presented a useful tool in regression. Nonparametric linear regression techniques were described in the general context of regression. A comparison of two of these techniques, kernel regression and iterative regression, showed various aspects of nonparametric linear regressors.

APA, Harvard, Vancouver, ISO, and other styles

31

Nunes, Hélio Rubens de Carvalho. "Ponderação Bayesiana de modelos em regressão linear clássica." Universidade de São Paulo, 2005. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-16112005-155133/.

Full text

Abstract:

The objective of this work is to introduce Bayesian Model Averaging (BMA) to researchers in agronomy and to discuss its advantages and limitations. BMA makes it possible to combine the results of different models regarding a given quantity of interest, and thus presents itself as an alternative data analysis methodology to the usual model selection methods, such as the coefficient of multiple determination (R²), the adjusted coefficient of multiple determination (adjusted R²), Mallows' Cp statistic and the prediction sum of squares (PRESS). Several studies have recently compared the performance of BMA with that of model selection methods; however, many situations remain to be explored before a general conclusion about this methodology can be reached. In this work, BMA was applied to a data set from an agronomic experiment. The predictive performance of BMA was then compared with that of the selection methods cited above by means of a simulation study varying the degree of multicollinearity, measured by the condition number of the standardized X'X matrix, and the sample size. In each of these situations, 1000 samples were used, generated from descriptive measures of real agronomic data sets. The predictive performance of the methodologies under comparison was measured by the logarithm of the predictive score (LEP). The empirical results indicated that BMA performs similarly to the usual model selection methods in the multicollinearity situations explored in this work.

APA, Harvard, Vancouver, ISO, and other styles

32

Bowtell, Philip. "Non-linear functional relationships." Thesis, University of Reading, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.284183.

Full text

APA, Harvard, Vancouver, ISO, and other styles

33

Lawrence, David E. "Cluster-Based Bounded Influence Regression." Diss., Virginia Tech, 2003. http://hdl.handle.net/10919/28455.

Full text

Abstract:

In the field of linear regression analysis, a single outlier can dramatically influence ordinary least squares estimation while low-breakdown procedures such as M regression and bounded influence regression may be unable to combat a small percentage of outliers. A high-breakdown procedure such as least trimmed squares (LTS) regression can accommodate up to 50% of the data (in the limit) being outlying with respect to the general trend. Two available one-step improvement procedures based on LTS are Mallows 1-step (M1S) regression and Schweppe 1-step (S1S) regression (the current state-of-the-art method). Issues with these methods include (1) computational approximations and sub-sampling variability, (2) dramatic coefficient sensitivity with respect to very slight differences in initial values, (3) internal instability when determining the general trend and (4) performance in low-breakdown scenarios. A new high-breakdown regression procedure is introduced that addresses these issues, plus offers an insightful summary regarding the presence and structure of multivariate outliers. This proposed method blends a cluster analysis phase with a controlled bounded influence regression phase, thereby referred to as cluster-based bounded influence regression, or CBI. Representing the data space via a special set of anchor points, a collection of point-addition OLS regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster "halfset" of observations, with the remaining observations becoming one or more minor clusters. An initial regression estimator arises from the main cluster, with a multiple point addition DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point, is regression equivariant, scale equivariant and affine equivariant and distributionally is asymptotically normal. 
Case studies and Monte Carlo studies demonstrate the performance advantage of CBI over S1S and the other high breakdown methods regarding coefficient stability, scale estimation and standard errors. A dendrogram of the clustering process is one graphical display available for multivariate outlier detection. Overall, the proposed methodology represents advancement in the field of robust regression, offering a distinct philosophical viewpoint towards data analysis and the marriage of estimation with diagnostic summary.
Ph. D.

APA, Harvard, Vancouver, ISO, and other styles

34

Taga, Marcel Frederico de Lima. "Regressão linear com medidas censuradas." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/45/45133/tde-05122008-005901/.

Full text

Abstract:

We consider a simple linear regression model in which both the response and the independent variable are subject to interval censoring. As motivation, we use data from an audiometric study designed to evaluate the possibility of predicting the results of a behavioral audiological examination (behavioral thresholds) from the results of an electrophysiological audiological examination (physiological thresholds). We compute prediction intervals for the response variable, analyze the behavior of the maximum likelihood estimators obtained under the proposed model, and compare their performance with that of estimators obtained from a usual simple linear regression model in which the censoring of the data is ignored.

APA, Harvard, Vancouver, ISO, and other styles

35

Januario, Ana Paula Ferrari. "Análise estatística da produção de vitelão Mertolengo." Master's thesis, Universidade de Évora, 2021. http://hdl.handle.net/10174/29316.

Full text

Abstract:

This work was intended to support the association of Mertolenga cattle breeders in its rearing process and decision making, namely in modeling the cost per day of production of male Mertolenga cattle and in identifying the variables that favor the sale of the animal as a product with a protected designation of origin (PDO) seal. The database contained information on 716 male animals, of which 54% went to the slaughter that guarantees the PDO seal, together with data on the cost structure of production of the animals from entry into the CTR until slaughter and the individual characteristics of each animal, in particular its estimated breeding values. To obtain the cost-per-day production model, multiple linear regression models and other generalized linear models were used. For the classification of the animal by PDO slaughter destination, a logistic regression model was used. When the generalized linear models tested were compared, the multiple linear regression model was confirmed as the most suitable for explaining the cost per day of production. For this model, it was found that information such as weight at entry, as well as different estimated breeding values, positively influences the cost of production. With regard to the logistic regression, weight at entry, age at entry and the genetic values for maternal capacity and calving interval are factors that favor the animal being sold under the PDO seal.

APA, Harvard, Vancouver, ISO, and other styles

36

Rodriguez, Mary Ana Petersen. "Parâmetros genéticos e fenotípicos do perfil de ácidos graxos do leite de vacas da raça holandesa." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/11/11139/tde-30102013-110828/.

Full text

Abstract:

During the last decades, genetic improvement of dairy cattle in Brazil has been based only on the importation of genetic material, resulting in small genetic gains for traits of economic interest. There is therefore a pressing need for genetic evaluation of animals under national environmental conditions, in order to increase milk production together with quality. In this context, knowledge of milk composition is extremely important for understanding how certain environmental and, especially, genetic factors may influence the increase in the contents of protein (PROT), fat (FAT) and beneficial fatty acids (FA) and the reduction of somatic cell count, with the aim of improving the nutritional quality of this product. The aim of this study was to predict the contents of the fatty acids of interest using Bayesian linear regression, to estimate variance components and heritability coefficients, and to compare models of different orders of fit using Legendre polynomial functions under random regression models. Milk samples were subjected to gas chromatography and mid-infrared spectrometry for the determination of fatty acids. The results obtained by the two methods were compared using Pearson's correlation, Bland-Altman analysis and Bayesian linear regression; subsequently, prediction equations were developed for myristic acid (C14:0) and conjugated linoleic acid (CLA) from simple and multiple Bayesian linear regressions, considering non-informative and informative priors. Legendre orthogonal polynomials of 1st to 6th order were used to fit the random regressions of the traits.
Prediction of the fatty acids by linear regression was feasible, with prediction errors ranging from 0.01 to 4.84 g per 100 g of fat for C14:0 and from 0.002 to 1.85 per 100 g of fat for CLA; in this case the smallest prediction errors were obtained with multiple regression with a non-informative prior. The best-fitting model for FAT, PROT, C16:0, C18:0, C18:1c9, CLA, saturated (SAT), unsaturated (UNSAT), monounsaturated (MONO) and polyunsaturated (POLY) fatty acids was that of 1st order, and for somatic cell score (SCS) and C14:0 that of 2nd order. In the best-fitting models, the heritability estimates ranged from 0.08 to 0.11 for FAT; 0.28 to 0.35 for PROT; 0.03 to 0.22 for SCS; 0.12 to 0.31 for C16:0; 0.08 to 0.14 for C18:0; 0.24 to 0.43 for C14:0; 0.07 to 0.17 for C18:1c9; 0.13 to 0.39 for CLA; 0.14 to 0.31 for SAT; 0.04 to 0.14 for UNSAT; 0.04 to 0.13 for MONO; 0.09 to 0.20 for POLY; and 0.12 for PROD. We conclude that improvements in the nutritional quality of milk can be obtained through the inclusion of productive traits and the fatty acid profile in genetic selection programs.

APA, Harvard, Vancouver, ISO, and other styles

37

Medeiros, Patrick Valverde. "Análise da evapotranspiração de referência a partir de medidas lisimétricas e ajuste estatístico de estimativas de nove equações empírico-teóricas com base na equação de Penman-Monteith." Universidade de São Paulo, 2008. http://www.teses.usp.br/teses/disponiveis/18/18138/tde-21052008-090008/.

Full text

Abstract:

The quantification of evapotranspiration is an essential task for determining the water balance of a watershed and for establishing the water deficit of a crop. The present work therefore analyzes the reference evapotranspiration (ETo) for the region of Jaboticabal-SP. The behavior of the phenomenon in the region was studied by interpreting data from a battery of 12 drainage lysimeters (EToLis) and theoretical estimates from 10 different equations available in the literature. The statistical correlation analysis indicates that the ETo estimates from the theoretical equations, compared with the EToLis measured in the drainage lysimeters, did not yield good comparison and error indices. Assuming that the operation of the lysimeters did not allow ETo to be determined reliably, a local adjustment of the other ETo estimation methodologies was proposed, through auto-regression (AR) of the noise of these equations relative to an annual average estimated by the Penman-Monteith equation (EToPM), taken as the standard, over fortnightly and monthly periods. Adjustment by simple linear regression was also analyzed. The results indicate that effective radiation is the most important climatic variable for establishing ETo in the region. The Penman-Monteith estimate showed excellent agreement with the Makkink (1957) and energy balance equations. The proposed local adjustments gave excellent results for most of the equations tested, notably the FAO-24 solar radiation, Makkink (1957), Jensen-Haise (1963), Camargo (1971), radiation balance, Turc (1961) and Thornthwaite (1948) equations. Adjustment by simple linear regression is easier to carry out and also gave excellent results.

38

Kartal, Elcin. "Metamodeling Complex Systems Using Linear And Nonlinear Regression Methods." Master's thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/2/12608930/index.pdf.

Abstract:

Metamodeling is a very popular approach for the approximation of complex systems. Metamodeling techniques can be categorized, according to the type of regression method employed, as linear or nonlinear models. The Response Surface Methodology (RSM) is an example of linear regression. In classical RSM metamodels, parameters are estimated using the Least Squares (LS) method. Robust regression techniques, such as Least Absolute Deviation (LAD) and M-regression, are also considered in this study because of the outliers present in the data sets. Artificial Neural Networks (ANN) and Multivariate Adaptive Regression Splines (MARS) are examples of nonlinear regression techniques. In this thesis, metamodels based on these two nonlinear techniques are constructed and their performance is compared with that of the linear models.
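The contrast between least squares and a robust alternative can be illustrated with a small sketch. It is not taken from the thesis: it fits a straight line to made-up data containing one gross outlier, once by ordinary least squares and once by an approximate LAD fit computed with iteratively reweighted least squares, a standard technique chosen here for brevity. The function names, the data, the iteration count and the `eps` floor are all illustrative assumptions.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def lad_line_fit(x, y, iterations=50, eps=1e-8):
    """Approximate LAD fit via iteratively reweighted least squares.

    Each pass reweights the points by 1/|residual|, so points far from
    the current line lose influence. eps (an assumed floor) avoids
    division by zero when a residual vanishes.
    """
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from plain LS
    for _ in range(iterations):
        w = [1.0 / max(abs(yi - a - b * xi), eps) for xi, yi in zip(x, y)]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Made-up data: points on y = 1 + 2x, with the last one shifted by +30.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] += 30.0

ls_fit = weighted_line_fit(x, y, [1.0] * len(x))  # pulled toward the outlier
lad_fit = lad_line_fit(x, y)                      # stays close to y = 1 + 2x
print("LS  (intercept, slope):", ls_fit)
print("LAD (intercept, slope):", lad_fit)
```

On this toy data the single outlier drags the least-squares slope well away from 2, while the reweighted fit recovers the line through the other five points.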

39

Bentley, Jason Phillip. "Exact Markov chain Monte Carlo and Bayesian linear regression." Thesis, University of Canterbury. Mathematics and Statistics, 2009. http://hdl.handle.net/10092/2534.

Abstract:

In this work we investigate the use of perfect sampling methods within the context of Bayesian linear regression. We focus on inference problems related to the marginal posterior model probabilities. Model averaged inference for the response and Bayesian variable selection are considered. Perfect sampling is an alternate form of Markov chain Monte Carlo that generates exact sample points from the posterior of interest. This approach removes the need for burn-in assessment faced by traditional MCMC methods. For model averaged inference, we find the monotone Gibbs coupling from the past (CFTP) algorithm is the preferred choice. This requires the predictor matrix be orthogonal, preventing variable selection, but allowing model averaging for prediction of the response. Exploring choices of priors for the parameters in the Bayesian linear model, we investigate sufficiency for monotonicity assuming Gaussian errors. We discover that a number of other sufficient conditions exist, besides an orthogonal predictor matrix, for the construction of a monotone Gibbs Markov chain. Requiring an orthogonal predictor matrix, we investigate new methods of orthogonalizing the original predictor matrix. We find that a new method using the modified Gram-Schmidt orthogonalization procedure performs comparably with existing transformation methods, such as generalized principal components. Accounting for the effect of using an orthogonal predictor matrix, we discover that inference using model averaging for in-sample prediction of the response is comparable between the original and orthogonal predictor matrix. The Gibbs sampler is then investigated for sampling when using the original predictor matrix and the orthogonal predictor matrix. We find that a hybrid method, using a standard Gibbs sampler on the orthogonal space in conjunction with the monotone CFTP Gibbs sampler, provides the fastest computation and convergence to the posterior distribution. 
We conclude the hybrid approach should be used when the monotone Gibbs CFTP sampler becomes impractical, due to large backwards coupling times. We demonstrate large backwards coupling times occur when the sample size is close to the number of predictors, or when hyper-parameter choices increase model competition. The monotone Gibbs CFTP sampler should be taken advantage of when the backwards coupling time is small. For the problem of variable selection we turn to the exact version of the independent Metropolis-Hastings (IMH) algorithm. We reiterate the notion that the exact IMH sampler is redundant, being a needlessly complicated rejection sampler. We then determine a rejection sampler is feasible for variable selection when the sample size is close to the number of predictors and using Zellner’s prior with a small value for the hyper-parameter c. Finally, we use the example of simulating from the posterior of c conditional on a model to demonstrate how the use of an exact IMH view-point clarifies how the rejection sampler can be adapted to improve efficiency.
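The orthogonalization step mentioned in the abstract can be sketched with the textbook modified Gram-Schmidt procedure. This is a minimal illustration under assumptions, not the thesis's implementation: the predictor matrix is represented as a list of columns, the function name and the tolerance are hypothetical, and any centering or scaling the thesis may apply is omitted.

```python
import math


def modified_gram_schmidt(columns, tol=1e-12):
    """Return orthonormal columns spanning the same space as `columns`.

    Modified Gram-Schmidt removes each new direction from all remaining
    columns as soon as it is normalized, which is numerically more stable
    than the classical variant.
    """
    q = [list(col) for col in columns]
    for i in range(len(q)):
        norm = math.sqrt(sum(v * v for v in q[i]))
        if norm < tol:
            raise ValueError("columns are (numerically) linearly dependent")
        q[i] = [v / norm for v in q[i]]
        for j in range(i + 1, len(q)):
            proj = sum(a * b for a, b in zip(q[i], q[j]))
            q[j] = [b - proj * a for a, b in zip(q[i], q[j])]
    return q


# Made-up predictor columns: intercept, linear and quadratic terms.
predictors = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 4.0, 9.0, 16.0],
]
q_cols = modified_gram_schmidt(predictors)
```

The resulting columns have unit length and are mutually orthogonal up to rounding error, which is the property the monotone Gibbs CFTP sampler in the abstract requires of the predictor matrix.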

40

Crews, Hugh Bates. "Fast FSR Methods for Second-Order Linear Regression Models." NCSU, 2008. http://www.lib.ncsu.edu/theses/available/etd-04282008-151809/.

Abstract:

Many variable selection techniques have been developed that focus on first-order linear regression models. In some applications, such as modeling response surfaces, fitting second-order terms can improve predictive accuracy. However, the number of spurious interactions can be large, leading to poor results with many methods. We focus on forward selection, describing algorithms that use the natural hierarchy existing in second-order linear regression models to limit spurious interactions. We then develop stopping rules by extending False Selection Rate methodology to these algorithms. In addition, we describe alternative estimation methods for fitting regression models including the LASSO, CART, and MARS. We also propose a general method for controlling multiple-group false selection rates, which we apply to second-order linear regression models. By estimating a separate entry level for first-order and second-order terms, we obtain equal contributions to the false selection rate from each group. We compare the methods via Monte Carlo simulation and apply them to optimizing response surface experimental designs.

41

Tsakonas, Efthymios. "Convex Optimization for Assignment and Generalized Linear Regression Problems." Doctoral thesis, KTH, Signalbehandling, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-150338.

Abstract:

This thesis considers optimization techniques with applications in assignment and generalized linear regression problems. The first part of the thesis investigates the worst-case robust counterparts of combinatorial optimization problems with least squares (LS) cost functions, where the uncertainty lies on the linear transformation of the design variables. We consider the case of ellipsoidal uncertainty, and prove that the worst-case robust LS optimization problem, although NP-hard, is still amenable to convex relaxation based on semidefinite optimization. We motivate our proposed relaxation using Lagrangian duality, and illustrate that the tightness of the Lagrange bidual relaxation is strongly dependent on the description of the feasible region of the worst-case robust LS problem. The results arising from this analysis are applicable to a broad range of assignment problems. The second part of the thesis considers combinatorial optimization problems arising specifically in the context of conference program formation. We start by arguing that both papers and reviewers can be represented as feature vectors in a suitable keyword space. This enables rigorous mathematical formulation of the conference formation process. The first problem, paper-to-session assignment, is formulated as a capacitated k-means clustering problem. We formally prove that it is NP-hard and propose a variety of approximate solutions, ranging from alternating optimization to semidefinite relaxation. Suitable convex relaxation methods are proposed for the paper-to-reviewer assignment problem as well. Our methods are tested using real conference data for both problems, and show very promising results.
In a related but distinct research direction, the third part of the thesis focuses on preference measurement applications: Review profiling, i.e., determining the reviewer’s expertise (and thus identifying the associated feature vector for the reviewer) on the basis of their past and present review preferences, or ‘bids’, is an excellent example of preference measurement. We argue that the need for robust preference measurement is apparent in modern applications. Using conjoint analysis (CA) as a basis, we propose a new statistical model for choice-based preference measurement, a part of preference analysis where data are only expressed in the form of binary choices. The model uses deterministic auxiliary variables to account for outliers and to detect the salient features that influence decisions. Contributions include conditions for statistical identifiability, derivation of the pertinent Cramér-Rao Lower Bound (CRLB), and ML consistency conditions for the proposed nonlinear model. The proposed ML approach lends itself naturally to ℓ1-type convex relaxations which are well-suited for distributed implementation, based on the alternating direction method of multipliers (ADMM). A particular decomposition is advocated which bypasses the apparent need for outlier variable communication, thus maintaining scalability. In the last part of the thesis we argue that this modeling has greater intellectual merits than preference measurement, and explain how related ideas can be put in the context of generalized linear regression models, drawing links between ℓ1-methods, stochastic convex optimization, and the field of robust statistics.

42

Gustafsson, Alexander, and Sebastian Wogenius. "Modelling Apartment Prices with the Multiple Linear Regression Model." Thesis, KTH, Matematisk statistik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-146735.

Abstract:

This thesis examines which factors are most statistically significant for the sale prices of apartments in Stockholm City Centre. The factors examined are address, area, balcony, construction year, elevator, fireplace, floor number, maisonette, monthly fee, penthouse and number of rooms. On the basis of this examination, a model for predicting apartment prices is constructed. To evaluate how the factors influence the price, the thesis analyses sales statistics using the multiple linear regression model. A minor case study and literature review, included in the thesis, examines the relationship between proximity to public transport and apartment prices in Stockholm. The results show that it is possible to construct a model, from the factors analysed, that predicts apartment prices in Stockholm City Centre with an explanation degree (coefficient of determination, R²) of 91% and a 95% confidence interval of two million SEK. Furthermore, the model predicts lower-priced apartments more accurately. In the case study and literature review, the results support the hypothesis that proximity to public transport has a positive effect on the price of an apartment. However, such a variable should be treated with caution, because the purpose of the modelling differs between an individual application and a socioeconomic application.
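For reference, the multiple linear regression model and the explanation degree quoted in the abstract can be written out as follows. This is the standard textbook form, assuming the term refers to the coefficient of determination; the symbols (n sales, p factors, fitted values \hat{y}_i, sample mean \bar{y}) are notation introduced here, not taken from the thesis.

```latex
\begin{align}
  y_i &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
      \qquad i = 1, \dots, n, \\
  R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
                  {\sum_{i=1}^{n} (y_i - \bar{y})^2}.
\end{align}
```

Under this reading, a value of 91% means that the fitted model accounts for 91% of the variation in sale prices around their mean.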

43

Lin, Shan. "Simultaneous confidence bands for linear and logistic regression models." Thesis, University of Southampton, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.443030.

44

"Supervised ridge regression in high dimensional linear regression." 2013. http://library.cuhk.edu.hk/record=b5549319.

Abstract:

In the field of statistical learning, we usually have many features with which to determine the behavior of some response. For example, in gene testing problems we have many genes as features, and their relations with a certain disease need to be determined. Without specific knowledge available, the simplest and most fundamental way to model this kind of problem is a linear model. There are many existing methods for solving linear regression, such as conventional ordinary least squares, ridge regression and the LASSO (least absolute shrinkage and selection operator). Let N denote the number of samples and p the number of predictors. In ordinary settings where we have enough samples (N > p), ordinary linear regression methods like ridge regression will usually give reasonable predictions for future values of the response. In modern statistical learning we quite often meet high-dimensional problems (N << p), such as document classification problems and microarray data testing problems. In high-dimensional problems it is generally quite difficult to identify the relationship between the predictors and the response without further assumptions. Although many predictors are available for prediction, in many real problems most of them are actually spurious. A predictor being spurious means that it is not directly related to the response. For example, in microarray data testing problems, millions of genes may be available for prediction, but only a few hundred genes are actually related to the target disease. Conventional linear regression techniques such as the LASSO and ridge regression both have limitations in high-dimensional problems. The LASSO is one of the state-of-the-art techniques for sparsity recovery, but when applied to high-dimensional problems its performance degrades considerably because of measurement noise, which results in high-variance predictions and large prediction error.
Ridge regression, on the other hand, is more robust to additive measurement noise, but has the obvious limitation of not being able to separate true predictors from spurious ones. As mentioned previously, in many high-dimensional problems a large number of the predictors could be spurious; in these cases, ridge regression's inability to separate spurious from true predictors results in poor interpretability of the model as well as poor prediction performance. The new technique that I propose in this thesis aims to address the limitations of these two methods, resulting in more accurate and stable prediction performance in high-dimensional linear regression problems with significant measurement noise. The idea is simple: instead of doing a single-step regression, we divide the regression procedure into two steps. In the first step we try to separate the seemingly relevant predictors from those that are obviously spurious by calculating the univariate correlations between the predictors and the response. We then discard those predictors that have very small or zero correlation with the response. After the first step we should have obtained a reduced predictor set. In the second step we perform a ridge regression between the reduced predictor set and the response; the result of this ridge regression is our desired output. The thesis is organized as follows. First I give a literature review of the linear regression problem, introduce ridge regression and the LASSO in detail, and explain more precisely their limitations in high-dimensional problems. Then I introduce my new method, called supervised ridge regression, show why it should dominate ridge regression and the LASSO in high-dimensional problems, and present some simulation results to strengthen my argument.
Finally I will conclude with the possible limitations of my method and point out possible directions for further investigations.
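The two-step procedure described in the abstract (screen predictors by their correlation with the response, then run ridge regression on the survivors) can be sketched as below. This is a minimal illustration under stated assumptions rather than the thesis's code: the correlation threshold, the penalty `lam`, the centering of the data, the absence of predictor scaling and all function names are choices made here, and the toy data are made up. Columns are assumed to be non-constant so that the correlations are defined.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def supervised_ridge(columns, y, threshold, lam):
    """Step 1: keep predictors with |corr(x_j, y)| >= threshold.
    Step 2: ridge regression of centered y on the centered kept predictors.
    Returns (indices of kept predictors, ridge coefficients)."""
    keep = [j for j, col in enumerate(columns)
            if abs(pearson(col, y)) >= threshold]
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    xc = []
    for j in keep:
        col_mean = sum(columns[j]) / len(y)
        xc.append([v - col_mean for v in columns[j]])
    # Normal equations with the ridge penalty added on the diagonal.
    gram = [[sum(a * b for a, b in zip(ci, cj)) + (lam if i == j else 0.0)
             for j, cj in enumerate(xc)] for i, ci in enumerate(xc)]
    rhs = [sum(a * b for a, b in zip(ci, yc)) for ci in xc]
    return keep, solve(gram, rhs)


# Made-up data: the response depends on x0 only; x1 and x2 are spurious.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
y = [3.0 * v for v in x0]

kept, beta = supervised_ridge([x0, x1, x2], y, threshold=0.5, lam=1.0)
print("kept predictors:", kept)
print("ridge coefficients:", beta)
```

On this toy data only the first predictor survives screening, and its ridge coefficient is shrunk slightly below the true value of 3 by the penalty. How the threshold and penalty should be chosen in practice is not stated in the abstract.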
Detailed summary in vernacular field only.
Zhu, Xiangchen.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2013.
Includes bibliographical references (leaves 68-69).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts also in Chinese.
Chapter 1. --- BASICS ABOUT LINEAR REGRESSION
Chapter 1.1 --- Introduction
Chapter 1.2 --- Linear Regression and Least Squares
Chapter 1.2.1 --- Standard Notations
Chapter 1.2.2 --- Least Squares and Its Geometric Meaning
Chapter 2. --- PENALIZED LINEAR REGRESSION
Chapter 2.1 --- Introduction
Chapter 2.2 --- Deficiency of the Ordinary Least Squares Estimate
Chapter 2.3 --- Ridge Regression
Chapter 2.3.1 --- Introduction to Ridge Regression
Chapter 2.3.2 --- Expected Prediction Error And Noise Variance Decomposition of Ridge Regression
Chapter 2.3.3 --- Shrinkage effects on different principal components by ridge regression
Chapter 2.4 --- The LASSO
Chapter 2.4.1 --- Introduction to the LASSO
Chapter 2.4.2 --- The Variable Selection Ability and Geometry of LASSO
Chapter 2.4.3 --- Coordinate Descent Algorithm to solve for the LASSO
Chapter 3. --- LINEAR REGRESSION IN HIGH-DIMENSIONAL PROBLEMS
Chapter 3.1 --- Introduction
Chapter 3.2 --- Spurious Predictors and Model Notations for High-dimensional Linear Regression
Chapter 3.3 --- Ridge and LASSO in High-dimensional Linear Regression
Chapter 4. --- THE SUPERVISED RIDGE REGRESSION
Chapter 4.1 --- Introduction
Chapter 4.2 --- Definition of Supervised Ridge Regression
Chapter 4.3 --- An Underlying Latent Model
Chapter 4.4 --- Ridge, LASSO and Supervised Ridge Regression
Chapter 4.4.1 --- LASSO vs SRR
Chapter 4.4.2 --- Ridge regression vs SRR
Chapter 5. --- TESTING AND SIMULATION
Chapter 5.1 --- A Simulation Example
Chapter 5.2 --- More Experiments
Chapter 5.2.1 --- Correlated Spurious and True Predictors
Chapter 5.2.2 --- Insufficient Amount of Data Samples
Chapter 5.2.3 --- Low Dimensional Problem
Chapter 6. --- CONCLUSIONS AND DISCUSSIONS
Chapter 6.1 --- Conclusions
Chapter 6.2 --- References and Related Works

45

"Benchmarking non-linear series with quasi-linear regression." 2012. http://library.cuhk.edu.hk/record=b5549055.

Abstract:

For a target socio-economic variable, two sources of data with different collection frequencies may be available in survey data analysis. In general, because of differences in sample size or data source, the two sets of data do not agree with each other. Usually the more frequent observations are less reliable, and the less frequent observations are much more accurate. In the benchmarking problem, the less frequent observations are treated as benchmarks and are used to adjust the higher-frequency data.
In the common benchmarking setting, the survey error and the target variable are assumed to be independent (the additive case). In reality, however, they are usually correlated (the multiplicative case): the larger the variable, the larger the survey error. To deal with this problem, Chen and Wu (2006) proposed a regression method called quasi-linear regression for the multiplicative case. In this thesis, assuming that the survey error follows an AR(1) model, we first demonstrate the benchmarking procedure using the default error model for quasi-linear regression. We then propose an error-modelling procedure based on the benchmark forecast method. Finally, we compare the performance of the default error model with that of the fitted error model and give some guidance on selecting the error model.
Detailed summary in vernacular field only.
Luk, Wing Pan.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 56-57).
Abstracts also in Chinese.
Chapter 1 --- Introduction
Chapter 1.1 --- Recent Development For Benchmarking Methods
Chapter 1.2 --- Multiplicative Case And Benchmarking Problem
Chapter 2 --- Benchmarking With Quasi-linear Regression
Chapter 2.1 --- Iterative Procedure For Quasi-linear Regression
Chapter 2.2 --- Prediction Using Default Value φ
Chapter 2.3 --- Performance Of Using Default Error Model
Chapter 3 --- Estimation Of φ Via BM Forecasting method
Chapter 3.1 --- Benchmark Forecasting Method
Chapter 3.2 --- Performance Of Benchmark Forecasting Method
Chapter 4 --- Benchmarking By The Estimated Value
Chapter 4.1 --- Benchmarking With The Estimated Error Model
Chapter 4.2 --- Performance Of Using Estimated Error Model
Chapter 4.3 --- Suggestions For Selecting Error Model
Chapter 5 --- Fitting AR(1) Model For Non-AR(1) Error
Chapter 5.1 --- Settings For Non-AR(1) Model
Chapter 5.2 --- Simulation Studies
Chapter 6 --- An Illustrative Example: The Canada Total Retail Trade Series
Chapter 7 --- Conclusion
Bibliography

46

Lu, QiQi. "Linear regression under multiple changepoints." 2004. http://purl.galileo.usg.edu/uga%5Fetd/lu%5Fqiqi%5F200408%5Fphd.

47

曾麗齡. "Linear Regression with Censored Data." Thesis, 1990. http://ndltd.ncl.edu.tw/handle/73824948008674721841.

48

Dias, Sónia Manuela Mendes. "Linear regression with empirical distributions." Doctoral thesis, 2014. https://repositorio-aberto.up.pt/handle/10216/74191.

49

Huang, Min Ching. "Piecewise linear tree-structured regression." 1989. http://catalog.hathitrust.org/api/volumes/oclc/21951798.html.

Abstract:

Thesis (Ph. D.)--University of Wisconsin--Madison, 1989.
Typescript. Vita. Includes bibliographical references (leaves 101-104).

50

Dias, Sónia Manuela Mendes. "Linear regression with empirical distributions." Tese, 2014. https://repositorio-aberto.up.pt/handle/10216/74191.
