Journal articles on the topic 'LASSO regression models'

Consult the top 50 journal articles for your research on the topic 'LASSO regression models.'

The abstract of each publication is reproduced below its reference whenever it is available in the metadata.

1

Giurcanu, Mihai, and Brett Presnell. "Bootstrapping LASSO-type estimators in regression models." Journal of Statistical Planning and Inference 199 (March 2019): 114–25. http://dx.doi.org/10.1016/j.jspi.2018.05.007.

2

Li, Bohan, and Juan Wu. "Bayesian bootstrap adaptive lasso estimators of regression models." Journal of Statistical Computation and Simulation 91, no. 8 (January 11, 2021): 1651–80. http://dx.doi.org/10.1080/00949655.2020.1865959.

3

Wang, Xin, Lingchen Kong, and Liqun Wang. "Estimation of Error Variance in Regularized Regression Models via Adaptive Lasso." Mathematics 10, no. 11 (June 6, 2022): 1937. http://dx.doi.org/10.3390/math10111937.

Abstract:
Estimation of error variance in a regression model is a fundamental problem in statistical modeling and inference. In high-dimensional linear models, variance estimation is a difficult problem, due to the issue of model selection. In this paper, we propose a novel approach for variance estimation that combines the reparameterization technique and the adaptive lasso, which is called the natural adaptive lasso. This method can simultaneously select and estimate the regression and variance parameters. Moreover, we show that the natural adaptive lasso, for regression parameters, is equivalent to the adaptive lasso. We establish the asymptotic properties of the natural adaptive lasso for regression parameters and derive the mean squared error bound for the variance estimator. Our theoretical results show that under appropriate regularity conditions, the natural adaptive lasso for error variance is closer to the so-called oracle estimator than some other existing methods. Finally, Monte Carlo simulations are presented to demonstrate the superiority of the proposed method.
4

Emmert-Streib, Frank, and Matthias Dehmer. "High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection." Machine Learning and Knowledge Extraction 1, no. 1 (January 14, 2019): 359–83. http://dx.doi.org/10.3390/make1010021.

Abstract:
Regression models are supervised learning methods that are important for machine learning, statistics, and general data science. Although classical ordinary least squares (OLS) regression has been known for a long time, recent years have brought many new developments that extend this model significantly. Above all, the least absolute shrinkage and selection operator (LASSO) model has gained considerable interest. In this paper, we review general regression models with a focus on the LASSO and extensions thereof, including the adaptive LASSO, elastic net, and group LASSO. We discuss the regularization terms responsible for inducing coefficient shrinkage and variable selection, which lead to improved performance metrics of these regression models. This makes these modern, computational regression models valuable tools for analyzing high-dimensional problems.
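For orientation, the penalties reviewed above have standard textbook forms; the following summary is a sketch in generic notation for a linear model y = Xβ + ε, not the article's own notation:

```latex
% LASSO:
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
% Adaptive LASSO, with data-dependent weights, e.g. w_j = 1/|\hat{\beta}_j^{\text{init}}|^{\gamma}:
\hat{\beta}^{\text{alasso}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j|
% Elastic net: a combination of L1 and L2 penalties:
\hat{\beta}^{\text{enet}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2
% Group LASSO over G pre-defined groups of sizes p_g:
\hat{\beta}^{\text{glasso}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \|\beta_g\|_2
```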
5

Honda, Toshio, Ching-Kang Ing, and Wei-Ying Wu. "Adaptively weighted group Lasso for semiparametric quantile regression models." Bernoulli 25, no. 4B (November 2019): 3311–38. http://dx.doi.org/10.3150/18-bej1091.

6

Ahmed, S. Ejaz, Shakhawat Hossain, and Kjell A. Doksum. "LASSO and shrinkage estimation in Weibull censored regression models." Journal of Statistical Planning and Inference 142, no. 6 (June 2012): 1273–84. http://dx.doi.org/10.1016/j.jspi.2011.12.027.

7

Matsui, Hidetoshi. "Sparse group lasso for multiclass functional logistic regression models." Communications in Statistics - Simulation and Computation 48, no. 6 (February 21, 2018): 1784–97. http://dx.doi.org/10.1080/03610918.2018.1423693.

8

Tian, Yuzhu, Silian Shen, Ge Lu, Manlai Tang, and Maozai Tian. "Bayesian LASSO-Regularized quantile regression for linear regression models with autoregressive errors." Communications in Statistics - Simulation and Computation 48, no. 3 (December 6, 2017): 777–96. http://dx.doi.org/10.1080/03610918.2017.1397166.

9

Xin, Seng Jia, and Kamil Khalid. "Modelling House Price Using Ridge Regression and Lasso Regression." International Journal of Engineering & Technology 7, no. 4.30 (November 30, 2018): 498. http://dx.doi.org/10.14419/ijet.v7i4.30.22378.

Abstract:
House price prediction is important for the government, finance companies, the real estate sector and house owners. Data on house prices in Ames, Iowa, United States, from 2006 to 2010 are used for multivariate analysis. However, multicollinearity commonly occurs in multivariate analysis and has a serious effect on the model. This study therefore investigates the performance of the Ridge regression model and the Lasso regression model, as both regressions can deal with multicollinearity. Ridge and Lasso regression models are constructed and compared. The root mean square error (RMSE) and adjusted R-squared are used to evaluate the performance of the models. This comparative study found that the Lasso regression model performs better than the Ridge regression model. Based on this analysis, the selected variables include aspects of house size, house age, house condition and house location.
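As a rough illustration of this kind of comparison (a minimal sketch, not the authors' code; synthetic collinear data stands in for the Ames housing dataset):

```python
# Sketch: comparing Ridge and Lasso under multicollinearity, scored by RMSE.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Synthetic stand-in: low effective rank gives strongly correlated predictors.
X, y = make_regression(n_samples=500, n_features=40, n_informative=10,
                       effective_rank=15, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_tr)
ridge = RidgeCV(alphas=np.logspace(-3, 3, 50)).fit(scaler.transform(X_tr), y_tr)
lasso = LassoCV(cv=5, random_state=0).fit(scaler.transform(X_tr), y_tr)

for name, model in [("ridge", ridge), ("lasso", lasso)]:
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(scaler.transform(X_te))))
    print(f"{name}: RMSE = {rmse:.2f}")
```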
10

Qiao, Xin, Yoshikazu Kobayashi, Kenichi Oda, and Katsuya Nakamura. "Improved Acoustic Emission Tomography Algorithm Based on Lasso Regression." Applied Sciences 12, no. 22 (November 20, 2022): 11800. http://dx.doi.org/10.3390/app122211800.

Abstract:
This study developed a novel acoustic emission (AE) tomography algorithm for non-destructive testing (NDT) based on Lasso regression (LASSO). The conventional AE tomography method requires considerable measurement data to obtain the elastic velocity distribution for structure evaluation. The new algorithm, in which the LASSO algorithm is applied to AE tomography, eliminates these deficiencies and reconstructs an equivalent velocity distribution from fewer event data to describe the damaged range. Three numerical simulation models were studied to reveal the capacity of the proposed method, and the functional performance was verified on three different types of classical concrete-damage numerical simulation models and compared with that of the conventional SIRT algorithm in the experiment. Finally, this study demonstrates that the LASSO algorithm can be applied in AE tomography, and the shadow parts are eliminated in the resulting elastic velocity distributions with fewer measurement paths.
11

Qian, Junhui, and Liangjun Su. "SHRINKAGE ESTIMATION OF REGRESSION MODELS WITH MULTIPLE STRUCTURAL CHANGES." Econometric Theory 32, no. 6 (June 23, 2015): 1376–433. http://dx.doi.org/10.1017/s0266466615000237.

Abstract:
In this paper, we consider the problem of determining the number of structural changes in multiple linear regression models via group fused Lasso. We show that with probability tending to one, our method can correctly determine the unknown number of breaks, and the estimated break dates are sufficiently close to the true break dates. We obtain estimates of the regression coefficients via post Lasso and establish the asymptotic distributions of the estimates of both break ratios and regression coefficients. We also propose and validate a data-driven method to determine the tuning parameter. Monte Carlo simulations demonstrate that the proposed method works well in finite samples. We illustrate the use of our method with a predictive regression of the equity premium on fundamental information.
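The group fused Lasso used here penalises successive coefficient changes; a standard formulation of the general idea (a sketch, not the authors' exact estimator) for T observations with time-indexed coefficients is:

```latex
% Coefficients \beta_t may change at each t; the penalty shrinks most successive
% differences to zero, so estimated breaks are the dates where \beta_t \neq \beta_{t-1}.
\min_{\beta_1,\dots,\beta_T}\; \sum_{t=1}^{T} \left(y_t - x_t^{\top}\beta_t\right)^2
  \;+\; \lambda \sum_{t=2}^{T} \left\| \beta_t - \beta_{t-1} \right\|_2
```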
12

Zhao, Yuanying, and Xingde Duan. "Bayesian Adaptive Lasso for Regression Models with Nonignorable Missing Responses." Journal of Mathematics 2022 (February 11, 2022): 1–12. http://dx.doi.org/10.1155/2022/3168735.

Abstract:
The main purpose of this article is to develop a Bayesian adaptive lasso procedure for analyzing linear regression models with nonignorable missing responses, in which the missingness mechanism is specified by a logistic regression model. A sampling procedure combining the Gibbs sampler and the Metropolis-Hastings algorithm is employed to obtain the Bayesian estimates of the regression coefficients, shrinkage coefficients, missingness mechanism model parameters, and their standard errors. We extend the partial posterior predictive p value for the goodness-of-fit statistic to investigate the plausibility of the posited model. Finally, several simulation studies and the air pollution data example are undertaken to demonstrate the newly developed methodologies.
13

Zhou, Xiuqing, and Guoxiang Liu. "LAD-Lasso variable selection for doubly censored median regression models." Communications in Statistics - Theory and Methods 45, no. 12 (May 4, 2016): 3658–67. http://dx.doi.org/10.1080/03610926.2014.904357.

14

Yang, Xiaoxing, and Wushao Wen. "Ridge and Lasso Regression Models for Cross-Version Defect Prediction." IEEE Transactions on Reliability 67, no. 3 (September 2018): 885–96. http://dx.doi.org/10.1109/tr.2018.2847353.

15

Kong, Shengchun, Zhuqing Yu, Xianyang Zhang, and Guang Cheng. "High‐dimensional robust inference for Cox regression models using desparsified Lasso." Scandinavian Journal of Statistics 48, no. 3 (July 19, 2021): 1068–95. http://dx.doi.org/10.1111/sjos.12543.

16

Xue, Yujie, and Masanobu Taniguchi. "Modified LASSO estimators for time series regression models with dependent disturbances." Statistical Methods & Applications 29, no. 4 (January 16, 2020): 845–69. http://dx.doi.org/10.1007/s10260-020-00506-w.

17

Kumar, Sudheer, S. D. Attri, and K. K. Singh. "Comparison of Lasso and stepwise regression technique for wheat yield prediction." Journal of Agrometeorology 21, no. 2 (November 10, 2021): 188–92. http://dx.doi.org/10.54386/jam.v21i2.231.

Abstract:
The multiple regression approach has been widely used to forecast crop production. This study was undertaken to evaluate the performance of the stepwise and Lasso (least absolute shrinkage and selection operator) regression techniques in variable selection and in the development of a wheat yield forecast model, using weather data and wheat yields for the period 1984-2015 collected from IARI, New Delhi. Statistical parameters R2, RMSE, and MAPE were 0.81, 195.90 and 4.54 per cent, respectively, with stepwise regression, and 0.95, 99.27 and 2.7 per cent, respectively, with Lasso regression. Forecast models were validated for 2013-14 and 2014-15. Prediction errors were -8.5 and 10.14 per cent with stepwise regression and 1.89 and 1.64 per cent with the Lasso. This shows that the performance of Lasso regression is somewhat better than that of stepwise regression.
18

Usman, M., S. I. S. Doguwa, and B. B. Alhaji. "Comparing the Prediction Accuracy of Ridge, Lasso and Elastic Net Regression Models with Linear Regression Using Breast Cancer Data." Bayero Journal of Pure and Applied Sciences 14, no. 2 (July 6, 2022): 134–49. http://dx.doi.org/10.4314/bajopas.v14i2.16.

Abstract:
Regularised regression methods have been developed to overcome the shortcomings of ordinary least squares (OLS) regression, which does not perform well with respect to both prediction accuracy and model complexity. The OLS method may fail or produce regression estimates with high variance in the presence of multicollinearity or when the number of predictor variables is greater than the number of observations. This study compares the predictive performance, and the additional information gained, of the Ridge, Lasso and Elastic net regularised methods with the classical OLS method using data on breast cancer patients. The findings show that, using all the predictor variables, the OLS method failed because of the presence of multicollinearity, while the regularised Ridge, Lasso and Elastic net methods produced results in which the predictor variables were mostly significant. Using the training data, the Elastic net and Lasso indicated more significant predictor variables than the Ridge method. The results also indicated that breast cancer patients in the 30-39 age group, those who are married, and those in stage 1 of the disease have longer survival times, while patients in stage 2 and stage 3 have shorter survival times. The OLS regression produced results only when four of the predictor variables were excluded; even then, the regularised methods still outperformed OLS regression in terms of prediction accuracy.
19

Greenwood, Christopher J., George J. Youssef, Primrose Letcher, Jacqui A. Macdonald, Lauryn J. Hagg, Ann Sanson, Jenn Mcintosh, et al. "A comparison of penalised regression methods for informing the selection of predictive markers." PLOS ONE 15, no. 11 (November 20, 2020): e0242730. http://dx.doi.org/10.1371/journal.pone.0242730.

Abstract:
Background: Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators within an often substantially larger pool of available indicators. In comparison to traditional methods, penalised regression models improve prediction in new data by shrinking the size of coefficients and retaining those with coefficients greater than zero. However, the performance and selection of indicators depends on the specific algorithm implemented. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods.
Design: Data were drawn from the Australian Temperament Project, a multigenerational longitudinal study established in 1983. The analytic sample consisted of 1,292 (707 women) participants. A total of 102 adolescent psychosocial and contextual indicators were available to predict young adult daily smoking.
Findings: Penalised logistic regression methods showed small improvements in predictive performance over logistic regression and forward selection. However, no single penalised logistic regression model outperformed the others. Elastic-net models selected more indicators than either LASSO or adaptive LASSO. Additionally, more regularised models included fewer indicators, yet had comparable predictive performance. Forward selection methods dismissed many indicators identified as important in the penalised logistic regression models.
Conclusions: Although overall predictive accuracy was only marginally better with penalised logistic regression methods, benefits were most clear in their capacity to select a manageable subset of indicators. Preference among competing penalised logistic regression methods may therefore be guided by feature selection capability, and thus interpretative considerations, rather than predictive performance alone.
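A minimal sketch of the penalised logistic regressions compared here (not the study's code; the data are synthetic, and the adaptive-LASSO weighting below is a common rescaling heuristic rather than the authors' exact procedure):

```python
# Sketch: LASSO, elastic-net, and an adaptive-LASSO heuristic for logistic regression.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=102, n_informative=15,
                           random_state=1)

lasso = LogisticRegression(penalty="l1", C=0.1, solver="saga",
                           max_iter=5000).fit(X, y)
enet = LogisticRegression(penalty="elasticnet", C=0.1, l1_ratio=0.5,
                          solver="saga", max_iter=5000).fit(X, y)

# Adaptive-LASSO heuristic: rescale each feature by an initial ridge coefficient,
# so weakly supported features are penalised more heavily in the L1 fit.
ridge = LogisticRegression(penalty="l2", C=1.0, solver="saga",
                           max_iter=5000).fit(X, y)
X_w = X * np.abs(ridge.coef_.ravel())
alasso = LogisticRegression(penalty="l1", C=0.1, solver="saga",
                            max_iter=5000).fit(X_w, y)

for name, m in [("LASSO", lasso), ("elastic-net", enet), ("adaptive LASSO", alasso)]:
    print(name, "selected indicators:", np.count_nonzero(m.coef_))
```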
20

Kayanan, Manickavasagar, and Pushpakanthie Wijekoon. "Stochastic Restricted LASSO-Type Estimator in the Linear Regression Model." Journal of Probability and Statistics 2020 (March 30, 2020): 1–7. http://dx.doi.org/10.1155/2020/7352097.

Abstract:
Among several variable selection methods, LASSO is the most desirable estimation procedure for handling regularization and variable selection simultaneously in the high-dimensional linear regression models when multicollinearity exists among the predictor variables. Since LASSO is unstable under high multicollinearity, the elastic-net (Enet) estimator has been used to overcome this issue. According to the literature, the estimation of regression parameters can be improved by adding prior information about regression coefficients to the model, which is available in the form of exact or stochastic linear restrictions. In this article, we proposed a stochastic restricted LASSO-type estimator (SRLASSO) by incorporating stochastic linear restrictions. Furthermore, we compared the performance of SRLASSO with LASSO and Enet in root mean square error (RMSE) criterion and mean absolute prediction error (MAPE) criterion based on a Monte Carlo simulation study. Finally, a real-world example was used to demonstrate the performance of SRLASSO.
21

Tajuddeen, Ibrahim, Seyed Masoud Sajjadian, and Mina Jafari. "Regression Models for Predicting the Global Warming Potential of Thermal Insulation Materials." Buildings 13, no. 1 (January 9, 2023): 171. http://dx.doi.org/10.3390/buildings13010171.

Abstract:
The impacts and benefits of thermal insulations on saving operational energy have been widely investigated and well-documented. Recently, many studies have shifted their focus to comparing the environmental impacts and CO2 emission-related policies of these materials, which are mostly the Embodied Energy (EE) and Global Warming Potential (GWP). In this paper, machine learning techniques were used to analyse the untapped aspect of these environmental impacts. A collection of over 120 datasets from reliable open-source databases including Okobaudat and Ecoinvent, as well as from the scientific literature containing data from the Environmental Product Declarations (EPD), was compiled and analysed. Comparisons of Multiple Linear Regression (MLR), Support Vector Regression (SVR), Least Absolute Shrinkage and Selection Operator (LASSO) regression, and Extreme Gradient Boosting (XGBoost) regression methods were completed for the prediction task. The experimental results revealed that MLR, SVR, and LASSO methods outperformed the XGBoost method according to both the K-Fold and Monte-Carlo cross-validation techniques. MLR, SVR, and LASSO achieved 0.85/0.73, 0.82/0.72, and 0.85/0.71 scores according to the R2 measure for the Monte-Carlo/K-Fold cross-validations, respectively, and the XGBoost overfitted the training set, showing it to be less reliable for this task. Overall, the results of this task will contribute to the selection of effective yet low-energy-intensive thermal insulation, thus mitigating environmental impacts.
22

Song, Min, Minhyuk Lee, Taesung Park, and Mira Park. "MP-LASSO chart: a multi-level polar chart for visualizing group LASSO analysis of genomic data." Genomics & Informatics 20, no. 4 (December 31, 2022): e48. http://dx.doi.org/10.5808/gi.22075.

Abstract:
Penalized regression has been widely used in genome-wide association studies for joint analyses to find genetic associations. Among penalized regression models, the least absolute shrinkage and selection operator (LASSO) method effectively removes some coefficients from the model by shrinking them toward zero. To handle group structures, such as genes and pathways, several modified LASSO penalties have been proposed, including group LASSO and sparse group LASSO. Group LASSO ensures sparsity at the level of pre-defined groups, eliminating unimportant groups. Sparse group LASSO performs group selection as in group LASSO, but also performs individual selection as in LASSO. While these sparse methods are useful in high-dimensional genetic studies, interpreting the results with many groups and coefficients is not straightforward. LASSO's results are often expressed as trace plots of regression coefficients. However, few studies have explored the systematic visualization of group information. In this study, we propose a multi-level polar LASSO (MP-LASSO) chart, which can effectively represent the results from group LASSO and sparse group LASSO analyses. An R package to draw MP-LASSO charts was developed. Through a real-world genetic data application, we demonstrated that our MP-LASSO chart package effectively visualizes the results of LASSO, group LASSO, and sparse group LASSO.
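The two penalties visualised by the MP-LASSO chart have standard forms, summarised here for orientation (textbook notation, not the article's own):

```latex
% Group LASSO: sparsity at the level of pre-defined groups g = 1..G of sizes p_g.
\min_{\beta}\; \tfrac{1}{2n}\|y - X\beta\|_2^2
  + \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\|\beta_g\|_2
% Sparse group LASSO: adds an L1 term, giving within-group sparsity as well.
\min_{\beta}\; \tfrac{1}{2n}\|y - X\beta\|_2^2
  + (1-\alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\|\beta_g\|_2 + \alpha\,\lambda\,\|\beta\|_1
```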
23

Takada, Masaaki, Taiji Suzuki, and Hironori Fujisawa. "Independently Interpretable Lasso for Generalized Linear Models." Neural Computation 32, no. 6 (June 2020): 1168–221. http://dx.doi.org/10.1162/neco_a_01279.

Abstract:
Sparse regularization such as ℓ1 regularization is a quite powerful and widely used strategy for high-dimensional learning problems. The effectiveness of sparse regularization has been supported practically and theoretically by several studies. However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features. Ordinary ℓ1 regularization selects variables correlated with each other under weak regularization, which results in deterioration of not only its estimation error but also its interpretability. In this letter, we propose a new regularization method, independently interpretable lasso (IILasso), for generalized linear models. Our proposed regularizer suppresses the selection of correlated variables, so that each active variable affects the response independently in the model. Hence, we can interpret regression coefficients intuitively, and the performance is also improved by avoiding overfitting. We analyze the theoretical properties of the IILasso and show that the proposed method is advantageous for its sign recovery and achieves an almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of the IILasso.
24

Banerjee, Prithish, Broti Garai, Himel Mallick, Shrabanti Chowdhury, and Saptarshi Chatterjee. "A Note on the Adaptive LASSO for Zero-Inflated Poisson Regression." Journal of Probability and Statistics 2018 (December 30, 2018): 1–9. http://dx.doi.org/10.1155/2018/2834183.

Abstract:
We consider the problem of modelling count data with excess zeros using Zero-Inflated Poisson (ZIP) regression. Recently, various regularization methods have been developed for variable selection in ZIP models. Among these, EM LASSO is a popular method for simultaneous variable selection and parameter estimation. However, EM LASSO suffers from estimation inefficiency and selection inconsistency. To remedy these problems, we propose a set of EM adaptive LASSO methods using a variety of data-adaptive weights. We show theoretically that the new methods are able to identify the true model consistently, and the resulting estimators can be as efficient as oracle. The methods are further evaluated through extensive synthetic experiments and applied to a German health care demand dataset.
25

Geraldo-Campos, Luis Alberto, Juan J. Soria, and Tamara Pando-Ezcurra. "Machine Learning for Credit Risk in the Reactive Peru Program: A Comparison of the Lasso and Ridge Regression Models." Economies 10, no. 8 (July 30, 2022): 188. http://dx.doi.org/10.3390/economies10080188.

Abstract:
COVID-19 has caused an economic crisis in the business world, disrupting the continuity of the payment chain and leading companies to resort to credit. This study aimed to determine the optimal machine learning predictive model for the credit risk of companies under the Reactiva Peru Program, created in response to COVID-19. A multivariate regression analysis was applied with four regressor variables (economic sector, granting entity, amount covered, and department) and one predicted variable (risk level), with a population of 501,298 companies benefiting from the program. The analysis followed the CRISP-DM methodology, oriented especially to data mining projects, and used the machine learning Lasso and Ridge regression models, with algebraic verification to compare and validate the predictive models using SPSS, Jamovi, R Studio, and MATLAB software. The results revealed that the Lasso regression model (λ60 = 0.00038; RMSE = 0.3573685) optimally predicted the level of risk compared to the Ridge regression model (λ100 = 0.00910; RMSE = 0.3573812) and the least squares model, which corroborates that the Lasso regression model is the best predictive model for detecting the credit risk level of companies in the Reactiva Peru Program.
26

Patil, Abhijeet R., and Sangjin Kim. "Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data." Mathematics 8, no. 1 (January 10, 2020): 110. http://dx.doi.org/10.3390/math8010110.

Abstract:
In high-dimensional data, the performance of various classifiers depends largely on the selection of important features. Most individual classifiers with existing feature selection (FS) methods do not perform well for highly correlated data. Obtaining important features using an FS method and selecting the best performing classifier is a challenging task in high-throughput data. In this article, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM), capable of dealing with data with high correlation structures. The ERRM boosts the prediction accuracy with the top-ranked features obtained from RLFS. The RLFS utilizes the lasso penalty with the sure independence screening (SIS) condition to select the top k ranked features. The ERRM includes five individual penalty-based classifiers: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). It is built on the idea of bagging and rank aggregation. Through simulation studies and an application to smokers' cancer gene expression data, we demonstrate that the proposed combination of ERRM with RLFS achieves superior performance in accuracy and geometric mean.
27

Liu, Feng, and David Pitt. "Application of bivariate negative binomial regression model in analysing insurance count data." Annals of Actuarial Science 11, no. 2 (May 4, 2017): 390–411. http://dx.doi.org/10.1017/s1748499517000070.

Abstract:
In this paper we analyse insurance claim frequency data using the bivariate negative binomial regression (BNBR) model. We use general insurance data on claims from simple third-party liability insurance and comprehensive insurance. We find that bivariate regression, with its capacity for modelling correlation between the two observed claim counts, provides both a superior fit and out-of-sample prediction compared with the more common practice of fitting univariate negative binomial regression models separately to each claim type. Noting the complexity of BNBR models and their potential for a large number of parameters, we explore the use of model shrinkage methodology, namely the least absolute shrinkage and selection operator (Lasso) and ridge regression. We find that models estimated using shrinkage methods outperform the ordinary likelihood-based models when being used to make predictions out-of-sample. We find that the Lasso performs better than ridge regression as a method of shrinkage.
28

Meynet, Caroline. "An ℓ1-oracle inequality for the Lasso in finite mixture Gaussian regression models." ESAIM: Probability and Statistics 17 (2013): 650–71. http://dx.doi.org/10.1051/ps/2012016.

29

Lloyd-Jones, Luke R., Hien D. Nguyen, and Geoffrey J. McLachlan. "A globally convergent algorithm for lasso-penalized mixture of linear regression models." Computational Statistics & Data Analysis 119 (March 2018): 19–38. http://dx.doi.org/10.1016/j.csda.2017.09.003.

30

Zhang, Bingwen, Jun Geng, and Lifeng Lai. "Multiple Change-Points Estimation in Linear Regression Models via Sparse Group Lasso." IEEE Transactions on Signal Processing 63, no. 9 (May 2015): 2209–24. http://dx.doi.org/10.1109/tsp.2015.2411220.

31

Xu, Dengke, and Niansheng Tang. "Bayesian adaptive Lasso for quantile regression models with nonignorably missing response data." Communications in Statistics - Simulation and Computation 48, no. 9 (January 12, 2019): 2727–42. http://dx.doi.org/10.1080/03610918.2018.1468452.

32

He, Yuxin, Yang Zhao, and Kwok Leung Tsui. "Exploring influencing factors on transit ridership from a local perspective." Smart and Resilient Transport 1, no. 1 (May 14, 2019): 2–16. http://dx.doi.org/10.1108/srt-06-2019-0002.

Abstract:
Purpose: Exploring the influencing factors on urban rail transit (URT) ridership is vital for travel demand estimation and urban resources planning. Among various existing ridership modeling methods, the direct demand model, with ordinary least square (OLS) multiple regression as a representative, has considerable advantages over the traditional four-step model. Nevertheless, OLS multiple regression neglects spatial instability and spatial heterogeneity in the magnitude of the coefficients across the urban area. This paper aims to focus on modeling and analyzing the factors influencing metro ridership at the station level.
Design/methodology/approach: This paper constructs two novel direct demand models based on geographically weighted regression (GWR) for modeling influencing factors on metro ridership from a local perspective. One is GWR with globally implemented LASSO for feature selection, and the other is the geographically weighted LASSO (GWL) model, which is GWR with locally implemented LASSO for feature selection.
Findings: The results of a real-world case study of the Shenzhen Metro show that the two local models presented perform better than the traditional global model (OLS) in terms of ridership estimation error and goodness-of-fit. Additionally, the GWL model results in a better fit than the GWR with global LASSO model, indicating that locally implemented LASSO is more effective for the accurate estimation of Shenzhen metro ridership than global LASSO. Moreover, the information provided by both local models regarding the spatially varied elasticities demonstrates the strong spatial interpretability of the models and their potential in transport planning.
Originality/value: The main contributions are threefold: the approach is based on spatial models considering spatial autocorrelation of variables, which outperform the traditional global regression model (OLS) in terms of model fitting and spatial explanatory power; GWR with global feature selection using LASSO and GWL are compared through a real-world case study on the Shenzhen Metro, that is, the difference between global and local feature selection is discussed; and network structures, as a type of factor, are quantified with measurements from the field of complex networks.
33

Bröcker, Jochen. "Regularized Logistic Models for Probabilistic Forecasting and Diagnostics." Monthly Weather Review 138, no. 2 (February 1, 2010): 592–604. http://dx.doi.org/10.1175/2009mwr3126.1.

Abstract:
Logistic models are studied as a tool to convert dynamical forecast information (deterministic and ensemble) into probability forecasts. A logistic model is obtained by setting the logarithmic odds ratio equal to a linear combination of the inputs. As with any statistical model, logistic models will suffer from overfitting if the number of inputs is comparable to the number of forecast instances. Computational approaches to avoid overfitting by regularization are discussed, and efficient techniques for model assessment and selection are presented. A logit version of the lasso (originally a linear regression technique) is discussed. In lasso models, less important inputs are identified and the corresponding coefficients are set to zero, providing an efficient and automatic model reduction procedure. For the same reason, lasso models are particularly appealing for diagnostic purposes.
34

Kaushik, Sakshi, Alka Sabharwal, and Gurprit Grover. "Extracting relevant predictors of the severity of mental illnesses from clinical information using regularisation regression models." Statistics in Transition New Series 23, no. 2 (June 1, 2022): 129–52. http://dx.doi.org/10.2478/stattrans-2022-0020.

Abstract:
Mental disorders are common non-communicable diseases whose occurrence is rising at epidemic rates globally. The determination of the severity of a mental illness has important clinical implications, and it serves as a prognostic factor for effective intervention planning and management. This paper aims to identify the relevant predictors of the severity of mental illnesses (measured by psychiatric rating scales) from a wide range of clinical variables consisting of information on both laboratory test results and psychiatric factors. The laboratory test results collectively indicate the measurements of 23 components derived from vital signs and blood test results for the evaluation of the complete blood count. Eight psychiatric factors known to affect the severity of mental illnesses are considered, viz. the family history, course and onset of an illness, etc. Retrospective data of 78 patients diagnosed with mental and behavioural disorders were collected from the Lady Hardinge Medical College & Smt. S.K. Hospital in New Delhi, India. The observations missing in the data were imputed using the non-parametric random forest algorithm, and multicollinearity was detected based on the variance inflation factor. Owing to the presence of multicollinearity, regularisation techniques such as ridge regression and extensions of the least absolute shrinkage and selection operator (LASSO), viz. the adaptive and group LASSO, were used for fitting the regression model. The optimal tuning parameter λ was obtained through 13-fold cross-validation. It was observed that the coefficients of the quantitative predictors extracted by the adaptive LASSO and the group of predictors extracted by the group LASSO were comparable to the coefficients obtained through ridge regression.
35

Ding, Zhikun, Zhan Wang, Ting Hu, and Huilong Wang. "A Comprehensive Study on Integrating Clustering with Regression for Short-Term Forecasting of Building Energy Consumption: Case Study of a Green Building." Buildings 12, no. 10 (October 16, 2022): 1701. http://dx.doi.org/10.3390/buildings12101701.

Abstract:
Integrating clustering with regression has gained great popularity due to its excellent performance for building energy prediction tasks. However, there is a lack of studies on finding suitable regression models for integrating clustering and the combination of clustering and regression models that can achieve the best performance. Moreover, there is also a lack of studies on the optimal cluster number in the task of short-term forecasting of building energy consumption. In this paper, a comprehensive study is conducted on the integration of clustering and regression, which includes three types of clustering algorithms (K-means, K-medians, and Hierarchical clustering) and four types of representative regression models (Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Regression (SVR), Artificial Neural Network (ANN), and extreme gradient boosting (XGBoost)). A novel performance evaluation index (PI) dedicated to comparing the performance of two prediction models is proposed, which can comprehensively consider different performance indexes. A larger PI means a larger performance improvement. The results indicate that by integrating clustering, the largest PI for SVR, LASSO, XGBoost, and ANN is 2.41, 1.97, 1.57, and 1.12, respectively. On the other hand, the performance of regression models integrated with clustering algorithms from high to low is XGBoost, SVR, ANN, and LASSO. The results also show that the optimal cluster number determined by clustering evaluation metrics may not be the optimal number for the ensemble model (integration of clustering and regression model).
36

Li, Shanshan, Jian Yu, Huimin Kang, and Jianfeng Liu. "Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data." Animals 12, no. 18 (September 14, 2022): 2419. http://dx.doi.org/10.3390/ani12182419.

Abstract:
Genomic selection (GS) is an efficient method to improve genetically economic traits. Feature selection is an important method for GS based on whole-genome sequencing (WGS) data. We investigated the prediction performance of GS of milk production traits using imputed WGS data on 7957 Chinese Holsteins. We used two regularized regression models, least absolute shrinkage and selection operator (LASSO) and elastic net (EN) for feature selection. For comparison, we performed genome-wide association studies based on a linear mixed model (LMM), and the N single nucleotide polymorphisms (SNPs) with the lowest p-values were selected (LMMLASSO and LMMEN), where N was the number of non-zero effect SNPs selected by LASSO or EN. GS was conducted using a genomic best linear unbiased prediction (GBLUP) model and several sets of SNPs: (1) selected WGS SNPs; (2) 50K SNP chip data; (3) WGS data; and (4) a combined set of selected WGS SNPs and 50K SNP chip data. The results showed that the prediction accuracies of GS with features selected using LASSO or EN were comparable to those using features selected with LMMLASSO or LMMEN. For milk and protein yields, GS using a combination of SNPs selected with LASSO and 50K SNP chip data achieved the best prediction performance, and GS using SNPs selected with LMMLASSO combined with 50K SNP chip data performed best for fat yield. The proposed method, feature selection using regularization regression models, provides a valuable novel strategy for WGS-based GS.
37

Uniejewski, Bartosz, and Rafał Weron. "Efficient Forecasting of Electricity Spot Prices with Expert and LASSO Models." Energies 11, no. 8 (August 6, 2018): 2039. http://dx.doi.org/10.3390/en11082039.

Abstract:
Recent electricity price forecasting (EPF) studies suggest that the least absolute shrinkage and selection operator (LASSO) leads to well performing models that are generally better than those obtained from other variable selection schemes. By conducting an empirical study involving datasets from two major power markets (Nord Pool and PJM Interconnection), three expert models, two multi-parameter regression (called baseline) models and four variance stabilizing transformations combined with the seasonal component approach, we discuss the optimal way of implementing the LASSO. We show that using a complex baseline model with nearly 400 explanatory variables, a well chosen variance stabilizing transformation (asinh or N-PIT), and a procedure that recalibrates the LASSO regularization parameter once or twice a day indeed leads to significant accuracy gains compared to the typically considered EPF models. Moreover, by analyzing the structures of the best LASSO-estimated models, we identify the most important explanatory variables and thus provide guidelines to structuring better performing models.
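A sketch of the core pipeline described above, under stated assumptions: an asinh variance-stabilising transform of prices followed by a LASSO whose penalty is re-chosen on a rolling calibration window. The data, window size, and feature count are toy placeholders, not the paper's setup.

```python
# Sketch: asinh VST + LASSO with a rolling calibration window (toy data).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_hours, n_features = 500 * 24, 60             # toy sizes; the paper uses ~400 regressors
X = rng.normal(size=(n_hours, n_features))
prices = np.exp(0.5 * rng.normal(size=n_hours))  # skewed, price-like series

med = np.median(prices)
y = np.arcsinh(prices - med)                   # asinh variance stabilising transform

window = 364 * 24                              # one-year rolling calibration window
model = LassoCV(cv=5).fit(X[:window], y[:window])   # recalibrated once or twice daily in practice
forecast = np.sinh(model.predict(X[window:window + 24])) + med  # next day, back-transformed
print(forecast[:5])
```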
38

Yuan, Lei-Ming, Xiaofeng Yang, Xueping Fu, Jiao Yang, Xi Chen, Guangzao Huang, Xiaojing Chen, Limin Li, and Wen Shi. "Consensual Regression of Lasso-Sparse PLS Models for Near-Infrared Spectra of Food." Agriculture 12, no. 11 (October 29, 2022): 1804. http://dx.doi.org/10.3390/agriculture12111804.

Abstract:
In some cases, near-infrared spectra (NIRS) make the predictions of quantitative models unreliable, and the choice of a suitable number of latent variables (LVs) for partial least squares (PLS) is difficult. In this case, a strategy of fusing member models carrying important information has gradually become valued in recent research. In this work, a series of PLS regression models was developed with an increasing number of LVs as member models. The least absolute shrinkage and selection operator (Lasso) was then employed as the model selection mechanism to sparsify, i.e., discard, uninformative PLS member models. Deviation weighted fusion (DW-F), partial least squares regression coefficient fusion (PLS-F), and ridge regression coefficient fusion (RR-F) were then used comparatively to fuse the remaining member models. Three spectral datasets, comprising six attributes in NIR data of corn, apple, and marzipan, respectively, were applied to validate the feasibility of this fusion algorithm. The six fusion models of these attributes performed better than the generally optimal PLS model, with a noticeable improvement in the root mean squared error of prediction (RMSEP) reaching as high as 80%. The approach also reduced the spectral bands by more than half; DW-F especially showed excellent fusing capacity and obtained the best performance. The results show that the preferred strategy of the DW-F model combined with Lasso selection can make full use of spectral information and significantly improve the prediction accuracy of fusion models.
39

Raeisi Shahraki, Hadi, Saeedeh Pourahmad, and Seyyed Mohammad Taghi Ayatollahi. "Identifying the Prognosis Factors in Death after Liver Transplantation via Adaptive LASSO in Iran." Journal of Environmental and Public Health 2016 (2016): 1–6. http://dx.doi.org/10.1155/2016/7620157.

Abstract:
Despite the widespread use of liver transplantation as a routine therapy for liver diseases, the factors affecting its outcomes are still controversial. This study attempted to identify the factors most strongly associated with death after liver transplantation. For this purpose, a modified least absolute shrinkage and selection operator (LASSO), called the adaptive LASSO, was utilized. One of the main advantages of this method is its ability to handle a high number of factors. In a historical cohort study from 2008 to 2013, the clinical findings of 680 patients undergoing liver transplant surgery were considered. Ridge and adaptive LASSO regression methods were then implemented to identify the factors most strongly associated with death. To compare the performance of the two models, the receiver operating characteristic (ROC) curve was used. According to the results, 12 factors were significant in Ridge regression and 9 in adaptive LASSO regression. The area under the ROC curve (AUC) of the adaptive LASSO was 89% (95% CI: 86%–91%), significantly greater than that of Ridge regression (64%, 95% CI: 61%–68%) (p<0.001). In conclusion, the significant factors and the performance criteria revealed the superiority of the adaptive LASSO method, as a penalized model, over the traditional regression model in the present study.
40

Li, Fuhai, Hui Xin, Jidong Zhang, Mingqiang Fu, Jingmin Zhou, and Zhexun Lian. "Prediction model of in-hospital mortality in intensive care unit patients with heart failure: machine learning-based, retrospective analysis of the MIMIC-III database." BMJ Open 11, no. 7 (July 2021): e044779. http://dx.doi.org/10.1136/bmjopen-2020-044779.

Abstract:
Objective: The predictors of in-hospital mortality for intensive care unit (ICU)-admitted heart failure (HF) patients remain poorly characterised. We aimed to develop and validate a prediction model for all-cause in-hospital mortality among ICU-admitted HF patients.
Design: A retrospective cohort study.
Setting and participants: Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. Data on 1177 heart failure patients were analysed.
Methods: Patients meeting the inclusion criteria were identified from the MIMIC-III database and randomly divided into a derivation (n=825, 70%) and a validation (n=352, 30%) group. Independent risk factors for in-hospital mortality were screened using the extreme gradient boosting (XGBoost) and the least absolute shrinkage and selection operator (LASSO) regression models in the derivation sample. Multivariate logistic regression analysis was used to build prediction models in the derivation group, which were then validated in the validation cohort. Discrimination, calibration and clinical usefulness of the prediction model were assessed using the C-index, calibration plot and decision curve analysis. After pairwise comparison, the best performing model was chosen to build a nomogram according to the regression coefficients.
Results: Among the 1177 admissions, in-hospital mortality was 13.52%. In both groups, the XGBoost, LASSO regression and Get With the Guidelines-Heart Failure (GWTG-HF) risk score models showed acceptable discrimination. The XGBoost and LASSO regression models also showed good calibration. In pairwise comparison, the prediction effectiveness was higher with the XGBoost and LASSO regression models than with the GWTG-HF risk score model (p<0.05). The XGBoost model was chosen as our final model for its more concise and wider net benefit threshold probability range and was presented as the nomogram.
Conclusions: Our nomogram enabled good prediction of in-hospital mortality in ICU-admitted HF patients, which may help clinical decision-making for such patients.
41

Gilbraith, William E., J. Chance Carter, Kristl L. Adams, Karl S. Booksh, and Joshua M. Ottaway. "Improving Prediction of Peroxide Value of Edible Oils Using Regularized Regression Models." Molecules 26, no. 23 (November 30, 2021): 7281. http://dx.doi.org/10.3390/molecules26237281.

Abstract:
We present four unique prediction techniques, combined with multiple data pre-processing methods, utilizing a wide range of both oil types and oil peroxide values (PV) as well as incorporating natural aging for peroxide creation. Samples were PV assayed using a standard starch titration method, AOCS Method Cd 8-53, and used as a verified reference method for PV determination. Near-infrared (NIR) spectra were collected from each sample in two unique optical pathlengths (OPLs), 2 and 24 mm, then fused into a third distinct set. All three sets were used in partial least squares (PLS) regression, ridge regression, LASSO regression, and elastic net regression model calculation. While no individual regression model was established as the best, global models for each regression type and pre-processing method show good agreement between all regression types when performed in their optimal scenarios. Furthermore, small spectral window size boxcar averaging shows prediction accuracy improvements for edible oil PVs. Best-performing models for each regression type are: PLS regression, 25 point boxcar window fused OPL spectral information RMSEP = 2.50; ridge regression, 5 point boxcar window, 24 mm OPL, RMSEP = 2.20; LASSO raw spectral information, 24 mm OPL, RMSEP = 1.80; and elastic net, 10 point boxcar window, 24 mm OPL, RMSEP = 1.91. The results show promising advancements in the development of a full global model for PV determination of edible oils.
42

Deng, Hao, Jianghong Chen, Biqin Song, and Zhibin Pan. "Error Bound of Mode-Based Additive Models." Entropy 23, no. 6 (May 22, 2021): 651. http://dx.doi.org/10.3390/e23060651.

Abstract:
Due to their flexibility and interpretability, additive models are powerful tools for high-dimensional mean regression and variable selection. However, least-squares loss-based mean regression models suffer from sensitivity to non-Gaussian noise, so there is a need to improve their robustness. This paper considers estimation and variable selection via modal regression in reproducing kernel Hilbert spaces (RKHSs). Based on the mode-induced metric and a two-fold Lasso-type regularizer, we propose a sparse modal regression algorithm and derive its excess generalization error. The experimental results demonstrate the effectiveness of the proposed model.
43

Jovanovic, Milos, Sandro Radovanovic, Milan Vukicevic, Sven Van Poucke, and Boris Delibasic. "Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression." Artificial Intelligence in Medicine 72 (September 2016): 12–21. http://dx.doi.org/10.1016/j.artmed.2016.07.003.

44

Kimmatkar, N. V., and B. Vijaya Babu. "Human Emotion Detection with Electroencephalography Signals and Accuracy Analysis Using Feature Fusion Techniques and a Multimodal Approach for Multiclass Classification." Engineering, Technology & Applied Science Research 12, no. 4 (August 1, 2022): 9012–17. http://dx.doi.org/10.48084/etasr.5073.

Abstract:
Biological brain signals may be used to identify emotions in a variety of ways, with accuracy depending on the methods used for signal processing, feature extraction, feature selection, and classification. The major goal of the current work was to use an adaptive channel selection and classification strategy to improve the effectiveness of emotion detection utilizing brain signals. The emotion detection accuracy of existing classification models is assessed using different features picked by feature fusion approaches. Statistical modeling is used to determine time-domain and frequency-domain properties. Multiclass classification accuracy is examined using neural networks (NNs), Lasso regression, k-nearest neighbors (KNN), support vector machine (SVM), and random forest (RF). After hyperparameter tuning, a remarkable increase in accuracy is achieved using Lasso regression, while RF performs well for all the feature sets. Accuracies of 78.02% and 76.77% are achieved on a small and noisy 24-feature dataset by Lasso regression and RF, respectively, whereas 76.54% accuracy is achieved by Lasso regression with the backward elimination wrapper method.
45

Muchisha, Nadya Dwi, Novian Tamara, Andriansyah Andriansyah, and Agus M. Soleh. "Nowcasting Indonesia’s GDP Growth Using Machine Learning Algorithms." Indonesian Journal of Statistics and Its Applications 5, no. 2 (June 30, 2021): 355–68. http://dx.doi.org/10.29244/ijsa.v5i2p355-368.

Abstract:
GDP is very important to monitor in real time because of its usefulness for policy making. We built and compared ML models to forecast Indonesia's GDP growth in real time. We used 18 variables consisting of a number of quarterly macroeconomic and financial market statistics. We evaluated the performance of six popular ML algorithms (random forest, LASSO, Ridge, Elastic Net, neural networks, and support vector machines) in real-time forecasting of GDP growth over the 2013:Q3 to 2019:Q4 period. We used the RMSE, MAD, and Pearson correlation coefficient as measures of forecast accuracy. The results showed that all these models outperformed the AR(1) benchmark, and the individual model with the best performance was random forest. To obtain more accurate forecasts, we ran forecast combinations using equal weighting and lasso regression. The best model was obtained from the forecast combination using lasso regression with selected ML models, namely random forest, Ridge, support vector machine, and neural network.
46

Aronsson, Linus, Roland Andersson, and Daniel Ansari. "Artificial neural networks versus LASSO regression for the prediction of long-term survival after surgery for invasive IPMN of the pancreas." PLOS ONE 16, no. 3 (March 25, 2021): e0249206. http://dx.doi.org/10.1371/journal.pone.0249206.

Abstract:
Prediction of long-term survival in patients with invasive intraductal papillary mucinous neoplasm (IPMN) of the pancreas may aid in patient assessment, risk stratification and personalization of treatment. This study aimed to investigate the predictive ability of artificial neural networks (ANN) and LASSO regression in terms of 5-year disease-specific survival. ANNs work in a non-linear fashion and have a potential advantage over regression models in the analysis of variables with complex correlations. LASSO is a type of regression analysis facilitating variable selection and regularization. A total of 440 patients undergoing surgical treatment for invasive IPMN of the pancreas registered in the Surveillance, Epidemiology and End Results (SEER) database between 2004 and 2016 were analyzed. The dataset was randomly split into a modelling and a test set (7:3) prior to analysis. The accuracy, precision and F1 score for predicting mortality were 0.82, 0.83 and 0.89, respectively, for ANN with variable selection, compared to 0.79, 0.85 and 0.87, respectively, for the LASSO model. ANN using all variables showed similar accuracy, precision and F1 score (0.81, 0.85 and 0.88, respectively) compared to a logistic regression analysis. McNemar's test showed no statistical difference between the models. The models showed high and similar performance with regard to accuracy and precision for predicting 5-year survival status.
47

Mahalani, Annisa Juwita, and Nur Azizah Komara Rifai. "Least Absolute Shrinkage and Selection Operator (LASSO) untuk Mengatasi Multikolinearitas pada Model Regresi Linear Berganda." Bandung Conference Series: Statistics 2, no. 2 (July 28, 2022): 119–25. http://dx.doi.org/10.29313/bcss.v2i2.3438.

Abstract:
Linear regression analysis is used, among other purposes, to examine the influence of independent variables on a dependent variable. When there is more than one independent variable, multiple linear regression analysis is used. A regression model is considered good if its assumptions are satisfied. One of these is the absence of multicollinearity, that is, of high correlation among the independent variables. When multicollinearity occurs, it can be addressed using the Least Absolute Shrinkage and Selection Operator (LASSO) regression method, first introduced by Robert Tibshirani in 1996. This method can select variables and overcome multicollinearity. To simplify the computation of the LASSO, the Least Angle Regression (LARS) algorithm can be used. This study discusses the application of the LASSO regression method to data on the number of poor people in Indonesia in 2020. Based on the correlation coefficients and VIF values, three variables show indications of multicollinearity problems. The gross regional domestic product variable is the first variable to enter the model, meaning that it is the variable most correlated with the residual.
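A minimal illustration of LASSO computed via the LARS algorithm, as this paper uses (scikit-learn's LassoLars; synthetic collinear predictors stand in for the poverty data, which is not reproduced here):

```python
# Sketch: LASSO via LARS on deliberately collinear predictors.
import numpy as np
from sklearn.linear_model import LassoLars, lars_path

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + x3 + rng.normal(size=n)

# Full LARS path: the order in which variables become active,
# i.e. which predictor is most correlated with the current residual.
alphas, active, coefs = lars_path(X, y, method="lasso")
print("active variables along the path:", active)

model = LassoLars(alpha=0.1).fit(X, y)
print("coefficients:", model.coef_)   # one of the collinear pair is typically dropped
```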
48

Przednowek, Krzysztof, Zbigniew Barabasz, Maria Zadarko-Domaradzka, Karolina Przednowek, Edyta Nizioł-Babiarz, Maciej Huzarski, Klaudia Sibiga, Bartosz Dziadek, and Emilian Zadarko. "Predictive Modeling of VO2max Based on 20 m Shuttle Run Test for Young Healthy People." Applied Sciences 8, no. 11 (November 10, 2018): 2213. http://dx.doi.org/10.3390/app8112213.

Abstract:
This study presents mathematical models for predicting VO2max based on a 20 m shuttle run and anthropometric parameters. The research was conducted with data provided by 308 young healthy people (aged 20.6 ± 1.6). The research group included 154 females (aged 20.3 ± 1.2) and 154 males (aged 20.8 ± 1.8). Twenty-four variables were used to build the models, including one dependent variable and 23 independent variables. The predictive methods of analysis included the classical ordinary least squares (OLS) regression model, regularized methods such as ridge regression and Lasso regression, and artificial neural networks such as the multilayer perceptron (MLP) and the radial basis function (RBF) network. All models were calculated in R software (version 3.5.0, R Foundation for Statistical Computing, Vienna, Austria). The study also involved variable selection methods (Lasso and stepwise regressions) to identify optimal predictors for the analysed study group. In order to compare and choose the best model, leave-one-out cross-validation (LOOCV) was used. The paper presents three types of models: for females, males and the whole group. The analysis revealed that the models for females (RMSE_CV = 4.07 mL·kg−1·min−1) were characterised by a smaller error than the male models (RMSE_CV = 5.30 mL·kg−1·min−1). The model accounting for sex generated an error of RMSE_CV = 4.78 mL·kg−1·min−1.
49

Xie, Xiaodong, and Shaozhi Zheng. "Group MCP for Cox Models with Time-Varying Coefficients." Journal of Systems Science and Information 4, no. 5 (October 25, 2016): 476–88. http://dx.doi.org/10.21078/jssi-2016-476-13.

Abstract:
Cox's proportional hazards models with time-varying coefficients offer much flexibility for modeling the dynamics of covariate effects. Although many variable selection procedures have been developed for Cox's proportional hazards model, the study of such models with time-varying coefficients appears to be limited. Variable selection methods involving nonconvex penalty functions, such as the minimax concave penalty (MCP), introduce numerical challenges, but they have attractive theoretical properties and have been shown to be worthy alternatives to other competitive methods. We propose a group MCP method that uses a B-spline basis to expand the coefficients and maximizes the log partial likelihood with nonconvex penalties on grouped regression coefficients. A fast, iterative group shooting algorithm is used for model selection and estimation. Under appropriate conditions, a simulated example shows that our method performs competitively with the group lasso method. By comparison, the group MCP and group lasso methods select the same number of important covariates, but the group MCP method tends to outperform the group lasso method in the selection of unimportant covariates.
50

Ornstein, Joseph T. "Stacked Regression and Poststratification." Political Analysis 28, no. 2 (December 23, 2019): 293–301. http://dx.doi.org/10.1017/pan.2019.43.

Abstract:
I develop a procedure for estimating local-area public opinion called stacked regression and poststratification (SRP), a generalization of classical multilevel regression and poststratification (MRP). This procedure employs a diverse ensemble of predictive models—including multilevel regression, LASSO, k-nearest neighbors, random forest, and gradient boosting—to improve the cross-validated fit of the first-stage predictions. In a Monte Carlo simulation, SRP significantly outperforms MRP when there are deep interactions in the data generating process, without requiring the researcher to specify a complex parametric model in advance. In an empirical application, I show that SRP produces superior local public opinion estimates on a broad range of issue areas, particularly when trained on large datasets.