Academic literature on the topic 'Régressions pénalisées'
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Régressions pénalisées.'
Journal articles on the topic "Régressions pénalisées"
Jacquemin, J., Y. Drouet, J. Le Guevelou, S. Servagi-Vernat, and J. Thariat. "Modélisation de la survie de patients atteints d’un cancer anaplasique de la thyroïde par régression pénalisée LASSO." Revue d'Épidémiologie et de Santé Publique 69 (June 2021): S50. http://dx.doi.org/10.1016/j.respe.2021.04.081.
Ternès, N., F. Rotolo, and S. Michiels. "Régression pénalisée pour réduire la sélection de faux positifs dans un modèle de Cox à haute dimension." Revue d'Épidémiologie et de Santé Publique 62 (September 2014): S190-S191. http://dx.doi.org/10.1016/j.respe.2014.06.063.
Mansiaux, Y., and F. Carrat. "Détection d’associations dans un grand jeu de données épidémiologie : une comparaison du data mining, de la régression logistique conventionnelle et de la régression logistique pénalisée pour identifier des facteurs associés à la grippe H1N1." Revue d'Épidémiologie et de Santé Publique 62 (August 2014): S124. http://dx.doi.org/10.1016/j.respe.2014.05.026.
Dissertations / Theses on the topic "Régressions pénalisées"
Gnanguenon Guesse, Girault. "Modélisation et visualisation des liens entre cinétiques de variables agro-environnementales et qualité des produits dans une approche parcimonieuse et structurée." Electronic Thesis or Diss., Montpellier, 2021. http://www.theses.fr/2021MONTS139.
The development of digital agriculture makes it possible to observe, at high frequency, the dynamics of production in relation to climate. Data from these dynamic observations can be treated as functional data. Analyzing this new type of data requires either extending the usual statistical tools to the functional case or developing new ones.
In this thesis, we propose a new approach (SpiceFP: Sparse and Structured Procedure to Identify Combined Effects of Functional Predictors) to explain the variations of a scalar response variable by two or three functional predictors when these predictors act jointly. Particular attention is paid to the interpretability of the results through the use of combined interval classes defining a partition of the observation domain of the explanatory factors. Recent developments around LASSO (Least Absolute Shrinkage and Selection Operator) models are adapted to estimate the areas of influence in the partition via a generalized penalized regression. The approach also integrates a double selection, of models (among the possible partitions) and of variables (areas inside a given partition), based on the AIC and BIC information criteria. The methodological description of the approach, its study through simulations, and a case study based on real data are presented in chapter 2.
The real data used in this thesis were obtained from a vineyard experiment aimed at understanding the impact of climate change on anthocyanin accumulation in berries. Analysis of these data in chapter 3 using SpiceFP and one extension identified a negative impact of morning combinations of low irradiance (lower than about 100 µmol/s/m² or 45 µmol/s/m², depending on the advanced-delayed state of the berries) and high temperature (higher than about 25°C). A slight difference associated with overnight temperature occurred between these effects identified in the morning.
In chapter 4, we propose an implementation of the approach as an R package. This implementation provides a set of functions to build the class intervals on linear or logarithmic scales, to transform the functional predictors using the joint class intervals, and finally to run the approach in two or three dimensions. Other functions help with post-processing or allow the user to explore models other than those selected by the approach, such as an average of different models.
Keywords: penalized regressions, interaction, information criteria, scalar-on-function, interpretable coefficients, grapevine microclimate
Mansiaux, Yohann. "Analyse d'un grand jeu de données en épidémiologie : problématiques et perspectives méthodologiques." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066272/document.
The increasing size of datasets is a growing issue in epidemiology. The CoPanFlu-France cohort (1450 subjects), intended to study H1N1 pandemic influenza infection risk as a combination of biological, environmental, socio-demographic and behavioral factors, and in which hundreds of covariates are collected for each patient, is a good example. The statistical methods usually employed to explore associations have many limits in this context. We compare the contribution of data-driven exploratory methods, which assume the absence of a priori hypotheses, with that of hypothesis-driven methods, which require the development of preliminary hypotheses.
First, a data-driven study is presented. It assesses the ability to detect determinants of influenza infection of two data mining methods, random forests (RF) and boosted regression trees (BRT), of the conventional logistic regression framework (Univariate Followed by Multivariate Logistic Regression, UFMLR), and of the Least Absolute Shrinkage and Selection Operator (LASSO), which adds a penalty to multivariate logistic regression to achieve a sparse selection of covariates. A simulation approach was used to estimate the True (TPR) and False (FPR) Positive Rates associated with these methods. Between three and twenty-four determinants of infection were identified, the pre-epidemic antibody titer being the only covariate selected by all methods. The mean TPR was highest for RF (85%) and BRT (80%), followed by the LASSO (up to 78%), while the UFMLR methodology was inefficient (below 50%). A slight increase in alpha risk (mean FPR up to 9%) was observed for logistic regression-based models, LASSO included, while the mean FPR was 4% for the data mining methods.
Second, we propose a hypothesis-driven causal analysis of the infection risk with a structural equation model (SEM). We exploited the ability of SEM to model latent variables in order to study very diverse factors, their relative impact on infection, and their possible relationships. Only the latent variables describing host susceptibility (modeled by the pre-epidemic antibody titer) and compliance with preventive behaviors were directly associated with infection. The behavioral factors describing risk perception and the perception of preventive measures positively influenced compliance with preventive behaviors. The intensity (number and duration) of social contacts was not associated with infection.
This thesis shows the necessity of considering novel statistical approaches for the analysis of large datasets in epidemiology. Data mining and LASSO are credible alternatives to the tools generally used to explore associations with a high number of variables. SEM allows the integration of variables describing diverse dimensions and the explicit modeling of their relationships; these models are therefore of major interest in a multidisciplinary study such as CoPanFlu.
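The true and false positive rates reported in this abstract can be illustrated with a short sketch. It assumes that each simulated replicate yields a set of selected covariates and that the truly associated covariates are known by construction; the function name `selection_rates` and its arguments are hypothetical and do not come from the thesis.

```python
def selection_rates(selected, true_active, n_candidates):
    """Return (TPR, FPR) for one variable-selection result.

    selected      -- indices of covariates chosen by the method
    true_active   -- indices of covariates truly associated with the outcome
    n_candidates  -- total number of candidate covariates
    """
    selected = set(selected)
    true_active = set(true_active)
    true_positives = len(selected & true_active)
    false_positives = len(selected - true_active)
    n_null = n_candidates - len(true_active)
    # Rates are undefined when the corresponding denominator is zero.
    tpr = true_positives / len(true_active) if true_active else float("nan")
    fpr = false_positives / n_null if n_null else float("nan")
    return tpr, fpr


def mean_rates(results):
    """Average (TPR, FPR) pairs over simulation replicates."""
    n = len(results)
    return (sum(r[0] for r in results) / n, sum(r[1] for r in results) / n)
```

Averaging these pairs over many simulated datasets gives the kind of mean TPR and mean FPR figures quoted above, under the stated assumptions.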
Detais, Amélie. "Maximum de vraisemblance et moindre carrés pénalisés dans des modèles de durée de vie censurées." Toulouse 3, 2008. http://thesesups.ups-tlse.fr/820/.
Life data analysis is used in various application fields, and different methods have been proposed for modelling such data. In this thesis, we are interested in two distinct types of model: the stratified Cox model with randomly missing strata indicators and the right-censored linear regression model. We propose methods for estimating the parameters and establish the asymptotic properties of the resulting estimators in each of these models. First, we consider a generalization of the Cox model that allows different groups of the population, called strata, to have distinct baseline intensity functions, whereas the regression parameter is shared by all strata. In this stratified proportional intensity model, we are interested in parameter estimation when the strata indicator is missing for some individuals. Nonparametric maximum likelihood estimators are proposed for the model parameters, and their consistency and asymptotic normality are established. We show the efficiency of the regression parameter estimator and obtain consistent estimators of its variance. The Expectation-Maximization algorithm is proposed and developed for computing the estimators. Second, we are interested in the linear regression model when the response is randomly right-censored. We introduce a new estimator of the regression parameter, which minimizes a Kaplan-Meier-weighted penalized least squares criterion. Consistency and asymptotic normality results are obtained, and a simulation study is conducted to investigate the small-sample properties of this LASSO-type estimator. The bootstrap method is used to estimate the asymptotic variance.
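As an illustration of what a Kaplan-Meier-weighted penalized least squares criterion can look like, the sketch below computes the jumps of the Kaplan-Meier estimator (often called Stute weights) and plugs them into an L1-penalized weighted sum of squares. The exact criterion used in the thesis is not given in the abstract, so the form of `km_lasso_objective` and all names here are assumptions.

```python
def kaplan_meier_weights(times, events):
    """Kaplan-Meier jump weights, returned in the original observation order.

    times  -- observed times, i.e. min(true time, censoring time)
    events -- 1 if the event was observed, 0 if right-censored
    """
    n = len(times)
    # Sort by time; at tied times, events are placed before censored cases.
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    weights = [0.0] * n
    surv = 1.0  # Kaplan-Meier survival just before the current observation
    for rank, i in enumerate(order, start=1):
        at_risk = n - rank + 1
        if events[i]:
            weights[i] = surv / at_risk
            surv *= (at_risk - 1) / at_risk
        # Censored observations keep weight 0 and leave surv unchanged.
    return weights


def km_lasso_objective(beta, X, y, weights, lam):
    """Weighted residual sum of squares plus an L1 penalty (assumed form)."""
    fit = 0.0
    for row, yi, wi in zip(X, y, weights):
        pred = sum(b * x for b, x in zip(beta, row))
        fit += wi * (yi - pred) ** 2
    return fit + lam * sum(abs(b) for b in beta)
```

Without censoring every weight equals 1/n, so the criterion reduces to an ordinary lasso objective; with censoring, the mass of censored observations is redistributed to later uncensored ones.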
Soret, Perrine. "Régression pénalisée de type Lasso pour l’analyse de données biologiques de grande dimension : application à la charge virale du VIH censurée par une limite de quantification et aux données compositionnelles du microbiote." Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0254.
In clinical studies, thanks to technological progress, the amount of information collected on each patient continues to grow, leading to situations where the number of explanatory variables is greater than the number of individuals. The Lasso method has proved appropriate for circumventing overfitting problems in high-dimensional settings. This thesis is devoted to the application and development of Lasso-penalized regression for clinical data with particular structures.
First, in patients with the human immunodeficiency virus, mutations in the virus's genetic structure may be related to the development of drug resistance. Predicting the viral load from a (potentially large) set of mutations helps guide the choice of treatment. Below a threshold, the viral load is undetectable, so the data are left-censored. We propose two new Lasso approaches based on the Buckley-James algorithm, which imputes censored values by a conditional expectation. By reversing the response, we obtain a right-censored problem, for which non-parametric estimates of the conditional expectation have been proposed in survival analysis. Finally, we propose a parametric estimation based on a Gaussian hypothesis.
Second, we are interested in the role of the microbiota in the deterioration of respiratory health. The microbiota data are presented as relative abundances (the proportion of each species per individual, called compositional data), and they have a phylogenetic structure. We review the state of the art in statistical methods for analysing microbiota data. Because these methods are recent, few recommendations exist on their applicability and effectiveness. A simulation study allowed us to compare the selection capacity of penalization methods proposed specifically for this type of data. We then apply this research to the analysis of the association between bacteria or fungi and the decline of pulmonary function in patients with cystic fibrosis from the MucoFong project.
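The imputation step under a Gaussian hypothesis can be sketched as follows. For a response Y ~ N(mu, sigma²) known only to lie below a quantification limit c, the conditional expectation is E[Y | Y < c] = mu - sigma * phi(a) / Phi(a) with a = (c - mu) / sigma. In a Buckley-James-type iteration, mu would be the current linear predictor; the function name and arguments below are illustrative assumptions, not the thesis's code.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def impute_left_censored(mu, sigma, limit):
    """E[Y | Y < limit] for Y ~ N(mu, sigma**2).

    Note: for a limit very far below mu, normal_cdf underflows to 0 and the
    ratio is undefined; a production version would need a stable tail formula.
    """
    a = (limit - mu) / sigma
    return mu - sigma * normal_pdf(a) / normal_cdf(a)
```

The imputed value is always below the limit, and it moves closer to the limit as the predicted mean rises above it.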
Sorba, Olivier. "Pénalités minimales pour la sélection de modèle." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS043/document.
L. Birgé and P. Massart proved that the minimum penalty phenomenon occurs in Gaussian model selection when the model family arises from complete variable selection among independent variables. We extend some of their results to discrete Gaussian signal segmentation when the model family corresponds to a sufficiently rich family of partitions of the signal's support. This is the case of regression trees. We show that the same phenomenon occurs in the context of density estimation. The richness of the model family can be related to a certain form of isotropy. In this respect the minimum penalty phenomenon is intrinsic. To corroborate this point of view, we show that the minimum penalty phenomenon occurs when the models are chosen randomly under an isotropic law.
Gannaz, Irène. "Estimation par ondelettes dans les modèles partiellement linéaires." Phd thesis, Grenoble 1, 2007. http://www.theses.fr/2007GRE10281.
This dissertation is concerned with the use of wavelet methods in semiparametric partially linear models. These models are composed of a linear component with unknown regression coefficients and an unknown nonparametric function. The aim is to estimate both components, possibly in the presence of correlation. A wavelet thresholding procedure is built to estimate the nonparametric part of the model using a penalized least squares criterion. We establish a connection between different thresholding schemes and M-estimators in linear models with outliers, where the wavelet coefficients of the nonparametric part of the model are considered as outliers. We also propose an estimate of the noise variance. Some asymptotic results for the estimates of both the parametric and the nonparametric parts are given. Their behavior is close to optimal, up to a logarithmic factor, under the usual restrictions on the correlation between variables. Simulations illustrate the properties of the proposed methodology and compare it with existing methods. An application to real functional MRI data is also presented. The last part of this work deals with the extension to nonequidistant observations for the nonparametric part, comparing nonparametric estimation procedures in particular via simulations.
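The link between thresholding and penalized least squares mentioned in this abstract can be seen on a single coefficient. In the sketch below, soft thresholding solves an L1-penalized problem and hard thresholding an L0-type one, and the residual left by soft thresholding is a clipped value, which is Huber's psi function. The function names are illustrative, and this is only the scalar case, not the thesis's full procedure.

```python
def soft_threshold(d, lam):
    """Minimiser over t of 0.5 * (d - t)**2 + lam * abs(t)."""
    if d > lam:
        return d - lam
    if d < -lam:
        return d + lam
    return 0.0


def hard_threshold(d, lam):
    """Minimiser over t of 0.5 * (d - t)**2 + 0.5 * lam**2 * (t != 0)."""
    return d if abs(d) > lam else 0.0


def huber_psi(d, lam):
    """Huber's psi function: the value d clipped to [-lam, lam]."""
    return max(-lam, min(lam, d))


# Example wavelet coefficients: small ones are set to zero, large ones kept
# (hard) or kept and shrunk towards zero by lam (soft).
coefficients = [4.0, -0.3, 0.8, -2.5]
soft = [soft_threshold(d, 1.0) for d in coefficients]
hard = [hard_threshold(d, 1.0) for d in coefficients]
```

For every coefficient d, `d - soft_threshold(d, lam)` equals `huber_psi(d, lam)`, which is one way to read the correspondence between soft thresholding and Huber-type M-estimation.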
Moumouni, Kairou. "Etude et conception d'un modèle mixte semiparamétrique stochastique pour l'analyse des données longitudinales environnementales." Phd thesis, Université Rennes 2, 2005. http://tel.archives-ouvertes.fr/tel-00012164.
In the second part, an extension of Cook's local influence method to the modified mixed model is proposed. It provides a sensitivity analysis that detects the effects of certain perturbations on the structural components of the model. Some asymptotic properties of the local influence matrix are presented.
Finally, the proposed model is applied to two real data sets: an analysis of nitrate concentration data from several measurement stations in a catchment, followed by an analysis of the bacteriological pollution of bathing waters.
Gannaz, Irène. "Estimation par ondelettes dans les modèles partiellement linéaires." Phd thesis, Université Joseph Fourier (Grenoble), 2007. http://tel.archives-ouvertes.fr/tel-00197146.
Nguyen, Thi Le Thu. "Sequential Monte-Carlo sampler for Bayesian inference in complex systems." Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL10058/document.
In many problems, complex non-Gaussian and/or nonlinear models are required to accurately describe a physical system of interest. In such cases, Monte Carlo algorithms are remarkably flexible and extremely powerful for solving the resulting inference problems. However, in the presence of a high-dimensional and/or multimodal posterior distribution, standard Monte Carlo techniques can perform poorly. This thesis focuses on the Sequential Monte Carlo (SMC) sampler, a more robust and efficient Monte Carlo algorithm. Although this approach has many advantages over traditional Monte Carlo methods, the potential of this emerging technique remains largely underexploited in signal processing. We therefore propose some novel strategies that improve the efficiency and facilitate the practical implementation of the SMC sampler. First, we propose an automatic and adaptive strategy that selects the sequence of distributions within the SMC sampler so as to approximately minimize the asymptotic variance of the estimator of the posterior normalization constant. Second, we present an original contribution that improves the global efficiency of the SMC sampler by introducing correction mechanisms that allow the use of the particles generated through all the iterations of the algorithm (instead of only the particles from the last iteration). Finally, to illustrate the usefulness of these approaches, we apply the SMC sampler with our proposed improvement strategies to two challenging practical problems: multiple source localization in wireless sensor networks and Bayesian penalized regression.
Ternes, Nils. "Identification de biomarqueurs prédictifs de la survie et de l'effet du traitement dans un contexte de données de grande dimension." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS278/document.
With the recent revolution in genomics and in stratified medicine, the development of molecular signatures is becoming more and more important for predicting the prognosis (prognostic biomarkers) and the treatment effect (predictive biomarkers) of each patient. However, the large quantity of information has made false positives more and more frequent in biomedical research. The high-dimensional space (i.e. number of biomarkers ≫ sample size) leads to several statistical challenges, such as the identifiability of the models, the instability of the selected coefficients, and the multiple testing issue.
The aim of this thesis was to propose and evaluate statistical methods for the identification of these biomarkers and for the individual predicted survival probability of new patients, in the context of the Cox regression model. For variable selection in a high-dimensional setting, the lasso penalty is commonly used. In the prognostic setting, an empirical extension of the lasso penalty has been proposed that is more stringent in the estimation of the tuning parameter λ in order to select fewer false positives. In the predictive setting, the focus was on biomarker-by-treatment interactions in a randomized clinical trial. Twelve approaches have been proposed for selecting these interactions, such as lasso (standard, adaptive, grouped or ridge+lasso), boosting, dimension reduction of the main effects, and a model incorporating arm-specific biomarker effects. Finally, several strategies were studied to obtain an individual survival prediction with a corresponding confidence interval for a future patient from a penalized regression model, while limiting potential overfitting.
The performance of the approaches was evaluated through simulation studies combining null and alternative scenarios. The methods were also illustrated on several data sets containing gene expression data in breast cancer.
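To make the role of the tuning parameter λ concrete, here is a minimal sketch of lasso estimation by cyclic coordinate descent. It is written for ordinary least squares, minimizing (1/2n)·||y - Xβ||² + λ·||β||₁, and not for the Cox partial likelihood used in the thesis; the function names are illustrative assumptions.

```python
def shrink(z, lam):
    """Move z towards zero by lam, returning 0.0 if it crosses zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    X is a list of rows, y a list of responses. Each pass updates one
    coefficient at a time while holding the others fixed.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            beta[j] = shrink(rho, lam) / col_scale[j]
    return beta


# Toy data: the first covariate has a strong effect, the second a weak one.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 0.1, -2.0, -0.1]
beta_unpenalised = lasso_coordinate_descent(X, y, lam=0.0)
beta_lasso = lasso_coordinate_descent(X, y, lam=0.1)
```

On this toy example the weak coefficient is set exactly to zero once λ exceeds its marginal signal, while the strong one is kept but shrunk. A more stringent (larger) λ therefore selects fewer variables, which is the lever the abstract describes for reducing false positives.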