Theses on the topic "Heteroscedastic Multivariate Linear Regression"


Browse the top 50 theses for your research on the topic "Heteroscedastic Multivariate Linear Regression".


1

Kuljus, Kristi. "Rank Estimation in Elliptical Models : Estimation of Structured Rank Covariance Matrices and Asymptotics for Heteroscedastic Linear Regression". Doctoral thesis, Uppsala universitet, Matematisk statistik, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-9305.

Abstract
This thesis deals with univariate and multivariate rank methods for statistical inference. It is assumed that the underlying distributions belong to the class of elliptical distributions. The class of elliptical distributions is an extension of the normal distribution and includes distributions with both lighter and heavier tails than the normal distribution. In the first part of the thesis the rank covariance matrices defined via the Oja median are considered. The Oja rank covariance matrix has two important properties: it is affine equivariant and it is proportional to the inverse of the regular covariance matrix. We employ these two properties to study the problem of estimating the rank covariance matrices when they have a certain structure. The second part, which is the main part of the thesis, is devoted to rank estimation in linear regression models with symmetric heteroscedastic errors. We are interested in asymptotic properties of rank estimates. Asymptotic uniform linearity of a linear rank statistic in the case of heteroscedastic variables is proved. The asymptotic uniform linearity property makes it possible to study the asymptotic behaviour of rank regression estimates and rank tests. Existing results are generalized and it is shown that the Jaeckel estimate is consistent and asymptotically normally distributed also for heteroscedastic symmetric errors.
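As a reading aid for the abstract above, the Jaeckel estimate can be written as the minimizer of a rank-based dispersion of the residuals. The notation below (responses y_i, covariate vectors x_i, scale factors sigma_i, score function a) is assumed here for illustration and is not taken from the thesis; Wilcoxon scores are shown only as one common choice of scores.

```latex
% Heteroscedastic linear model (illustrative notation, not the thesis's own)
y_i = x_i^{\top}\beta + \sigma_i \varepsilon_i, \qquad i = 1,\dots,n,
\quad \varepsilon_i \text{ symmetric about } 0 .

% Jaeckel's dispersion: R_i(\beta) is the rank of the i-th residual
% e_i(\beta) = y_i - x_i^{\top}\beta among e_1(\beta),\dots,e_n(\beta)
D(\beta) = \sum_{i=1}^{n} a\bigl(R_i(\beta)\bigr)\, e_i(\beta),
\qquad
\hat{\beta}_{J} = \arg\min_{\beta} D(\beta).

% One common choice of scores (Wilcoxon); scores are nondecreasing and sum to zero
a(i) = \sqrt{12}\left(\frac{i}{n+1} - \frac{1}{2}\right).
```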
2

Bai, Xiuqin. "Robust mixtures of regression models". Diss., Kansas State University, 2014. http://hdl.handle.net/2097/18683.

Abstract
Doctor of Philosophy
Department of Statistics
Kun Chen and Weixin Yao
This proposal contains two projects that are related to robust mixture models. In the first project, we propose a new robust mixture of regression models (Bai et al., 2012). The existing methods for fitting mixture regression models assume a normal distribution for the error and then estimate the regression parameters by the maximum likelihood estimate (MLE). In this project, we demonstrate that the MLE, like the least squares estimate, is sensitive to outliers and heavy-tailed error distributions. We propose a robust estimation procedure and an EM-type algorithm to estimate the mixture regression models. Using a Monte Carlo simulation study, we demonstrate that the proposed new estimation method is robust and works much better than the MLE when there are outliers or the error distribution has heavy tails. In addition, the proposed robust method works comparably to the MLE when there are no outliers and the error is normal. In the second project, we propose a new robust mixture of linear mixed-effects models. The traditional mixture model with multiple linear mixed effects, assuming a Gaussian distribution for the random and error parts, is sensitive to outliers. We will propose a mixture of multiple linear mixed t-distributions to robustify the estimation procedure. An EM algorithm is provided to find the MLE under the assumption of t-distributions for the error terms and random mixed effects. Furthermore, we propose to adaptively choose the degrees of freedom for the t-distribution using profile likelihood. In the simulation study, we demonstrate that our proposed model works comparably to the traditional estimation method when there are no outliers and the errors and random mixed effects are normally distributed, but works much better if there are outliers or the distributions of the errors and random mixed effects have heavy tails.
3

Solomon, Mary Joanna. "Multivariate Analysis of Korean Pop Music Audio Features". Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1617105874719868.

4

Zuber, Verena. "A Multivariate Framework for Variable Selection and Identification of Biomarkers in High-Dimensional Omics Data". Doctoral thesis, Universitätsbibliothek Leipzig, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-101223.

Abstract
In this thesis, we address the identification of biomarkers in high-dimensional omics data. The identification of valid biomarkers is especially relevant for personalized medicine that depends on accurate prediction rules. Moreover, biomarkers elucidate the provenance of disease, or molecular changes related to disease. From a statistical point of view the identification of biomarkers is best cast as variable selection. In particular, we refer to variables as the molecular attributes under investigation, e.g. genes, genetic variation, or metabolites; and we refer to observations as the specific samples whose attributes we investigate, e.g. patients and controls. Variable selection in high-dimensional omics data is a complicated challenge due to the characteristic structure of omics data. For one, omics data is high-dimensional, comprising cellular information in unprecedented details. Moreover, there is an intricate correlation structure among the variables due to e.g internal cellular regulation, or external, latent factors. Variable selection for uncorrelated data is well established. In contrast, there is no consensus on how to approach variable selection under correlation. Here, we introduce a multivariate framework for variable selection that explicitly accounts for the correlation among markers. In particular, we present two novel quantities for variable importance: the correlation-adjusted t (CAT) score for classification, and the correlation-adjusted (marginal) correlation (CAR) score for regression. The CAT score is defined as the Mahalanobis-decorrelated t-score vector, and the CAR score as the Mahalanobis-decorrelated correlation between the predictor variables and the outcome. We derive the CAT and CAR score from a predictive point of view in linear discriminant analysis and regression; both quantities assess the weight of a decorrelated and standardized variable on the prediction rule. 
Furthermore, we discuss properties of both scores and relations to established quantities. Above all, the CAT score decomposes Hotelling's T² and the CAR score the proportion of variance explained. Notably, the decomposition of total variance into explained and unexplained variance in the linear model can be rewritten in terms of CAR scores. To render our approach applicable to high-dimensional omics data we devise an efficient algorithm for shrinkage estimates of the CAT and CAR score. Subsequently, we conduct extensive simulation studies to investigate the performance of our novel approaches in ranking and prediction under correlation. Here, CAT and CAR scores consistently improve over marginal approaches in terms of more true positives selected and a lower model error. Finally, we illustrate the application of CAT and CAR score on real omics data. In particular, we analyze genomics, transcriptomics, and metabolomics data. We ascertain that CAT and CAR score are competitive with or outperform state-of-the-art techniques in terms of true positives detected and prediction error.
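The definitions and decompositions described in the abstract above can be summarized compactly. The symbols below are assumed here for illustration (P for the correlation matrix among the predictors, tau for the vector of t-scores in the two-class setting, P_{Xy} for the vector of marginal correlations with the outcome); they are a sketch of the stated relations, not a transcription of the thesis.

```latex
% Mahalanobis-decorrelated scores (illustrative notation)
\tau^{\mathrm{adj}} = P^{-1/2}\,\tau \quad \text{(CAT score)},
\qquad
\omega = P^{-1/2}\,P_{Xy} \quad \text{(CAR score)}.

% Decompositions: squared CAT scores sum to Hotelling's T^2,
% squared CAR scores sum to the proportion of variance explained
\sum_{j} \bigl(\tau^{\mathrm{adj}}_{j}\bigr)^{2} = \tau^{\top} P^{-1}\,\tau = T^{2},
\qquad
\sum_{j} \omega_{j}^{2} = P_{yX}\,P^{-1}\,P_{Xy} = R^{2}.
```

When the predictors are uncorrelated, P is the identity and both scores reduce to their marginal counterparts.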
5

Mahmoud, Mahmoud A. "The Monitoring of Linear Profiles and the Inertial Properties of Control Charts". Diss., Virginia Tech, 2004. http://hdl.handle.net/10919/29544.

Abstract
The Phase I analysis of data when the quality of a process or product is characterized by a linear function is studied in this dissertation. It is assumed that each sample collected over time in the historical data set consists of several bivariate observations for which a simple linear regression model is appropriate, a situation common in calibration applications. Using a simulation study, the researcher compares the performance of some of the recommended approaches used to assess the stability of the process. Also in this dissertation, a method based on using indicator variables in a multiple regression model is proposed. This dissertation also proposes a change point approach based on the segmented regression technique for testing the constancy of the regression parameters in a linear profile data set. The performance of the proposed change point method is compared to that of the most effective Phase I linear profile control chart approaches using a simulation study. The advantage of the proposed change point method over the existing methods is greatly improved detection of sustained step changes in the process parameters. Any control chart that combines sample information over time, e.g., the cumulative sum (CUSUM) chart and the exponentially weighted moving average (EWMA) chart, has an ability to detect process changes that varies over time depending on the past data observed. The chart statistics can take values such that some shifts in the parameters of the underlying probability distribution of the quality characteristic are more difficult to detect. This is referred to as the "inertia problem" in the literature. This dissertation shows under realistic assumptions that the worst-case run length performance of control charts becomes as informative as the steady-state performance. Also this study proposes a simple new measure of the inertial properties of control charts, namely the signal resistance. 
The conclusions of this study support the recommendation that Shewhart limits should be used with EWMA charts, especially when the smoothing parameter is small. This study also shows that some charts proposed by Pignatiello and Runger (1990) and Domangue and Patch (1991) have serious disadvantages with respect to inertial properties.
Ph. D.
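The inertia problem described in the abstract above can be illustrated with a small sketch. It assumes a known in-control mean `mu0` and standard deviation `sigma`, the usual EWMA recursion with exact time-varying limits, and an optional Shewhart limit of `shewhart_k` standard deviations on individual observations. The function name, the parameter values and the data are hypothetical, and the sketch does not compute the signal resistance measure proposed in the dissertation.

```python
import math


def first_signal(xs, mu0, sigma, lam=0.1, L=3.0, shewhart_k=None):
    """Return (sample index, chart name) of the first signal, or None.

    EWMA statistic: z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = mu0.
    If shewhart_k is given, an individual observation further than
    shewhart_k * sigma from mu0 also triggers a signal.
    """
    z = mu0
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact standard deviation of the EWMA statistic at time t
        sd_z = sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if shewhart_k is not None and abs(x - mu0) > shewhart_k * sigma:
            return t, "shewhart"
        if abs(z - mu0) > L * sd_z:
            return t, "ewma"
    return None


# Hypothetical data: the statistic first drifts low (without signalling),
# then the process mean jumps upward at sample 11.
xs = [-0.9] * 10 + [4.5] * 5
ewma_only = first_signal(xs, mu0=0.0, sigma=1.0)
with_shewhart = first_signal(xs, mu0=0.0, sigma=1.0, shewhart_k=4.0)
```

In this constructed series the EWMA statistic sits on the wrong side of the target when the upward shift occurs, so the EWMA-only chart needs extra samples to cross its upper limit, while the added Shewhart limit reacts at the first shifted observation.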
6

Ramaboa, Kutlwano. "Contributions to Linear Regression diagnostics using the singular value decompostion: Measures to Indentify Outlying Observations, Influential Observations and Collinearity in Multivariate Data". Doctoral thesis, University of Cape Town, 2010. http://hdl.handle.net/11427/4391.

7

Souza, Aline Campos Reis de. "Modelos de regressão linear heteroscedásticos com erros t-Student : uma abordagem bayesiana objetiva". Universidade Federal de São Carlos, 2016. https://repositorio.ufscar.br/handle/ufscar/7540.

Abstract
Previous issue date: 2016-02-18
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
In this work, we present an extension of the objective Bayesian analysis made in Fonseca et al. (2008), based on Jeffreys priors for linear regression models with Student t errors, for which we consider the heteroscedasticity assumption. We show that the posterior distribution generated by the proposed Jeffreys prior is proper. Through a simulation study, we analyzed the frequentist properties of the Bayesian estimators obtained. Then we tested the robustness of the model through disturbances in the response variable by comparing its performance with those obtained under other prior distributions proposed in the literature. Finally, a real data set is used to analyze the performance of the proposed model. We detected possible influential points through the Kullback-Leibler divergence measure, and used the model selection criteria EAIC, EBIC, DIC and LPML in order to compare the models.
In this work, we present an extension of the objective Bayesian analysis of Fonseca et al. (2008), based on Jeffreys priors for the linear regression model with Student t errors, under the assumption of heteroscedasticity. We show that the posterior distribution of the regression model parameters generated by the prior is proper. Through a simulation study, we evaluate the frequentist properties of the Bayesian estimators and compare the results with other priors found in the literature. In addition, a diagnostic analysis based on the Kullback-Leibler divergence measure is developed in order to study the robustness of the estimates in the presence of atypical observations. Finally, a real data set is used to fit the proposed model.
8

Júnior, Antônio Carlos Pacagnella. "A inovação tecnológica nas indústrias do Estado de São Paulo: uma análise dos indicadores da PAEP". Universidade de São Paulo, 2006. http://www.teses.usp.br/teses/disponiveis/96/96132/tde-25072006-151430/.

Abstract
Technological innovation plays a fundamental role in the development of companies, regions, and even countries. In the state of São Paulo specifically, studying the aspects relevant to this topic is of utmost importance, since it is the most industrialized and economically most important state in Brazil. In this context, this study aims to analyze aspects of technological innovation in companies across the various sectors of industrial activity, using indicators of technological innovation and business data from the Pesquisa de Atividade Econômica Paulista (PAEP), conducted by the Fundação Sistema Estadual de Análise de Dados Estatísticos (SEADE), covering the period from 1999 to 2001.
Technological innovation plays a fundamental part in the development process of companies, regions and even countries. Specifically in the state of São Paulo, the study of aspects relevant to this theme is of utmost importance because it is the most industrialized and economically important state in the country. Within this context, this study aims to analyze some aspects linked to technological innovation in different sectors of industrial activity, using technological innovation indicators and business results obtained by the Paulista Economic Activity Survey (PAEP), which was conducted by the SEADE foundation over the period from 1999 to 2001.
9

Delmonde, Marcelo Vinicius Felizatti. "Eletro-oxidação oscilatória de moléculas orgânicas pequenas: produção de espécies voláteis e desempenho catalítico". Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/75/75134/tde-19042016-153123/.

Abstract
The frequent emergence of current and potential oscillations during the electro-oxidation of small organic molecules has important mechanistic implications, for example for the overall reaction conversion and therefore for the performance of practical energy conversion devices. With this in mind, this work was developed along two related fronts: (a) using measurements obtained by coupling an electrochemical cell to a mass spectrometer, the dynamics of the production of volatile species during the oscillatory electro-oxidation of formic acid, methanol and ethanol were studied. In addition to presenting previously unreported experimental results, the use of multivariate linear regression is introduced to compare the estimated total faradaic current with that arising from the production of detectable volatile species: carbon dioxide for formic acid, carbon dioxide and methyl formate for methanol, and carbon dioxide and acetaldehyde for ethanol. The analysis yields the best combination of the detected ion currents to represent the global current, or the maximum possible faradaic contribution due to the production of volatile species. The results were discussed in connection with aspects of the reaction mechanism of each molecule. The mismatch between the estimated total faradaic current and that obtained from the best combination of the partial currents arising from the production of volatile species was small for formic acid, and four and five times larger for ethanol and methanol, respectively, evidencing, in these last two cases, the increased role played by partially oxidized soluble species; (b) general features of the electro-oxidation of formaldehyde, formic acid and methanol on platinum in acidic media were investigated, with emphasis on comparing the overall electrocatalytic performance under stationary and oscillatory conditions. 
The comparison was carried out by interpreting results treated in different ways and was generalized by using the same experimental conditions in all cases. For all systems, the low potential reached during the oscillations evidenced a considerable decrease in the overpotential associated with the anodic reaction, compared with that obtained in the absence of oscillations. Moreover, the reactivation of the catalyst surface that occurs during the oscillations enhances the performance of all systems in terms of electrocatalytic activity. Finally, some aspects of the reaction mechanism of the molecules studied are also discussed.
The frequent emergence of current/potential oscillations during the electrooxidation of small organic molecules has implications for mechanistic aspects such as the overall reaction conversion, and thus for the performance of practical energy conversion devices. Accordingly, this work is divided into two parts: (a) by means of online Differential Electrochemical Mass Spectrometry (DEMS), the production of volatile species during the electrooxidation of formic acid, methanol and ethanol was studied. Besides the presentation of previously unreported DEMS results on the oscillatory dynamics of such systems, multivariate linear regression was introduced to compare the estimated total faradaic current with the one comprising the production of volatile detectable species, namely: carbon dioxide for formic acid, carbon dioxide and methyl formate for methanol, and carbon dioxide and acetaldehyde for ethanol. The introduced analysis provided the best combination of the DEMS ion currents to represent the total faradaic current or the maximum possible faradaic contribution of the volatile products to the global current. The results were discussed in connection with mechanistic aspects for each system. The mismatch between the estimated total current and the one obtained by the best combination of partial currents of volatile products was found to be small for formic acid, and 4 and 5 times bigger for ethanol and methanol, respectively, evidencing the increasing role played by partially oxidized soluble species in each case; (b) general features of the electro-oxidation of formaldehyde, formic acid and methanol on platinum in acid media were investigated, with emphasis on the comparison of the performance under stationary and oscillatory regimes. The comparison is carried out by different means and generalized by the use of identical experimental conditions in all cases. 
In all three systems studied, the occurrence of potential oscillations is associated with excursions of the electrode potential to lower values, which considerably decreases the overpotential of the anodic reaction when compared to that in the absence of oscillations. In addition, the reactivation of the catalyst surface benefits the performance of all systems in terms of electrocatalytic activity. Finally, some mechanistic aspects of the studied reactions are also discussed.
10

Maier, Marco J. "DirichletReg: Dirichlet Regression for Compositional Data in R". WU Vienna University of Economics and Business, 2014. http://epub.wu.ac.at/4077/1/Report125.pdf.

Abstract
Dirichlet regression models can be used to analyze a set of variables lying in a bounded interval that sum up to a constant (e.g., proportions, rates, compositions, etc.) exhibiting skewness and heteroscedasticity, without having to transform the data. There are two parametrizations for the presented model, one using the common Dirichlet distribution's alpha parameters, and a reparametrization of the alphas to set up a mean-and-dispersion-like model. By applying appropriate link functions, a GLM-like framework is set up that allows for the analysis of such data in a straightforward and familiar way, because interpretation is similar to multinomial logistic regression. This paper gives a brief theoretical foundation and describes the implementation as well as application (including worked examples) of Dirichlet regression methods implemented in the package DirichletReg (Maier, 2013) in the R language (R Core Team, 2013). (author's abstract)
Series: Research Report Series / Department of Statistics and Mathematics
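The two parametrizations mentioned in the abstract above can be related through standard properties of the Dirichlet distribution. The notation below (C components, parameters alpha_c, means mu_c, precision phi) is assumed here for illustration and may differ from the report's own notation.

```latex
% Common parametrization: y = (y_1,\dots,y_C), \; y_c \in (0,1), \; \sum_c y_c = 1
f(y \mid \alpha) = \frac{\Gamma\!\left(\sum_{c=1}^{C}\alpha_c\right)}{\prod_{c=1}^{C}\Gamma(\alpha_c)}
\prod_{c=1}^{C} y_c^{\alpha_c - 1}, \qquad \alpha_c > 0 .

% Mean-and-precision reparametrization
\phi = \sum_{c=1}^{C}\alpha_c, \qquad \mu_c = \frac{\alpha_c}{\phi}, \qquad \alpha_c = \mu_c\,\phi .

% Moments: the variance depends on the mean
\mathrm{E}[y_c] = \mu_c, \qquad \mathrm{Var}[y_c] = \frac{\mu_c\,(1-\mu_c)}{\phi + 1}.
```

Because the variance of each component is a function of its mean and of the precision, heteroscedasticity is built into the model rather than removed by transforming the data.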
11

Rivers, Derick Lorenzo. "Dynamic Bayesian Approaches to the Statistical Calibration Problem". VCU Scholars Compass, 2014. http://scholarscompass.vcu.edu/etd/3599.

Abstract
The problem of statistical calibration of a measuring instrument can be framed both in a statistical context and in an engineering context. In the first, the problem is dealt with by distinguishing between the "classical" approach and the "inverse" regression approach. Both of these models are static models and are used to estimate "exact" measurements from measurements that are affected by error. In the engineering context, the variables of interest are considered to be taken at the time at which you observe the measurement. The Bayesian time series analysis method of Dynamic Linear Models (DLM) can be used to monitor the evolution of the measures, thus introducing a dynamic approach to statistical calibration. The research presented uses Bayesian methodology to perform statistical calibration. The DLM framework is used to capture the time-varying parameters that may be changing or drifting over time. Dynamic approaches to the linear, nonlinear, and multivariate calibration problem are presented in this dissertation. Simulation studies are conducted where the dynamic models are compared to some well-known "static" calibration approaches in the literature from both the frequentist and Bayesian perspectives. Applications to microwave radiometry are given.
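The two static approaches named in the abstract above can be contrasted in a few lines. This is a minimal sketch assuming a simple linear calibration curve fitted by ordinary least squares; the function names and the data are hypothetical, and the dynamic (DLM) approaches developed in the dissertation are not shown.

```python
def fit_line(u, v):
    """Least-squares intercept and slope for v = a + b * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    b = s_uv / s_uu
    return v_bar - b * u_bar, b


def classical_estimate(x, y, y0):
    """Classical calibration: regress y on x, then invert the fitted line."""
    a, b = fit_line(x, y)
    return (y0 - a) / b


def inverse_estimate(x, y, y0):
    """Inverse calibration: regress x on y and predict directly."""
    c, d = fit_line(y, x)
    return c + d * y0


# Hypothetical calibration data: known reference values x, noisy readings y
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
y0 = 8.0  # new reading whose reference value is unknown

x_classical = classical_estimate(x, y, y0)
x_inverse = inverse_estimate(x, y, y0)
```

With noisy data the two estimates differ slightly, and the inverse estimate lies closer to the mean of the reference values; with perfectly linear data they coincide.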
12

Zanon, Mattia. "Non-Invasive Continuous Glucose Monitoring: Identification of Models for Multi-Sensor Systems". Doctoral thesis, Università degli studi di Padova, 2013. http://hdl.handle.net/11577/3423010.

Abstract
Diabetes is a disease that undermines the normal regulation of glucose levels in the blood. In people with diabetes, the body does not secrete insulin (Type 1 diabetes) or derangements occur in both insulin secretion and action (Type 2 diabetes). In spite of the therapy, which is mainly based on controlled regimens of insulin and drug administration, diet, and physical exercise, tuned according to self-monitoring of blood glucose (SMBG) levels 3-4 times a day, blood glucose concentration often exceeds the normal range thresholds of 70-180 mg/dL. While hyperglycaemia mostly affects long-term complications (such as neuropathy, retinopathy, cardiovascular, and heart diseases), hypoglycaemia can be very dangerous in the short-term and, in the worst-case scenario, may bring the patient into hypoglycaemic coma. New scenarios in diabetes treatment have been opened in the last 15 years, when continuous glucose monitoring (CGM) sensors, able to monitor glucose concentration continuously (i.e. with a reading every 1 to 5 min) over several days, entered clinical research. CGM sensors can be used both retrospectively, e.g., to optimize the metabolic control, and in real-time applications, e.g., in the "smart" CGM sensors, able to generate alerts when glucose concentrations are predicted to exceed the normal range thresholds or in the so-called "artificial pancreas". Most CGM sensors exploit needles and are thus invasive, although minimally. In order to improve patients comfort, Non-Invasive Continuous Glucose Monitoring (NI-CGM) technologies have been widely investigated in the last years and their ability to monitor glucose changes in the human body has been demonstrated under highly controlled (e.g. in-clinic) conditions. As soon as these conditions become less favourable (e.g. in daily-life use) several problems have been experienced that can be associated with physiological and environmental perturbations. 
To tackle this issue, the multisensor concept has received greater attention in the last few years. A multisensor consists of sensors of different nature embedded within the same device, allowing the measurement of endogenous (glucose, skin perfusion, sweating, movement, etc.) as well as exogenous (temperature, humidity, etc.) factors. The main glucose-related signals and those measuring specific detrimental processes have to be combined through a suitable mathematical model with the final goal of estimating glucose non-invasively. White-box models, where differential equations are used to describe the internal behavior of the system, can rarely be considered for combining multisensor measurements because a physical/mechanistic model linking multisensor data to glucose is not easily available. A more viable approach considers black-box models, which do not describe the internal mechanisms of the system under study, but rather depict how the inputs (channels from the non-invasive device) determine the output (estimated glucose values) through a transfer function (which we restrict to the class of multivariate linear models). Unfortunately, numerical problems usually arise in the identification of model parameters, since the multisensor channels are highly correlated (especially for spectroscopy-based devices) and because of the potentially high dimension of the measurement space. The aim of the thesis is to investigate and evaluate different techniques usable for the identification of the parameters of the multivariate linear regression models linking multisensor data and glucose. In particular, the following methods are considered: Ordinary Least Squares (OLS); Partial Least Squares (PLS); the Least Absolute Shrinkage and Selection Operator (LASSO) based on l1 norm regularization; Ridge regression based on l2 norm regularization; Elastic Net (EN), based on the combination of the two previous norms. 
As a case study, we consider data from the Multisensor device mainly based on dielectric and optical sensors developed by Solianis Monitoring AG (Zurich, Switzerland), which partially sponsored the PhD scholarship. The Solianis Monitoring AG IP portfolio is now held by Biovotion AG (Zurich, Switzerland). Forty-five recording sessions provided by Solianis Monitoring AG and collected in 6 diabetic human beings undergoing hypo- and hyperglycaemic protocols performed at the University Hospital Zurich are considered. The models identified with the aforementioned techniques using a data subset are then assessed against an independent test data subset. Results show that methods controlling complexity outperform OLS during model test. In general, regularization techniques outperform PLS, especially those embedding the l1 norm (LASSO and EN), because they set many channel weights to zero and are thus more robust to occasional spikes occurring in the Multisensor channels. In particular, the EN model performs best, sharing both the properties of sparseness and the grouping effect induced by the l1 and l2 norms respectively. In general, results indicate that, although the performance, in terms of overall accuracy, is not yet comparable with that of SMBG enzyme-based needle sensors, the Multisensor platform combined with the Elastic-Net (EN) models is a valid tool for the real-time monitoring of glycaemic trends. An effective application concerns the complement of sparse SMBG measures with glucose trend information within the recently developed concept of dynamic risk for the correct judgment of dangerous events such as hypoglycaemia. 
The body of the thesis is organized into three main parts: Part I (including Chapters 1 to 4), first gives an introduction of the diabetes disease and of the current technologies for NI-CGM (including the Multisensor device by Solianis) and then states the aims of the thesis; Part II (which includes Chapters 5 to 9), first describes some of the issues to be faced in high dimensional regression problems, and then presents OLS, PLS, LASSO, Ridge and EN using a tutorial example to highlight their advantages and drawbacks; Finally, Part III (including Chapters 10-12), presents the case study with the data set and results. Some concluding remarks and possible future developments end the thesis. In particular, a Monte Carlo procedure to evaluate robustness of the calibration procedure for the Solianis Multisensor device is proposed, together with a new cost function to be used for identifying models.
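The sparsity behaviour that the abstract above attributes to the l1 norm can be seen in a small coordinate-descent sketch of the elastic net, which reduces to LASSO or Ridge at the ends of its mixing range. The objective and its parametrization (`lam` for overall strength, `alpha` for the l1/l2 mix) are assumed here in a commonly used form and are not taken from the thesis; the data are hypothetical, with centered, non-constant columns and no intercept.

```python
def soft_threshold(rho, t):
    """Soft-thresholding operator used by the l1 penalty."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def elastic_net(X, y, lam, alpha, n_sweeps=200):
    """Coordinate descent for the (assumed) objective
    (1 / (2n)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    alpha = 1 gives LASSO, alpha = 0 gives Ridge.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta for the current beta
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Correlation of column j with the partial residual (coordinate j removed)
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new
    return beta


# Hypothetical data: y = 3 * x1 + 0.1 * x2 with orthogonal columns
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]

lasso = elastic_net(X, y, lam=0.5, alpha=1.0)
ridge = elastic_net(X, y, lam=0.5, alpha=0.0)
enet = elastic_net(X, y, lam=0.5, alpha=0.5)
```

In this toy example LASSO and the elastic net set the weight of the weak second channel exactly to zero, whereas Ridge only shrinks it, which mirrors the robustness argument made for the l1-based methods.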
Diabetes is a disease that impairs the normal regulation of blood glucose levels. In people with diabetes, the body either does not secrete insulin (type 1 diabetes) or shows alterations in both the secretion and the action of insulin (type 2 diabetes). Therapy is based mainly on the administration of insulin and drugs, diet and physical exercise, adjusted according to blood glucose levels measured 3-4 times a day by finger-prick methods. Nevertheless, blood glucose concentration often goes outside the normal range of 70-180 mg/dL. While hyperglycemia leads to long-term complications (such as neuropathy, retinopathy, and cardiovascular and heart diseases), hypoglycemia can be very dangerous in the short term and, in the worst case, lead the patient into hypoglycemic coma. New scenarios in diabetes care have opened up over the last 10 years, as sensors for continuous glucose monitoring entered clinical trials. These sensors can monitor blood glucose concentrations with a reading every 1-5 minutes for several days, allowing both retrospective analysis, for example to optimize metabolic control, and real-time use, to generate alerts when an exit from the normal euglycemic band is predicted, and in the so-called "artificial pancreas". Most of these continuous glucose monitoring sensors are minimally invasive, because they rely on a small needle inserted under the skin. Recent years have seen growing interest in non-invasive technologies for continuous glucose monitoring, with the aim of improving patient comfort. Their ability to track glucose changes in the human body has been demonstrated under the highly controlled conditions typical of a clinical setting.
As soon as these conditions become less favourable (for example during everyday use of these technologies), several problems arise that are associated with physiological and environmental perturbations. To address this, the "multisensor" concept has attracted growing interest in recent years. It consists of integrating sensors of different kinds within the same device, allowing the measurement of endogenous factors (glucose, blood perfusion, sweating, movement, etc.) and exogenous factors (temperature, humidity, etc.). The signals most correlated with glucose and those related to the other processes are combined through a suitable mathematical model, with the final goal of estimating glycemia non-invasively. System models (or "white-box" models), in which differential equations describe the internal behaviour of the system, can rarely be considered: a physical/mechanistic model linking the data measured by the multisensor to glucose is not readily available. A different approach uses data models (or "black-box" models), which describe the system under study in terms of inputs (channels measured by the non-invasive device), output (estimated glucose values) and transfer function (which in this thesis is restricted to the class of multivariate linear regression models). When identifying the model parameters, numerical problems may arise from collinearity among subsets of the channels measured by the multisensor (particularly for spectroscopy-based devices) and from the potentially high dimension of the measurement space. The aim of this doctoral thesis is to investigate and evaluate different techniques for identifying the multivariate linear regression model in order to estimate glucose levels non-invasively.
In particular, the following methods are considered: Ordinary Least Squares (OLS); Partial Least Squares (PLS); the Least Absolute Shrinkage and Selection Operator (LASSO), based on l1-norm regularization; Ridge, based on l2-norm regularization; and Elastic-Net (EN), based on a combination of the two norms. As a case study for applying the proposed methodologies, we consider data measured by the multisensor device, based mainly on dielectric and optical sensors, developed by Solianis Monitoring AG (Zurich, Switzerland), which partially funded the doctoral project during which this thesis was developed. The multisensor technology and Solianis's intellectual property are now held by Biovotion AG (Zurich, Switzerland). Solianis Monitoring AG provided forty-five experimental sessions collected from 6 patients undergoing hypo- and hyperglycemic protocols at the University Hospital Zurich. The models identified with the above techniques are tested on a data set different from the one used to identify them. The results show that the complexity-control methods are more accurate than OLS. In general, the regularization-based techniques perform better than PLS. In particular, those that use the l1 norm (LASSO and EN) set many model coefficients to zero, making the estimated glucose profiles more robust to occasional noise affecting some multisensor channels. The EN model performs best, combining the sparsity and the grouping effect induced by the l1 and l2 norms, respectively.
Overall, the results indicate that, although the accuracy of the estimated glucose profiles is not yet comparable with that of needle-based sensors, the multisensor platform combined with the EN model is a valid tool for real-time monitoring of glycemic trends. One possible application uses the glycemic trend information to complement sparse finger-prick measurements. By exploiting the recently developed concept of dynamic risk, potentially dangerous events such as hypoglycemia can be properly assessed. The thesis is organized into three main parts: Part I (Chapters 1-4) provides an introduction to diabetes, a review of current technologies for non-invasive glucose monitoring (including the Solianis multisensor device) and the aims of the thesis; Part II (Chapters 5-9) presents some of the difficulties encountered in regression problems on high-dimensional data, and then presents OLS, PLS, LASSO, Ridge and EN, using a tutorial example to highlight their advantages and drawbacks; finally, Part III (Chapters 10-12) presents the case-study data set and the results. Some concluding remarks and possible future developments end the thesis. In particular, a methodology based on Monte Carlo simulations for assessing the robustness of the model calibration, and the use of a new objective function for model identification, are briefly illustrated.
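The contrast this abstract draws between the l1 and l2 penalties can be illustrated with a minimal sketch. It is not the calibration code of the thesis: it assumes the common objective (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||^2), fitted by cyclic coordinate descent, and the function names, the toy data and the values of lam and alpha are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; this is the l1 part of the update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, sweeps=200):
    """Cyclic coordinate descent for the assumed elastic-net objective.

    alpha = 1 gives LASSO, alpha = 0 gives Ridge. Assumes no all-zero column.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Scaled inner product of column j with the residual that leaves column j out.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return beta


# Toy data (hypothetical): two identical, perfectly collinear channels.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]

lasso_beta = elastic_net(X, y, lam=0.25, alpha=1.0)  # keeps one channel, zeroes the other
en_beta = elastic_net(X, y, lam=0.5, alpha=0.5)      # spreads the weight over both channels
print(lasso_beta, en_beta)
```

On this toy input the l1-only fit puts all the weight on the first channel and sets the second coefficient exactly to zero, while the mixed penalty assigns equal weights to the two identical channels. That is one simple way to see the sparsity and grouping behaviour the abstract attributes to the two norms.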
13

COSTA, Ismael Gaião da. "Desempenho agroindustrial, adaptabilidade, estabilidade e divergência genética entre clones RB de cana-de-açúcar em Pernambuco". Universidade Federal Rural de Pernambuco, 2011. http://www.tede2.ufrpe.br:8080/tede2/handle/tede2/6411.

Abstract
Brazil is the world's largest producer of sugarcane (Saccharum spp.), a crop that interacts with a wide variety of environments. The replacement of varieties has contributed greatly to an effective increase in productivity. Thus, studies of genotype x environment (G x E) interaction, analyses of phenotypic adaptability and stability, and the selection of parents for hybridization are essential for recommending varieties suited to different soil and climatic conditions. The objective of this research was to evaluate the agro-industrial performance, adaptability and phenotypic stability of 11 RB sugarcane clones in the final phase of experimentation, in sugarcane micro-regions of the State of Pernambuco, in Northeast Brazil, over three consecutive harvests, and to assist the selection of potential parents for future crosses by the Sugarcane Breeding Program (PMGCA) of the Network for the Development of Alcohol and Sugar (RIDESA), conducted by the Carpina Sugarcane Experimental Station (EECAC) of the Federal Rural University of Pernambuco (UFRPE). The experiments were carried out at five Pernambuco sugar mills, in July and August 2006, using a randomized block design with four replications and plots of five eight-meter furrows spaced 1.0 m apart. The results were subjected to analysis of variance, comparison of means by the Scott & Knott test, and studies of adaptability, stability and genetic divergence. At each harvest the following variables were measured: tons of pol per hectare (TPH), tons of cane per hectare (TCH), fibre (FIB), corrected pol % (PCC), purity (PZA), soluble solids (BRIX) and total recoverable sugar (TRS). Based on the results, the best RB sugarcane genotypes were G1, G6 and G9 in environment I, G1 and G11 in environment II, G1 and G9 in environment III, G3 in environment IV and G1 in environment V.
Among the best clones, those with wide adaptability are G1 and G11, and those adapted to favourable environments are G6 and G9. The genotypes most indicated for use in hybridizations are G1 and G6, as they showed the greatest genetic dissimilarity.
Brazil is the world's largest producer of sugarcane (Saccharum spp.), a crop that interacts with the most varied environments. The replacement of varieties has contributed substantially to an efficient increase in productivity. In this regard, studies of genotype x environment (G x E) interaction, analyses of phenotypic adaptability and stability, and the selection of parents for crosses are indispensable for recommending varieties suited to the various edaphoclimatic conditions. This research aimed to evaluate the agro-industrial performance, adaptability and phenotypic stability of 11 RB sugarcane clones, in the final phase of experimentation, in sugarcane-growing micro-regions of the State of Pernambuco, over three consecutive harvests, and to assist the selection of potential parents to be used in future crosses by the Programa de Melhoramento Genético da Cana-de-açúcar (PMGCA) of the Rede Interuniversitária para o Desenvolvimento do Setor Sucroalcooleiro (RIDESA), conducted by the Estação Experimental de Cana-de-açúcar (EECAC) of the Universidade Federal Rural de Pernambuco (UFRPE). The experiments were set up at five sugar mills in Pernambuco, in July and August 2006, using a randomized block design with four replications and plots of five eight-meter furrows spaced 1.0 m apart. The results were subjected to analysis of variance, comparison of means by the Scott & Knott test, and studies of adaptability, stability and genetic divergence. At each harvest the following variables were measured: tons of pol per hectare (TPH), tons of cane per hectare (TCH), corrected pol % (PCC), fibre (FIB), purity (PZA), soluble solids content (BRIX) and total recoverable sugar (ATR).
Based on the results, the most productive RB sugarcane genotypes were G1, G6 and G9 for environment I, G11 and G1 for environment II, G9 and G1 for environment III, G3 for environment IV and G1 for environment V. Among the best clones, those with broad adaptability are G1 and G11, and those adapted to favourable environments are G6 and G9. The genotypes most recommended for use in hybridizations are G1 and G6, since they showed the greatest genetic dissimilarity.
14

Martins, Natália da Silva. "Método Shenon (Shelf-life prediction for Non-accelarated Studies) na predição do tempo de vida útil de alimentos". Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-05012017-182429/.

Abstract
Determining the shelf life of foods is important because it ensures that they are fit for consumption if eaten within the stipulated period. Foods with a short life have their shelf life determined by non-accelerated studies, which require multivariate methods of analysis for good prediction. With this in mind, this study aims to: propose a multivariate statistical method capable of predicting the shelf life of foods in non-accelerated studies; evaluate its stability and sensitivity to perturbations introduced in the input variables, using bootstrap simulation techniques; and present the SheNon method for experimental data by constructing contrasts of the predicted times of the treatments of interest and comparing them with a least significant difference (LSD) obtained empirically by the bootstrap method. The results of this study show that the proposed method is promising and stable for predicting the shelf life of foods in non-accelerated studies. The method proved sensitive to the number of time points (sample size) at which the food was observed. It also performed well in the analysis of experimental data, since, after predicting the shelf life for each treatment considered, inferences can be made about the equality of the shelf lives of different treatments.
Consumers are increasingly demanding about the quality of food and expect this quality to be maintained at a high level during the period between purchase and consumption. These expectations are a consequence not only of the requirement that the food should stay safe, but also of the need to minimize unwanted changes in its sensory qualities. Considering food safety and consumer demands, this study aims to propose a multivariate statistical method, the SheNon method, to predict shelf life in non-accelerated studies. The development of a multivariate method for predicting the shelf life of a food, considering all attributes and their natures, introduces a new concept of data analysis for estimating the degradation mechanisms that govern foods, and determines the time period in which these foods retain their characteristics within acceptable levels. The proposed method allows microbiological, physical, chemical and sensory attributes to be included, which leads to a more accurate prediction of the shelf life of food. The SheNon method is easy to interpret; its main advantages include the ability to combine information of different natures and the fact that it can be generalized to data with an experimental structure. The SheNon method was applied to minimally processed eggplants, predicting a shelf life of around 9.6 days.
15

Lazar, Ann A. "Determining when time response curves differ in the presence of censorship /". Connect to abstract via ProQuest. Full text is not available online, 2008.

16

Hosler, Deborah Susan. "Models and Graphics in the Analysis of Categorical Variables: The Case of the Youth Tobacco Survey". [Johnson City, Tenn. : East Tennessee State University], 2002. http://etd-submit.etsu.edu/etd/theses/available/etd-0716102-095453/unrestricted/HoslerD080202.pdf.

17

Fausti, Giovanni, Gustaf Sandelin and Adam Bratt. "Stock Splits And The Impact On Abnormal Return : A Quantitative Research on Nasdaq Stockholm". Thesis, Stockholms universitet, Företagsekonomiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-194741.

Abstract
Throughout history, stock splits have been seen only as a cosmetic change in how a firm expresses its market value of equity. This study investigates whether abnormal returns occur in connection with stock split announcements on Nasdaq Stockholm and how the variation may be explained by selected factors. An event study is performed on 83 stock splits during the period 2010-2020 to establish whether abnormal returns are present. In a multivariate linear regression, split quota, firm size and trading volume are the factors selected to explain the variation in abnormal return. The event study finds abnormal returns one day prior to the announcement and on the event day itself. Further, the regression confirms, at a statistically significant level, a negative relationship between firm size and abnormal return. For trading volume, the regression finds no statistically significant result, so it does not explain the variation in abnormal return. As for split quota, no conclusion can be drawn as to whether it affects abnormal return. The study concludes that abnormal returns occur in connection with stock split announcements on Nasdaq Stockholm, and that firm size is one of the factors explaining the variation.
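For readers unfamiliar with event studies, a minimal sketch of how an abnormal return can be computed follows. The abstract does not say which benchmark the authors used; the market model (stock return = alpha + beta * market return, fitted on an estimation window) is assumed here as one common choice, and every name and number in the block is hypothetical.

```python
def fit_market_model(stock_returns, market_returns):
    """Least-squares fit of stock = alpha + beta * market over an estimation window."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (s - mean_s) for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


def abnormal_returns(alpha, beta, stock_returns, market_returns):
    """Observed return minus the return the fitted market model expects."""
    return [s - (alpha + beta * m) for s, m in zip(stock_returns, market_returns)]


# Hypothetical daily returns: an estimation window, then a two-day event window.
est_market = [0.01, -0.02, 0.015, 0.0, -0.005]
est_stock = [0.001 + 1.2 * m for m in est_market]  # the stock tracks the market exactly
alpha, beta = fit_market_model(est_stock, est_market)

event_market = [0.004, -0.001]
event_stock = [0.030, 0.012]
ar = abnormal_returns(alpha, beta, event_stock, event_market)
car = sum(ar)  # cumulative abnormal return over the event window
print(ar, car)
```

In a study like the one described, abnormal returns of this kind would be averaged over all events and tested for significance, then regressed on explanatory factors such as firm size; those steps are left out of the sketch.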
18

Sousa, Rhelcris Salvino de. "Algoritmo evolutivo com representação inteira para seleção de características". Universidade Federal de Goiás, 2017. http://repositorio.bc.ufg.br/tede/handle/tede/7395.

Abstract
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
Machine learning problems usually involve a large number of features or variables. In this context, feature selection algorithms face the challenge of determining a reduced subset of the original set. The main difficulty in this task is the large number of solutions available in the search space. Genetic algorithms are among the most widely used techniques for this type of problem because of their implicit parallelism in exploring the search space. However, a binary representation is usually used to encode the solutions. This work proposes an implementation, called intEA-MLR, that uses an integer representation instead of a binary one. The integer representation improves the understanding of the data, as the features to be selected are represented by integer values, which reduces the size of the chromosome used in the search process. intEA-MLR is thus presented as an alternative way of solving high-dimensional regression problems. As a case study, three data sets are used, concerning the determination of properties of interest in samples of 1) wheat grain, 2) medicine tablets and 3) petroleum. These sets were used in competitions held at the International Diffuse Reflectance Conference (IDRC) (http://cnirs.clubexpress.com/content.aspx?page_id=22&club_id=409746&module_id=190211), in 2008, 2012 and 2014, respectively. The results showed that the proposed solution improved on the solutions obtained with the classical implementation based on binary coding, yielding both more accurate prediction models and a reduced number of features. intEA-MLR also outperformed the competition winners, obtaining solutions up to 91.17% better than the competition winner for the petroleum data set. In addition, the results indicated that the computation time required by intEA-MLR becomes relatively smaller as more features are available.
Machine learning problems generally involve a large number of features or variables. In this context, feature selection algorithms are challenged to determine a reduced subset from the original set. The main difficulty in this task is the high number of solutions available in the search space. Here, the genetic algorithm is one of the most widely used techniques for this type of problem, owing to its implicit parallelism in exploring the search space of the problem considered. However, a binary representation is generally used to encode the solutions. This work proposes an implementation, named intEA-MLR, that uses an integer representation in place of the binary one. The integer representation improves the understanding of the data, in that the features to be selected are given by integer values, reducing the size of the chromosome used in the search process. In this context, intEA-MLR is presented as an alternative way of solving high-dimensional regression problems. As a case study, three different data sets are used, relating to problems that involve determining properties of interest in samples of 1) wheat grain, 2) medicine tablets and 3) petroleum. These sets were used in the competitions held at the International Diffuse Reflectance Conference (IDRC) (http://cnirs.clubexpress.com/content.aspx?page_id=22&club_id=409746&module_id=190211), in 2008, 2012 and 2014, respectively. The results showed that the proposed solution improved on the solutions obtained with the classical implementation that uses binary coding, both with more accurate prediction models and with a reduced number of features.
intEA-MLR also obtained better results than the competition winners, reaching solutions 91.17% better than the competition winner for the petroleum data set. In addition, the results indicated that the computation time required by intEA-MLR becomes relatively smaller as a larger number of features become available.
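The difference between the two chromosome encodings mentioned in this abstract can be sketched as follows. The abstract does not give the details of intEA-MLR (for instance how repeated indices are handled), so the decoding rules and names below are assumptions made only for illustration.

```python
def decode_binary(chromosome):
    """Binary encoding: one gene per available feature, 1 means 'selected'."""
    return [index for index, gene in enumerate(chromosome) if gene == 1]


def decode_integer(chromosome):
    """Integer encoding: each gene holds the index of one selected feature.

    Repeated indices are collapsed here; this is an assumption, not a rule
    taken from the thesis.
    """
    return sorted(set(chromosome))


# Hypothetical problem with 10 candidate features, of which 3 are selected.
binary_chromosome = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]
integer_chromosome = [7, 2, 9]

print(decode_binary(binary_chromosome))    # same subset of features ...
print(decode_integer(integer_chromosome))  # ... from a shorter chromosome
```

Under these assumptions the binary chromosome grows with the number of available features, whereas the integer chromosome grows only with the number of selected features, which is the size reduction the abstract refers to.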
19

Dieste, Andrés. "Colour development in Pinus radiata D. Don. under kiln-drying conditions". Thesis, University of Canterbury. Chemical and Process Engineering, 2002. http://hdl.handle.net/10092/1134.

Abstract
This study quantifies discolouration on the surface of Pinus radiata boards during kiln drying, particularly kiln brown stain (KBS), and models it as a function of chemical compounds present in the wood closest to the surface. The discolouration was investigated with two experimental factors: drying time, which consisted of drying at 70/120 ℃ for 0, 8, 16 and 24 hours; and leaching, applied at three levels (no leaching, mild and severe) to reduce the soluble compounds present in wood that are suspected of developing coloured compounds. The colour change was quantified using a reflectance photometer (colour system CIE Yxy, brightness) and by the analysis of digital photographs (colour system CIE Lab). The chemical analysis of the wood closest to the surface of the boards determined fructose, glucose, sucrose (HPLC), total sugar (sum of fructose, glucose and sucrose), total nitrogen (combustion gas analysis), and phenols discriminated by molecular weight (Folin-Ciocalteu method). In the cause-effect analysis, colour was the dependent variable, and drying time and the determinations of chemical compounds were independent variables. After statistical analysis (ANOVA and MANOVA), the dependent variables to be included in the models were luminance factor (Y), brightness (R457) and the blue-to-yellow scale of CIE Lab (b); the independent variables were drying time, nitrogen, total sugar, and high-molecular-weight phenols. Linear (multivariate regression) and non-linear (neural network) models showed that discolouration during kiln drying was best predicted when the luminance factor (Y) was used to quantify colour change as a function of the content of nitrogen-containing compounds and drying time. Furthermore, the data were fitted to an empirical model based on simple reaction kinetics that considered the rate of discolouration as a function of nitrogen concentration.
The results suggest that nitrogen could act as a limiting reactant in Maillard-type reactions that produce colour during kiln drying.
20

Soares, Sófacles Figueiredo Carreiro. "Um novo critério para seleção de variáveis usando o Algoritmo das Projeções Sucessivas". Universidade Federal da Paraíba, 2010. http://tede.biblioteca.ufpb.br:8080/handle/tede/7184.

Abstract
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
This study proposes a modification of the Successive Projections Algorithm (SPA) that makes Multiple Linear Regression (MLR) models more robust to interferents. In SPA, subsets of variables are compared on the basis of their root mean square errors for the validation set. Under the proposed criterion, the comparison also takes into account the statistical prediction error (SPE) obtained for the calibration set divided by the statistical prediction error obtained for the prediction set, a metric that accounts for the leverage associated with each sample. Three case studies are discussed, involving simulated analytical determinations, food colorants (UV-VIS spectrometry), and ethanol in gasoline (NIR spectrometry). The results were evaluated using the root mean square error for an independent prediction set (Root Mean Square Error of Prediction, RMSEP), graphs of the selected variables, and the t and F statistical tests. The MLR models obtained by selection with the new cost function were called SPA-SPE-MLR and were compared with SPA-MLR and PLS. When an interferent was present in the prediction spectra, SPA-SPE-MLR gave better prediction performance in almost all of the models built. Compared with SPA-MLR, the change improved the models in all cases, giving smaller RMSEPs and fewer variables. SPA-SPE-MLR was outperformed only by some PLS models. The variables selected by SPA-SPE-MLR, when viewed on the spectra, lay in regions where interference was smallest, revealing the great potential of the change. The modification presented here is therefore a useful tool for the basic formulation of SPA.
This work proposes a modification of the Successive Projections Algorithm (SPA) aimed at increasing the robustness to interferents of the Multiple Linear Regression (MLR) models built. In the original formulation of SPA, subsets of variables are compared with one another on the basis of the root mean square error obtained on a validation set. Under the criterion proposed here, the comparison also takes into account the statistical prediction error (SPE) obtained for the calibration set divided by the statistical prediction error obtained for the prediction set. This metric takes into account the leverage associated with each sample. Three case studies are discussed, involving the determination of simulated analytes, food colorants by UV-VIS spectrometry, and alcohol in gasoline by NIR spectrometry. The results are evaluated in terms of the root mean square error on an independent prediction set (Root Mean Square Error of Prediction, RMSEP), plots of the selected variables, and the t and F statistical tests. The MLR models obtained from selection using the new cost function are called SPA-SPE-MLR here. These models were compared with SPA-MLR and PLS. The prediction performance of SPA-SPE-MLR was better in almost all the models built when some interferent was present in the prediction spectra. Compared with SPA-MLR, the change brought improvements in all cases, giving smaller RMSEPs and fewer variables. SPA-SPE-MLR was outperformed only by some PLS models. The variables selected by SPA-SPE-MLR, when viewed on the spectra, lay in regions where the action of the interferent was as small as possible, revealing the great potential of the change.
The modification presented here can therefore be considered a useful tool for the basic formulation of SPA.
21

Olid, Pilar. "Making Models with Bayes". CSUSB ScholarWorks, 2017. https://scholarworks.lib.csusb.edu/etd/593.

Abstract
Bayesian statistics is an important approach to modern statistical analyses. It allows us to use our prior knowledge of the unknown parameters to construct a model for our data set. The foundation of Bayesian analysis is Bayes' Rule, which in its proportional form indicates that the posterior is proportional to the prior times the likelihood. We will demonstrate how we can apply Bayesian statistical techniques to fit a linear regression model and a hierarchical linear regression model to a data set. We will show how to apply different distributions to Bayesian analyses and how the use of a prior affects the model. We will also make a comparison between the Bayesian approach and the traditional frequentist approach to data analyses.
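The statement that the posterior is proportional to the prior times the likelihood can be made concrete with a small conjugate example. This is a sketch under stated assumptions, not the models fitted in the thesis: a regression through the origin, y = b*x + noise, with known noise variance and a normal prior on the slope b. The function name and all numbers are hypothetical.

```python
def posterior_slope(x, y, prior_mean, prior_var, noise_var):
    """Normal posterior (mean, variance) of the slope b in y = b*x + noise.

    Assumes noise ~ N(0, noise_var) with noise_var known and b ~ N(prior_mean, prior_var).
    """
    prior_precision = 1.0 / prior_var
    data_precision = sum(xi * xi for xi in x) / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    weighted_sum = prior_mean * prior_precision + sum(xi * yi for xi, yi in zip(x, y)) / noise_var
    return post_var * weighted_sum, post_var


# Hypothetical data whose least-squares slope is exactly 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

vague = posterior_slope(x, y, prior_mean=0.0, prior_var=1e6, noise_var=1.0)
strong = posterior_slope(x, y, prior_mean=0.0, prior_var=0.01, noise_var=1.0)
print(vague, strong)
```

With the vague prior the posterior mean stays very close to the least-squares slope, which matches the frequentist answer; with the tight prior centred at zero the same data are pulled strongly toward the prior mean. This is one simple way to see how the choice of prior affects the fitted model.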
22

D’ávila, Rodrigo Souza. "APLICAÇÃO DE REGRESSÃO LINEAR MÚLTIPLA NA ANÁLISE DA DINÂMICA DE CÁTIONS TROCÁVEIS EM UM SISTEMA SOLO-PLANTA IRRIGADO COM ÁGUA RESIDUÁRIA". UNIVERSIDADE ESTADUAL DE PONTA GROSSA, 2013. http://tede2.uepg.br/jspui/handle/prefix/123.

Abstract
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Competition for water between agriculture and human needs in different regions of the world has restricted increases in food production, prompting a search for alternative sources. The use of effluent from secondary treatment of sewage (ETSE) has been a common practice in several seasonal situations. The aims of this work were: (i) to create regression models to assist in understanding the dynamics of acidity (current, exchangeable and total), the exchangeable bases and the exchangeable sodium percentage (ESP) in the soil, through the use of multiple linear regression (MLR), considering variables of soil, soil solution, plant, ETSE, weather and complementary variables; and (ii) to compare the models generated with the standard method against the models generated with variable selection. To build the MLR models, the stepwise, forward and backward variable selection methods were used and compared with the standard method through the adjusted coefficient of determination (R2adj) and the variance inflation factor (VIF). The models developed with variable selection were the most suitable. The attributes in the scenarios and layers of the studied soils were not all explained by the same group of variables. In general the results were consistent: as the pH increased, the H + Al (total acidity) and Al (potential acidity) concentrations decreased and the Ca (calcium) and Mg (magnesium) concentrations increased. Given the low K (potassium) content of the soil, the input of this nutrient through irrigation with ETSE had little influence on the concentrations of this element. Owing to the high sodium adsorption ratio (SAR) of the effluent, the concentrations of this element, as well as the ESP, increased over time in the soil. The accumulation and export of Na (sodium) by plants were not sufficient to prevent the increase in the concentrations of exchangeable Na and ESP in all studied scenarios and layers.
Competition for water between the agricultural sector and human needs in many regions of the world has restricted increases in food production, prompting a search for alternative sources. The use of secondary-treated sewage effluent (ETSE) has been a common practice in several seasonal situations. The aims of this work were: (i) to create regression models to help understand the dynamics of acidity (exchangeable and total), exchangeable bases and exchangeable sodium percentage (ESP) in the soil, using multiple linear regression (MLR) and considering soil, soil-solution, plant, ETSE, meteorological and complementary variables; and (ii) to compare the models generated with the standard method against the models generated with variable selection. To build the MLR models, the stepwise, forward and backward variable selection methods were used and compared with the standard method through the adjusted coefficient of determination (R2adj) and the variance inflation factor (VIF). The models developed with variable selection were the most suitable. The attributes in the scenarios and soil layers studied were not all explained by the same group of variables. In general, the results were consistent: as pH increased, the H+Al and Al concentrations decreased and those of Ca and Mg increased. The low K content of the soil showed that the input of this nutrient through irrigation with ETSE has little influence on the concentrations of this element. Owing to the high sodium adsorption ratio (SAR) of the ETSE, the concentrations of this element, as well as the ESP, increased over time in the soil. The accumulation and export of Na by plants were not sufficient to prevent the increase in exchangeable Na concentrations and ESP in all the scenarios and layers studied.
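The two indices used in this abstract to compare models can be written down in a few lines. The sketch below uses the standard textbook definitions, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) and VIF = 1/(1 - Rj2); it is not taken from the thesis, and the function names and example values are hypothetical. To stay short, the VIF helper covers only the two-predictor case, where Rj2 reduces to the squared correlation between the two predictors.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R^2 for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def vif_two_predictors(x1, x2):
    """VIF of either predictor in a two-predictor model: 1 / (1 - r^2)."""
    n = len(x1)
    mean1, mean2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(x1, x2))
    s11 = sum((a - mean1) ** 2 for a in x1)
    s22 = sum((b - mean2) ** 2 for b in x2)
    r_squared = s12 * s12 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Hypothetical values: a fit with R^2 = 0.90 from 21 observations and 2 predictors,
# and two strongly correlated predictor columns.
print(adjusted_r_squared(0.90, n_obs=21, n_predictors=2))
print(vif_two_predictors([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

For these inputs the adjusted R2 is about 0.889 and the VIF about 29.2, a value that would usually be read as a sign of strong collinearity. With more than two predictors, each Rj2 comes from regressing predictor j on all the others.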
23

Salawu, Emmanuel Oluwatobi. "Spatiotemporal Variations in Coexisting Multiple Causes of Death and the Associated Factors". ScholarWorks, 2018. https://scholarworks.waldenu.edu/dissertations/6108.

Abstract
The study and practice of epidemiology and public health benefit from the use of mortality statistics, such as mortality rates, which are frequently used as key health indicators. Furthermore, multiple causes of death (MCOD) data offer important information that could not possibly be gathered from other mortality data. This study aimed to describe the interrelationships between various causes of death in the United States in order to improve the understanding of the coexistence of MCOD and thereby improve public health and enhance longevity. The social support theory was used as a framework, and multivariate linear regression analyses were conducted to examine the coexistence of MCOD in approximately 80 million death cases across the United States from 1959 to 2005. The findings showed that in the United States, there is a statistically significant relationship between the number of coexisting MCOD, race, education, and the state of residence. Furthermore, age, gender, and marital status statistically influence the average number of coexisting MCOD. The results offer insights into how the number of coexisting MCOD vary across the United States, races, education levels, gender, age, and marital status and lay a foundation for further investigation into what people are dying from. The results have the long-term potential of helping public health practitioners identify individuals or communities that are at higher risks of death from a number of coexisting MCOD such that actions could be taken to lower the risks to improve people's wellbeing, enhance longevity, and contribute to positive social change.
24

Franksson, Rikard. "Private Equity Portfolio Management and Positive Alphas". Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-275666.

Abstract
This project analyzes Nordic companies active in the Information and Communications Technology (ICT) sector in two parts. Part I analyzes public companies to construct a valuation model aimed at predicting the enterprise value of private companies. Part II analyzes private companies to determine whether there are opportunities for excess returns compared with investments in public companies. In Part I, a multiple regression approach is used to identify suitable valuation models. This reveals that one-factor models provide the best statistical results in terms of significance and prediction error. In descending order of prediction accuracy, these are (1) total assets, (2) turnover, (3) EBITDA, and (4) cash flow. Part II uses model (1) and finds that Nordic ICT private equity does provide opportunities for positive alphas, and that it is possible to construct portfolio strategies that increase this alpha. However, in light of previous research, the returns offered by the private equity market analyzed do not appear to compensate investors adequately for the additional risks of investing in private equity.
25

Paula, Lauro Cássio Martins de. "Paralelização de algoritmos APS e Firefly para seleção de variáveis em problemas de calibração multivariada". Universidade Federal de Goiás, 2014. http://repositorio.bc.ufg.br/tede/handle/tede/3418.

Abstract
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
The variable selection problem consists of selecting the attributes of a given sample that contribute most to predicting the property of interest. Traditional algorithms such as the Successive Projections Algorithm (APS) have been widely used for variable selection in multivariate calibration problems. Among bio-inspired algorithms, the Firefly Algorithm (AF) is a recently proposed method with potential application to several real-world problems, including variable selection. The main drawback of these two algorithms is their computational burden, which grows with the number of available variables. Recent advances in Graphics Processing Units (GPUs) give such algorithms a powerful processing platform, and using GPUs often becomes necessary to reduce computation time. In this context, this work proposes a parallel GPU implementation of an AF (AF-RLM) for variable selection using multiple linear regression (RLM) models. It also presents two APS implementations, one using RLM (APS-RLM) and the other using sequential regressions (APS-RS). These implementations aim to improve the computational efficiency of the algorithms. The advantages of the parallel implementations are demonstrated in an example involving a relatively large number of variables, in which speedup gains were obtained. In addition, AF-RLM is compared with APS-RLM and APS-RS. Based on the results obtained, AF-RLM may be a relevant contribution to the variable selection problem.
26

Louredo, Graciliano Márcio Santos. "Estimação via EM e diagnóstico em modelos misturas assimétricas com regressão". Universidade Federal de Juiz de Fora (UFJF), 2018. https://repositorio.ufjf.br/jspui/handle/ufjf/6662.

Abstract
FAPEMIG - Fundação de Amparo à Pesquisa do Estado de Minas Gerais
The objective of this work is to present some contributions that improve maximum likelihood estimation via the EM algorithm in skew mixture models with regression, and to carry out local and global influence analysis in these models. These contributions, mostly computational in nature, aim to solve common problems in statistical modeling more efficiently. Among them is the replacement of methods used in versions of the GEM algorithm by others that reduce the problem approximately to a classical EM algorithm in the main examples of skew scale mixtures of normal distributions. After the estimation process, we also discuss the main existing techniques for diagnosing influential points, with the adaptations required by the models in focus. With this approach we hope to add regression analysis under the most recent distributions in the literature to the treatment of this class of statistical models. We also hope to pave the way for the use of similar techniques in other classes of models.
27

Kuhnert, Petra Meta. "New methodology and comparisons for the analysis of binary data using Bayesian and tree based methods". Thesis, Queensland University of Technology, 2003.

28

Nahangi, Arian A. "Modeling and Solving the Outsourcing Risk Management Problem in Multi-Echelon Supply Chains". DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2321.

Abstract
Globalization has made supply chains more vulnerable to risk factors, increasing the costs associated with outsourcing goods. Outsourcing is highly beneficial for any company that values building upon its core competencies, but the COVID-19 pandemic and other crises have exposed significant vulnerabilities within supply chains. These disruptions forced a shift in the production of goods from outsourcing to domestic methods. This paper considers a multi-echelon supply chain model with global and domestic raw material suppliers, manufacturing plants, warehouses, and markets. All levels within the supply chain network are evaluated from a holistic perspective, calculating a total cost for all levels with embedded risk. We formulate the problem as a mixed-integer linear model, implemented in Excel Solver, to solve smaller optimization problems. Then, we create a Tabu Search algorithm that solves problems of any size. Excel Solver handles three small-scale supply chain networks of varying sizes, one of which reaches the maximum number of decision variables the software can handle. In comparison, the Tabu Search program, written in Python, solves an additional ten larger-scale supply chain networks. Tabu Search’s capabilities illustrate its scalability and replicability. A quadratic multiple regression analysis relates the input parameters (iterations, neighbors, and tabu list size) to total supply chain cost and run time. The analysis shows that iterations and neighbors reduce total supply chain cost, while the interaction between iterations and neighbors increases the run time exponentially. Therefore, increasing the number of iterations and neighbors will increase run time but provide a better result for total supply chain cost. Tabu Search’s input parameters should be set high in almost every practical case to achieve the best result.
This work is the first to incorporate risk and outsourcing into a multi-echelon supply chain, solved using an exact (Excel Solver) and a metaheuristic (Tabu Search) solution methodology. From a practical standpoint, managers can visualize supply chain networks of any size and variation to estimate the total supply chain cost in a relatively short time. Supply chain managers can identify suppliers and pick specific suppliers based on cost or risk. Lastly, they can adjust for risk according to external or internal risk factors. Future research directions include expanding or simplifying the supply chain network design, considering multiple parts, and considering scrap or defective products. In addition, one could incorporate a multi-product dynamic planning horizon supply chain. Overall, a hybrid method combining Tabu Search with genetic algorithms, particle swarm optimization, simulated annealing, CPLEX, GUROBI, or LINGO could provide better results in less computational time.
29

Tomek, Peter. "Approximation of Terrain Data Utilizing Splines". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236488.

Abstract
To optimize flight trajectories at very low altitude, terrain features must be taken into account very precisely. Fast and efficient evaluation of terrain data is therefore very important, since the time required for optimization must be as short as possible. Moreover, gradient-based methods are used to optimize the flight trajectory, so the function approximating the terrain data must be continuous up to a certain order of derivative. A very promising method for approximating terrain data is the application of multivariate simplex polynomials. The aim of this thesis is to implement a function that evaluates given terrain data at specified points, together with the gradient, using multivariate splines. The program should evaluate several points at once and should work in $n$-dimensional space.
30

Ostrowska, Alicja. "War is Peace : A Study of Relationship Between Gender Equality and Peacefulness of a State". Thesis, Högskolan för lärande och kommunikation, Högskolan i Jönköping, HLK, Globala studier, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-27663.

Abstract
Based on previous studies, the hypothesis of this research is that the higher the level of gender equality in a state, the higher its level of peacefulness. It is a quantitative study using linear regression analysis with three variables: the Global Peace Index (GPI) as the dependent variable, the Gender Inequality Index (GII) as the independent variable, and the Human Development Index (HDI) as a control variable. Data for 139 states from 2013 were entered into the Statistical Package for Social Sciences (SPSS) software. The result shows a significant and positive linear relationship between gender inequality and a high level of conflict, which confirms the hypothesis. However, HDI proves to be less reliable as a control variable because of multicollinearity (strongly related independent variables). Further studies should replace the HDI with another control variable.
31

Kossaï, Mohamed. "Les Technologies de L’Information et des Communications (TIC), le capital humain, les changements organisationnels et la performance des PME manufacturières". Thesis, Paris 9, 2013. http://www.theses.fr/2013PA090035/document.

Abstract
ICTs are a key performance factor in developed countries. This PhD thesis focuses on the adoption of ICTs and their impact on the performance of manufacturing SMEs in a developing country. Following a first part covering the theoretical and conceptual framework, the rest of the thesis is organized into three empirical studies. The first study uses a Probit model to identify the determinants of ICT adoption. Human capital is the most significant explanatory variable. Based on linear regression with dummy variables, Granger causality, the Kruskal-Wallis test, and Welch's ANOVA test, followed by the corresponding post-hoc tests, the second study highlights a strong, statistically significant relationship between the level of ICT adoption and profitability. In the third study, several Probit models (simple, ordered, and multivariate) were tested on different measures of performance. First, we show that ICTs have a positive impact on the productivity, profitability, and competitiveness of SMEs. Second, ICTs, human capital, and training are the determinants of overall firm performance. Third, the contribution of ICTs to overall performance is strong when they are combined with skilled human capital. In conclusion, our empirical results demonstrate a positive effect of ICTs, human capital, and organizational change on SME performance.
32

Assareh, Hassan. "Bayesian hierarchical models in statistical quality control methods to improve healthcare in hospitals". Thesis, Queensland University of Technology, 2012. https://eprints.qut.edu.au/53342/1/Hassan_Assareh_Thesis.pdf.

Abstract
Quality-oriented management systems and methods have become the dominant business and governance paradigm. From this perspective, satisfying customers’ expectations by supplying reliable, good-quality products and services is the key factor for an organization and even a government. During recent decades, Statistical Quality Control (SQC) methods have been developed as the technical core of quality management and continuous improvement philosophy, and they are now applied widely to improve the quality of products and services in industrial and business sectors. Recently SQC tools, in particular quality control charts, have been used in healthcare surveillance. In some cases, these tools have been modified and developed to better suit the characteristics and needs of the health sector. It seems that some of the work in the healthcare area has evolved independently of the development of industrial statistical process control methods. Therefore, analysing and comparing paradigms and the characteristics of quality control charts and techniques across the different sectors presents opportunities for transferring knowledge and for future development in each sector. Meanwhile, the capabilities of the Bayesian approach, particularly Bayesian hierarchical models and computational techniques in which all uncertainty is expressed as a structure of probability, facilitate decision making and cost-effectiveness analyses. Therefore, this research investigates the use of the quality improvement cycle in a health setting using clinical data from a hospital. The need for clinical data for monitoring purposes is investigated in two aspects. A framework and appropriate tools from the industrial context are proposed and applied to evaluate and improve data quality in available datasets and data flow; then a data capturing algorithm using Bayesian decision making methods is developed to determine an economical sample size for statistical analyses within the quality improvement cycle.
After ensuring clinical data quality, some characteristics of control charts in the health context, including the necessity of monitoring attribute data and correlated quality characteristics, are considered. To this end, multivariate control charts from an industrial context are adapted to monitor radiation delivered to patients undergoing diagnostic coronary angiogram, and various risk-adjusted control charts are constructed and investigated for monitoring binary outcomes of clinical interventions as well as post-intervention survival time. Meanwhile, adoption of a Bayesian approach is proposed as a new framework for estimating the change point following a control chart’s signal. This estimate aims to facilitate root-cause efforts in the quality improvement cycle, since it narrows the search for the potential causes of detected changes to a tighter time frame prior to the signal. This approach yields highly informative estimates for change point parameters, since the results are based on probability distributions. Using Bayesian hierarchical models and Markov chain Monte Carlo computational methods, Bayesian estimators of the time and the magnitude of various change scenarios, including step change, linear trend, and multiple change in a Poisson process, are developed and investigated. The benefits of change point investigation are revisited and promoted in monitoring hospital outcomes, where the developed Bayesian estimator reports the true time of the shifts, compared to a priori known causes, detected by control charts monitoring the rate of excess usage of blood products and major adverse events during and after cardiac surgery in a local hospital. The development of the Bayesian change point estimators is then continued in healthcare surveillance for processes in which pre-intervention characteristics of patients affect the outcomes.
In this setting, the Bayesian estimator is first extended to capture the patient mix (covariates) through the risk models underlying risk-adjusted control charts. Variations of the estimator are developed to estimate the true time of step changes and linear trends in the odds ratio of intensive care unit outcomes in a local hospital. Second, the Bayesian estimator is extended to identify the time of a shift in mean survival time after a clinical intervention that is being monitored by risk-adjusted survival time control charts. In this context, the survival time after a clinical intervention is also affected by patient mix, and the survival function is constructed using a survival prediction model. The simulation studies undertaken in each research component and the results obtained strongly recommend the developed Bayesian estimators as a strong alternative for change point estimation within the quality improvement cycle in healthcare surveillance as well as in industrial and business contexts. The superiority of the proposed Bayesian framework and estimators is enhanced when probability quantification, flexibility, and generalizability of the developed model are also considered. The advantages of the Bayesian approach seen in the general context of quality control may also be extended to the industrial and business domains where quality monitoring was initially developed.
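The step-change scenario in a Poisson process mentioned in this abstract can be sketched in a much simpler form than the thesis uses. The example below is an assumption-laden illustration, not the author's estimator: it replaces the hierarchical model and Markov chain Monte Carlo with a conjugate Gamma(a, b) prior on each rate and a uniform prior on the change time, so the posterior over the change point is available in closed form. The function name `changepoint_posterior`, the prior values, and the simulated counts are all hypothetical.

```python
import numpy as np
from math import lgamma

def changepoint_posterior(y, a=1.0, b=1.0):
    """Posterior over k, the number of observations before a single step change.

    Assumes y[:k] ~ Poisson(rate1), y[k:] ~ Poisson(rate2), independent
    Gamma(shape=a, rate=b) priors on both rates, and a uniform prior on k.
    The rates are integrated out analytically.
    """
    y = np.asarray(y)
    T = len(y)
    csum = np.cumsum(y)
    total = csum[-1]
    log_post = np.empty(T - 1)
    for k in range(1, T):
        s1, s2 = csum[k - 1], total - csum[k - 1]  # counts before / after
        n1, n2 = k, T - k                          # segment lengths
        log_post[k - 1] = (lgamma(a + s1) - (a + s1) * np.log(b + n1)
                           + lgamma(a + s2) - (a + s2) * np.log(b + n2))
    log_post -= log_post.max()                     # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Simulated counts (assumption): the rate steps from 2 to 6 after 60 observations.
rng = np.random.default_rng(1)
y = np.concatenate([rng.poisson(2.0, size=60), rng.poisson(6.0, size=40)])
post = changepoint_posterior(y)
tau_hat = int(np.argmax(post)) + 1                 # posterior mode of the change time
print("posterior mode of change point:", tau_hat)
```

Because the output is a full probability distribution over the change time, credible intervals can be read off `post` directly, which is the kind of probability quantification the abstract highlights.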
33

COLPO, MARCO. "The relationship between food intake and depressive symptoms". Doctoral thesis, 2018. https://hdl.handle.net/2158/1288738.

Abstract
Depression is one of the most prevalent diseases and an important risk factor for general public health. It is particularly common in elderly persons. Accordingly, strategies to prevent depression are needed. A growing body of empirical evidence suggests a key role for diet in the prevention of depression. This is the basis of the European MooDFOOD project, which aims to study the multifaceted links between food intake and depression. The thesis focuses on identifying the impact of depressive symptoms on food intake behavior. The main challenge is handling food intake data, which are usually collected in large matrices of nutrients and foods. In the nutritional framework, two approaches are usually adopted: a priori and a posteriori patterns. A priori patterns use expert knowledge to implement nutritional scores. By contrast, a posteriori food patterns are derived from the study of the covariance matrix. The main goal of this thesis is to understand how food intake behavior is affected by the occurrence of depressive symptoms. In particular, two dedicated statistical methodologies are proposed to estimate the covariance/precision matrix conditionally on a set of covariates. As a first approach, a heteroscedastic multivariate regression model is implemented. Dedicated prior distributions are specified for inference in a Bayesian framework. A parametrization of the conditional covariance is adopted to aid interpretation of the results. A Metropolis-within-Gibbs sampling scheme is implemented for posterior computation. Moreover, a correlation adjustment is proposed in order to ensure positive estimates. A second approach based on multiple graphical models is implemented. A high-dimensional framework is considered. Accordingly, a sparse joint estimation procedure is adopted to estimate the partial correlation matrix. This approach takes into account a possible common structure across groups.
Moreover, a strategy based on an interpolation method (kriging) is implemented for graph selection. The proposed statistical methodologies provided comparable results in explaining the food intake behavior of a sample of participants in the InCHIANTI study, an epidemiological study conducted in the Chianti area of Tuscany. In particular, the results showed how diets may change before a person is classified as depressed. In both applications, the intake of olive oil proved central. The heteroscedastic multivariate regression model provides interpretable results thanks to its parameterization. It can accommodate different kinds of covariates. However, it carries a risk of overparametrization. Moreover, an intense computational effort is required to obtain the posterior distribution. The joint sparse estimation of multiple graphical models makes it possible to understand the conditional association between food groups in different datasets. It gave better results in terms of computational cost but is limited to analyzing groups of subjects. Furthermore, even though a bootstrap procedure is implemented to select clearer graphs, interpretation can remain unclear. In conclusion, even if some issues remain open for both methodologies, these approaches could be introduced in the nutritional framework as an alternative way to analyze dietary data.
34

王仁聖. "Generalized inference in heteroscedastic multivariate linear models". Thesis, 2008. http://ndltd.ncl.edu.tw/handle/90735848617742765093.

Abstract
Doctoral
National Chiao Tung University
Institute of Statistics
96
The main subject of this dissertation is the application of the generalized method to regression models with heteroscedastic AR(1) covariance matrices. The concepts of generalized p-values and generalized confidence intervals, proposed by Tsui and Weerahandi (1989) and Weerahandi (1993), respectively, provide an alternative way to handle heteroscedasticity. We extend these concepts to consider the standardized expression of the generalized multivariate test variable. Lin and Lee (2003) applied the generalized method to the MANOVA model with unequal uniform covariance structures among multiple groups. We adapt their procedure, with modifications, to regression models with heteroscedastic serial dependence. The coverage probabilities and expected areas based on our proposed procedure are satisfactory. We also find that our method can be applied to uniform structures without the assumption of special design matrices.
35

Huang, Min-Chia and 黃敏嘉. "Multivariate Function-on-Function Linear Regression". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/78499496560023216042.

Abstract
Master's thesis (碩士)
National Chung Hsing University (國立中興大學)
Institute of Statistics (統計學研究所)
105
Functional linear regression is an important tool for analyzing longitudinal data. In longitudinal data analysis, the observations are made at irregular time points (or locations) with measurement error. Moreover, observations of the same subject are correlated. Our method is suitable for these situations and aims at improving function-on-function linear regression. Traditionally, the regression coefficients for function-on-function linear regression models are estimated from the first few principal components of both the predictor and response functions. However, some useful information might be treated as part of the error term and thus discarded if we adopt only the first few important principal components. We resolve this issue by estimating the coefficients directly from the covariance functions of the predictor and response functions. The proposed estimation approach can also be used in multiple and multidimensional function-on-function linear regression models.
36

Yeh, Shing-Hung y 葉世弘. "Adaptive Group Lasso for Multivariate Linear Regression". Thesis, 2009. http://ndltd.ncl.edu.tw/handle/90910161360611684952.

Abstract
Master's thesis (碩士)
National Cheng Kung University (國立成功大學)
Department of Statistics, Master's and Doctoral Program (統計學系碩博士班)
97
In traditional statistical methods, estimation and variable selection are usually treated separately. The LASSO (Tibshirani, 1996) is a method for estimation in linear models that estimates parameters and selects variables simultaneously. However, the Lasso is inconsistent for variable selection; the Adaptive Lasso (Zou 2006) overcomes this problem and enjoys the oracle properties. In linear regression with categorical predictors (factors), the Lasso solution selects only individual dummy variables instead of whole factors. The group Lasso (Yuan and Lin 2006) overcomes this problem. The group lasso is a natural extension of the lasso that selects variables in a grouped manner, but it suffers from estimation inefficiency and selection inconsistency. For the Adaptive Group Lasso, Wang and Leng (2006) show that its estimator can be as efficient as the oracle. We propose the adaptive group lasso for multivariate linear regression. In our study, the definition of a grouped variable differs from that of former studies, which regard one column of the model matrix as a group. We instead consider one row of the parameter matrix as one group, in order to find the variables that are significant for Y.
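As a rough illustration of what treating one row of the coefficient matrix as a group means in practice, the sketch below implements the row-wise group soft-thresholding step that commonly appears in group lasso solvers. It is not taken from the thesis: the function name `row_group_soft_threshold`, the penalty level `lam`, and the optional per-row `weights` (standing in for adaptive weights) are all assumptions made for this example.

```python
import math

def row_group_soft_threshold(B, lam, weights=None):
    """Shrink each row of the coefficient matrix B toward zero as a group.

    B is a list of rows (one row per predictor, one column per response).
    lam is the penalty level; weights are optional per-row adaptive weights
    (hypothetical names, chosen for this sketch only).
    """
    if weights is None:
        weights = [1.0] * len(B)
    shrunk = []
    for row, w in zip(B, weights):
        norm = math.sqrt(sum(b * b for b in row))
        # A row whose Euclidean norm is at most lam * w is set to zero,
        # which removes that predictor from every response at once.
        scale = max(0.0, 1.0 - lam * w / norm) if norm > 0 else 0.0
        shrunk.append([scale * b for b in row])
    return shrunk

# Illustrative 2 x 2 coefficient matrix: a strong row and a weak row.
B = [[3.0, 4.0], [0.3, 0.4]]
result = row_group_soft_threshold(B, lam=1.0)
```

With these made-up numbers the first row is shrunk but kept, while the second row is zeroed out as a whole, which is the grouped selection behaviour the abstract describes for predictors acting on Y.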
37

Lin, Jin-Sying. "Linear regression analysis for multivariate failure time observations". 1991. http://catalog.hathitrust.org/api/volumes/oclc/26228833.html.

Abstract
Thesis (Ph. D.)--University of Wisconsin--Madison, 1991.
Typescript. Vita. eContent provider-neutral record in process. Description based on print version record. Includes bibliographical references (leaves 88-93).
38

Chen, Lianfu. "Topics on Regularization of Parameters in Multivariate Linear Regression". Thesis, 2011. http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10644.

Abstract
My dissertation mainly focuses on the regularization of parameters in the multivariate linear regression under different assumptions on the distribution of the errors. It consists of two topics where we develop iterative procedures to construct sparse estimators for both the regression coefficient and scale matrices simultaneously, and a third topic where we develop a method for testing if the skewness parameter in the skew-normal distribution is parallel to one of the eigenvectors of the scale matrix. In the first project, we propose a robust procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for the correlations of the response variables. Robustness to outliers is achieved using heavy-tailed t distributions for the multivariate response, and shrinkage is introduced by adding to the negative log-likelihood l1 penalties on the entries of both the regression coefficient matrix and the precision matrix of the responses. Taking advantage of the hierarchical representation of a multivariate t distribution as the scale mixture of normal distributions and the EM algorithm, the optimization problem is solved iteratively where at each EM iteration suitably modified multivariate regression with covariance estimation (MRCE) algorithms proposed by Rothman, Levina and Zhu are used. We propose two new optimization algorithms for the penalized likelihood, called MRCEI and MRCEII, which differ from MRCE in the way that the tuning parameters for the two matrices are selected. Estimating the degrees of freedom when penalizing the entries of the matrices presents new computational challenges. 
A simulation study and real data analysis demonstrate that the MRCEII, which selects the tuning parameter of the precision matrix of the multiple responses using the Cp criterion, generally does the best among all methods considered in terms of the prediction error, and MRCEI outperforms the MRCE methods when the regression coefficient matrix is less sparse. The second project is motivated by the existence of the skewness in the data for which the symmetric distribution assumption on the errors does not hold. We extend the procedure we have proposed to the case where the errors in the multivariate linear regression follow a multivariate skew-normal or skew-t distribution. Based on the convenient representation of skew-normal and skew-t as well as the EM algorithm, we develop an optimization algorithm, called MRST, to iteratively minimize the negative penalized log-likelihood. We also carry out a simulation study to assess the performance of the method and illustrate its application with one real data example. In the third project, we discuss the asymptotic distributions of the eigenvalues and eigenvectors for the MLE of the scale matrix in a multivariate skew-normal distribution. We propose a statistic for testing whether the skewness vector is proportional to one of the eigenvectors of the scale matrix based on the likelihood ratio. Under the alternative, the likelihood is maximized numerically with two different ways of parametrization for the scale matrix: Modified Cholesky Decomposition (MCD) and Givens Angle. We conduct a simulation study and show that the statistic obtained using Givens Angle parametrization performs well and is more reliable than that obtained using MCD.
39

Lin, Lung-Shun y 林隆舜. "Multivariate Linear Regression Models with Censored and Missing Responses". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/36387812662217809093.

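Abstract
Master's thesis (碩士)
Feng Chia University (逢甲大學)
Department of Statistics, Master's Program in Statistics and Actuarial Science (統計學系統計與精算碩士班)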
104
During the past few decades, statistical methods for continuous longitudinal data, which are repeatedly collected on each subject over a period of time, have received considerable attention in the literature, especially in biomedical studies and clinical trials. In longitudinal research, missing data occur frequently for many reasons, such as missed visits, withdrawal from a study, loss to follow-up, and so on. In addition, left- and/or right-censored observations, which are not exactly quantified, arise in the data because of lower and/or upper detection limits. To analyze longitudinal data with missing values and censored responses simultaneously, this thesis proposes the multivariate linear regression model with censored and missing responses (MLRCM). The MLRCM approach includes multivariate linear regression with censored responses (MLRC), multivariate linear regression with missing responses (MLRM) and multivariate linear regression (MLR) as special cases, which are also discussed in this thesis. A computationally flexible expectation conditional maximization (ECM) algorithm is provided to carry out maximum likelihood estimation of the model parameters. The standard errors of the estimated regression coefficients are calculated by an information-based method. A series of simulation studies is conducted to examine the finite-sample properties of the proposed model. We illustrate our methodology with a real-data example.
40

Lee, Wen-wei y 李文偉. "Modeling Construction Unit Rate Estimation Using Multivariate Linear Regression Analysis". Thesis, 2002. http://ndltd.ncl.edu.tw/handle/36024282213244411336.

Abstract
Master's thesis (碩士)
National Central University (國立中央大學)
Institute of Civil Engineering (土木工程研究所)
90
Estimation of the unit rate is essential for determining an activity’s duration and its corresponding cost. Traditionally, the activity unit rate is estimated from expert experience or simply from the statistical average of historical data. Both are rough and inaccurate. This study develops a multivariate linear regression model for the estimation of the activity unit rate. More than 400 records of data for concrete placement and steel rebar activities in bridge superstructure construction were collected and used for the analysis. Two multivariate linear regression models, one for concrete placement (R2*= 0.8558) and the other for steel rebar (R2*= 0.7196), are developed as a result. The crew size, unit work quantity, % of foreign workers, and findings are reported.
41

Wu, Chung-Fu y 吳重孚. "Building the Multivariate Linear Regression Model of DPP-IV Inhibitors". Thesis, 2012. http://ndltd.ncl.edu.tw/handle/19999224791197635235.

42

陳易駿. "Multivariate Multiple Linear Profile Monitoring Based on Partial Least Squares Regression". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/69003909692295256622.

Abstract
Master's thesis (碩士)
National Tsing Hua University (國立清華大學)
Institute of Statistics (統計學研究所)
102
Quality control of manufacturing processes, and how to monitor a process more effectively, have become important issues in recent years. In many practical manufacturing processes, quality can be expressed as a function of one or more explanatory variables and response variables; this kind of data is known as profile data. Much of the literature discusses methods for linear profile monitoring, but the current methods apply only when the number of observations is sufficient to estimate all regression parameters. When the number of observations is not sufficient to estimate all parameters, there are still no effective methods for monitoring the linear profile. For multivariate multiple linear profile monitoring with an insufficient number of observations, this work proposes a control chart based on partial least squares. We use the proposed control chart to perform linear profile monitoring and then run statistical simulations to assess its efficiency. Finally, we use an example to illustrate how to monitor a linear profile with the proposed control chart.
43

Liu, Kuo-Chuan y 劉國傳. "A general results for variable selection in multivariate linear regression models". Thesis, 1994. http://ndltd.ncl.edu.tw/handle/91225231234956085304.

44

Tsai, Chung-Ting y 蔡忠廷. "Analysis of Variance and Hypothesis Testing for Multivariate Local Linear Regression Models". Thesis, 2017. http://ndltd.ncl.edu.tw/handle/jd639e.

Abstract
Master's thesis (碩士)
National Tsing Hua University (國立清華大學)
Institute of Statistics (統計學研究所)
105
In linear models, it is common to test the difference between two nested models by measuring the difference of their error sums of squares and performing an F-test. Huang and Chen (2008) [7] have extended the structure of this F-test to local polynomial regression (LPR) models (see Fan and Gijbels, 1996 [3]), constructed local and global ANOVA decompositions for LPR models, and defined an F-statistic to test whether a model function fitted by LPR is significant. This thesis extends this F-test to multivariate local linear regression (MLLR) models (see Ruppert and Wand, 1994 [17]) by mimicking a similar framework proposed by Huang and Chen (2008) [7]. We establish local and global ANOVA decompositions for MLLR models, and define two F-statistics corresponding to the following two hypotheses: (i) whether a model function fitted by MLLR is significant, and (ii) whether a model function fitted by MLLR with covariates X_2,..., X_d is more appropriate than a model function fitted by MLLR with covariates X_1,..., X_d. In the bivariate case (d = 2), the type I error and power for these two F-tests are investigated by simulations under different settings of sample sizes, correlations of covariates, values of bandwidth, and signals of rejection, while practical issues of implementing these two F-tests are also discussed, including normalization for the product kernel function. At last, these two F-tests are applied to the analysis of Boston house-price data.
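The classical nested-model F-test that this abstract takes as its starting point can be sketched in a few lines. The example below covers only the ordinary linear case (an intercept-only model against a straight-line model), not the local linear extension developed in the thesis; the function name `nested_f_statistic` and the data are hypothetical and chosen for illustration.

```python
def sum_sq(values):
    """Sum of squares of a list of numbers."""
    return sum(v * v for v in values)

def nested_f_statistic(x, y):
    """F statistic comparing an intercept-only fit with a straight-line fit."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Reduced model: y = b0, fitted by the sample mean.
    sse_reduced = sum_sq([yi - y_bar for yi in y])
    # Full model: y = b0 + b1 * x, fitted by ordinary least squares.
    sxx = sum_sq([xi - x_bar for xi in x])
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse_full = sum_sq([yi - (b0 + b1 * xi) for xi, yi in zip(x, y)])
    df_extra = 1      # parameters added by the full model
    df_resid = n - 2  # residual degrees of freedom of the full model
    return ((sse_reduced - sse_full) / df_extra) / (sse_full / df_resid)

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
f_stat = nested_f_statistic(x, y)
```

The statistic is the drop in the error sum of squares per added parameter, divided by the residual mean square of the larger model; it would then be compared with an F distribution with (1, n - 2) degrees of freedom.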
45

Cheng, Ho-Ming y 鄭賀名. "Developing Multivariate Linear Regression Models to Predict the Electrochemical Performance of Lithium Ion Batteries Based on Material Property Parameters". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/bh5524.

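Abstract
Doctoral thesis (博士)
National Taiwan University of Science and Technology (國立臺灣科技大學)
Department of Materials Science and Engineering (材料科學與工程系)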
106
Predicting the electrochemical performance of active materials before their assembly in lithium ion batteries would be a path to cutting costs and time for assembling coin cells and running charging and discharging tests. Therefore, it is valuable to establish a statistical model to precisely predict the electrochemical performance of active materials in lithium ion batteries before cell assembly. In this study, we employed 11 different LiFePO4 powders prepared by manufacturers as the cathode active material and measured their properties, and then prepared cathode electrodes and ran electrochemical experiments. The acquired material property parameters and the electrochemical scores were correlated using multivariate linear regression models. We first used XRD, FTIR, and EA techniques to measure the crystal structure, the vibration of the PO4^3- functional group, and the carbon content, respectively. Next, we made the cathode electrodes using these 11 LiFePO4 products and assembled them into coin cells; we then ran capacity tests at various current rates and cycleability tests at a 2 C current rate for 1,000 cycles. Estimates of the regression coefficients in the regression models were calculated by the least squares method, and thus the regression models were established. We expect to popularize this powerful material science statistical predictive strategy, to allow future researchers to predict performance of products in a cost-effective and timely manner. In the second analysis of this study, a regression model for predicting polarization potential in CV measurements of LiFePO4/C cathodes was developed based on several material property parameters. To assess whether the predicted values agree with the observed data at a 95 % level of confidence, a paired t-test was employed to compare the means of the 2 populations.
Moreover, an F-test was applied to examine the ratio of the variances of the two datasets, to confirm that the 2 populations have about the same expectation of the squared deviations from their means. A sample size calculation technique was adopted in the third analysis to evaluate the minimum sample size required to achieve the predetermined power and significance level for our hypothesis tests. The 4th topic is to use the constrained optimization method to find the lowest anticipated overpotential in the CV tests and the corresponding material property parameters according to the fitted equation we established in part 3. In the 5th subject, cycle life tendencies of cathode materials were simulated based on time series analysis. Time series analysis helps a battery management system monitor battery health precisely, and thus would help improve the safety of LIBs. A grey model was further employed to construct the prediction equation for assessing the degradation of the cell capacity during long-term cycling. Moreover, an approach based on information entropy helped us develop a combination model for forecasting, further improving the precision of the resulting model. In the 6th work, principal component analysis was performed on the variables obtained from XRD measurements of all the samples. The original data are transformed into uncorrelated principal components, and the unimportant vectors can thus be eliminated, so that the remaining principal components explain as much of the variance of the original data as possible. The 7th topic is to perform factor analysis on a few selected predictor variables, so that we can extract any unobserved latent variables that might exist. In the 8th analysis, we made use of structural equation modelling to present visually the statistical correlations among the observed variables and the latent factors.
This path analysis skill clearly manifested the relative importance of various variables in the regression model. The partial least squares regression method was performed on our experimental results in the 9th subject, consequently we are able to construct the fitted equation when the sample amount we collected is less than the variables we would like to investigate. While the variables were standardized so the regression coefficients can be compared directly, PLS regression model transforms the predictor variables into principal components as well so the variance of the data can be thoroughly accounted for as possible. Within this study, we have utilized 9 statistical analyses to resolve the correlations among the measured experimental data. A number of topics researched in this work include: development of regression equations for forecasting electrochemical performances according to material properties, accuracy verifications of the regression models we established, trend anticipation of the cycle lives of the studied cathode samples, the best battery capability and the associated variable values indicated by the prediction function, and the solutions of the principal components which are able to express the data dispersion as much as possible, etc. We expect these mathematical tools can lead the scientific community to perfect their achievements in the future.
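The paired t-test used in the second analysis to compare predicted and observed values can be sketched as follows. This is a generic textbook computation under stated assumptions, not the thesis's own code: the function name `paired_t_statistic` and the numbers are hypothetical, and the resulting statistic would still need to be compared with a t critical value for the chosen confidence level.

```python
import math
import statistics

def paired_t_statistic(observed, predicted):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 denominator).
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Made-up observed and model-predicted values, for illustration only.
observed = [10.0, 12.0, 15.0, 11.0]
predicted = [9.0, 13.0, 13.0, 11.0]
t_stat, dof = paired_t_statistic(observed, predicted)
```

A small absolute value of the statistic relative to the critical value is consistent with the predictions and the observations sharing the same mean, which is the agreement the abstract sets out to check.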
46

Chen, Ruidi. "Distributionally Robust Learning under the Wasserstein Metric". Thesis, 2019. https://hdl.handle.net/2144/38236.

Abstract
This dissertation develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data using Distributionally Robust Optimization (DRO) under the Wasserstein metric. The learning problems that are studied include: (i) Distributionally Robust Linear Regression (DRLR), which estimates a robustified linear regression plane by minimizing the worst-case expected absolute loss over a probabilistic ambiguity set characterized by the Wasserstein metric; (ii) Groupwise Wasserstein Grouped LASSO (GWGL), which aims at inducing sparsity at a group level when there exists a predefined grouping structure for the predictors, through defining a specially structured Wasserstein metric for DRO; (iii) Optimal decision making using DRLR informed K-Nearest Neighbors (K-NN) estimation, which selects among a set of actions the optimal one through predicting the outcome under each action using K-NN with a distance metric weighted by the DRLR solution; and (iv) Distributionally Robust Multivariate Learning, which solves a DRO problem with a multi-dimensional response/label vector, as in Multivariate Linear Regression (MLR) and Multiclass Logistic Regression (MLG), generalizing the univariate response model addressed in DRLR. A tractable DRO relaxation for each problem is being derived, establishing a connection between robustness and regularization, and obtaining upper bounds on the prediction and estimation errors of the solution. The accuracy and robustness of the estimator is verified through a series of synthetic and real data experiments. The experiments with real data are all associated with various health informatics applications, an application area which motivated the work in this dissertation. In addition to estimation (regression and classification), this dissertation also considers outlier detection applications.
47

Burombo, Emmanuel Chamunorwa. "Statistical modelling of return on capital employed of individual units". Diss., 2014. http://hdl.handle.net/10500/19627.

Abstract
Return on Capital Employed (ROCE) is a popular financial instrument and communication tool for the appraisal of companies. Often, company management and other practitioners use untested rules and a behavioural approach when investigating the key determinants of ROCE, instead of the scientific statistical paradigm. The aim of this dissertation was to identify and quantify key determinants of ROCE of individual companies listed on the Johannesburg Stock Exchange (JSE), by comparing classical multiple linear regression, principal components regression, generalized least squares regression, and robust maximum likelihood regression approaches in order to improve company decision making. Performance indicators used to arrive at the best approach were the coefficient of determination (R2), adjusted R2, and Mean Square Residual (MSE). Since the ROCE variable had positive and negative values, two separate analyses were done. The classical multiple linear regression models were constructed using stepwise directed search for dependent variable log ROCE for the two data sets. Assumptions were satisfied and the problem of multicollinearity was addressed. For the positive ROCE data set, the classical multiple linear regression model had an R2 of 0.928, an adjusted R2 of 0.927, a MSE of 0.013, and the lead key determinant was Return on Equity (ROE), with positive elasticity, followed by Debt to Equity (D/E) and Capital Employed (CE), both with negative elasticities. The model showed good validation performance. For the negative ROCE data set, the classical multiple linear regression model had an R2 of 0.666, an adjusted R2 of 0.652, a MSE of 0.149, and the lead key determinant was Assets per Capital Employed (APCE) with positive effect, followed by Return on Assets (ROA) and Market Capitalization (MC), both with negative effects. The model showed poor validation performance. The results indicated more and less precision than those found by previous studies.
This suggested that the key determinants are also important sources of variability in ROCE of individual companies that management need to work with. To handle the problem of multicollinearity in the data, principal components were selected using the Kaiser-Guttman criterion. The principal components regression model was constructed using dependent variable log ROCE for the two data sets. Assumptions were satisfied. For the positive ROCE data set, the principal components regression model had an R2 of 0.929, an adjusted R2 of 0.929, a MSE of 0.069, and the lead key determinant was PC4 (log ROA, log ROE, log Operating Profit Margin (OPM)), followed by PC2 (log Earnings Yield (EY), log Price to Earnings (P/E)), both with positive effects. The model resulted in a satisfactory validation performance. For the negative ROCE data set, the principal components regression model had an R2 of 0.544, an adjusted R2 of 0.532, a MSE of 0.167, and the lead key determinant was PC3 (ROA, EY, APCE), followed by PC1 (MC, CE), both with negative effects. The model indicated an accurate validation performance. The results showed that the use of principal components as independent variables did not improve classical multiple linear regression model prediction in our data. This implied that the key determinants are less important sources of variability in ROCE of individual companies that management need to work with. Generalized least squares regression was used to assess heteroscedasticity and dependences in the data. It was constructed using stepwise directed search for dependent variable ROCE for the two data sets. For the positive ROCE data set, the weighted generalized least squares regression model had an R2 of 0.920, an adjusted R2 of 0.919, a MSE of 0.044, and the lead key determinant was ROE with positive effect, followed by D/E with negative effect, Dividend Yield (DY) with positive effect and lastly CE with negative effect. The model indicated an accurate validation performance.
For the negative ROCE data set, the weighted generalized least squares regression model had an R2 of 0.559, an adjusted R2 of 0.548, a MSE of 57.125, and the lead key determinant was APCE, followed by ROA, both with positive effects. The model showed a weak validation performance. The results suggested that the key determinants are less important sources of variability in ROCE of individual companies that management need to work with. Robust maximum likelihood regression was employed to handle the problem of contamination in the data. It was constructed using stepwise directed search for dependent variable ROCE for the two data sets. For the positive ROCE data set, the robust maximum likelihood regression model had an R2 of 0.998, an adjusted R2 of 0.997, a MSE of 6.739, and the lead key determinant was ROE with positive effect, followed by DY and lastly D/E, both with negative effects. The model showed a strong validation performance. For the negative ROCE data set, the robust maximum likelihood regression model had an R2 of 0.990, an adjusted R2 of 0.984, a MSE of 98.883, and the lead key determinant was APCE with positive effect, followed by ROA with negative effect. The model also showed a strong validation performance. The results reflected that the key determinants are major sources of variability in ROCE of individual companies that management need to work with. Overall, the findings showed that the use of robust maximum likelihood regression provided more precise results compared to those obtained using the three competing approaches, because it is more consistent, sufficient and efficient; has a higher breakdown point and no conditions. Company management can establish and control proper marketing strategies using the key determinants, and results of these strategies can see an improvement in ROCE.
Mathematical Sciences
M. Sc. (Statistics)
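The weighted generalized least squares step mentioned in this abstract is a response to heteroscedasticity: observations with larger error variance receive smaller weight. The sketch below shows the simplest diagonal case (weighted least squares for a single predictor) under the assumption that the error variances are known. The function name `weighted_line_fit`, the variances, and the data are hypothetical and are not taken from the dissertation.

```python
def weighted_line_fit(x, y, w):
    """Fit y = b0 + b1 * x by weighted least squares with weights w."""
    sw = sum(w)
    # Weighted means of predictor and response.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data: the first three points lie on y = 1 + 2x,
# the last point is assumed to have a much larger error variance.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 20.0]
variances = [1.0, 1.0, 1.0, 100.0]        # assumed known, for illustration
weights = [1.0 / v for v in variances]    # inverse-variance weights

ols_fit = weighted_line_fit(x, y, [1.0] * len(x))  # equal weights = ordinary least squares
wls_fit = weighted_line_fit(x, y, weights)
```

With equal weights the noisy point pulls the slope far from 2, whereas the inverse-variance weights keep the fitted slope close to the line followed by the three precise points.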
48

Sandström, Sara. "Modellering av volym samt max- och medeldjup i svenska sjöar : en statistisk analys med hjälp av geografiska informationssystem". Thesis, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325822.

Abstract
Lake volume and lake depth are important variables that define a lake and its ecosystem. Sweden has around 100 000 lakes, but only around 8000 lakes have measured data for volume, max- and mean-depth. Collecting data for the rest of the lakes is presently too time consuming and expensive; therefore, a predictive method is needed. A previous study by Sobek et al. (2011) found a model predicting lake volume from map-derived parameters with high degrees of explanation for the mean volume of 15 lakes or more. However, the predictions for one individual lake, as well as for max- and mean-depth, were not accurate enough. The purpose of this study was to derive better models based on new map material with higher resolution. The variables used were derived using GIS-based calculations and then analyzed with multivariate statistical analysis using PCA, PLS-regression and multiple linear regression. A model predicting lake volume for one individual lake with better accuracy than previous studies was found. The variables best explaining the variations in lake volume were lake area and the median slope of an individual zone around each lake (R2=0.87, p<0.00001). Also, the model predicting max-depth from lake area, the median slope of an individual zone around each lake and height differences in the closest area surrounding each lake had higher degrees of explanation than in previous studies (R2=0.42). The mean-depth had no significant correlation with map-derived parameters, but showed strong correlation with max-depth. Reference: Sobek, S., Nisell, J. & Fölster J. (2011). Predicting the volume and depths of lakes from map-derived parameters. Inland Waters, vol. 1, pp. 177-184.
49

Reis, Marco Paulo Seabra. "Monitorização, modelação e melhoria de processos químicos : abordagem multiescala baseada em dados". Doctoral thesis, 2006. http://hdl.handle.net/10316/7375.

Abstract
Doctoral thesis in Chemical Engineering (Chemical Processes) presented to the Faculty of Sciences and Technology of the University of Coimbra
Processes going on in modern chemical processing plants are typically very complex, and this complexity is also present in collected data, which contain the cumulative effect of many underlying phenomena and disturbances, presenting different patterns in the time/frequency domain. Such characteristics motivate the development and application of data-driven multiscale approaches to process analysis, with the ability of selectively analyzing the information contained at different scales, but, even in these cases, there is a number of additional complicating features that can make the analysis not being completely successful. Missing and multirate data structures are two representatives of the difficulties that can be found, to which we can add multiresolution data structures, among others. On the other hand, some additional requisites should be considered when performing such an analysis, in particular the incorporation of all available knowledge about data, namely data uncertainty information. In this context, this thesis addresses the problem of developing frameworks that are able to perform the required multiscale decomposition analysis while coping with the complex features present in industrial data and, simultaneously, considering measurement uncertainty information. These frameworks are proven to be useful in conducting data analysis in these circumstances, representing conveniently data and the associated uncertainties at the different relevant resolution levels, being also instrumental for selecting the proper scales for conducting data analysis. In line with efforts described in the last paragraph and to further explore the information processed by such frameworks, the integration of uncertainty information on common single-scale data analysis tasks is also addressed. We propose developments in this regard in the fields of multivariate linear regression, multivariate statistical process control and process optimization. 
The second part of this thesis is oriented towards the development of intrinsically multiscale approaches, where two such methodologies are presented in the field of process monitoring, the first aiming to detect changes in the multiscale characteristics of profiles, while the second is focused on analysing patterns evolving in the time domain.
50

Ouellette, Marie-Hélène. "L’arbre de régression multivariable et les modèles linéaires généralisés revisités : applications à l’étude de la diversité bêta et à l’estimation de la biomasse d’arbres tropicaux". Thèse, 2011. http://hdl.handle.net/1866/5906.

Abstract
En écologie, dans le cadre par exemple d’études des services fournis par les écosystèmes, les modélisations descriptive, explicative et prédictive ont toutes trois leur place distincte. Certaines situations bien précises requièrent soit l’un soit l’autre de ces types de modélisation ; le bon choix s’impose afin de pouvoir faire du modèle un usage conforme aux objectifs de l’étude. Dans le cadre de ce travail, nous explorons dans un premier temps le pouvoir explicatif de l’arbre de régression multivariable (ARM). Cette méthode de modélisation est basée sur un algorithme récursif de bipartition et une méthode de rééchantillonage permettant l’élagage du modèle final, qui est un arbre, afin d’obtenir le modèle produisant les meilleures prédictions. Cette analyse asymétrique à deux tableaux permet l’obtention de groupes homogènes d’objets du tableau réponse, les divisions entre les groupes correspondant à des points de coupure des variables du tableau explicatif marquant les changements les plus abrupts de la réponse. Nous démontrons qu’afin de calculer le pouvoir explicatif de l’ARM, on doit définir un coefficient de détermination ajusté dans lequel les degrés de liberté du modèle sont estimés à l’aide d’un algorithme. Cette estimation du coefficient de détermination de la population est pratiquement non biaisée. Puisque l’ARM sous-tend des prémisses de discontinuité alors que l’analyse canonique de redondance (ACR) modélise des gradients linéaires continus, la comparaison de leur pouvoir explicatif respectif permet entre autres de distinguer quel type de patron la réponse suit en fonction des variables explicatives. La comparaison du pouvoir explicatif entre l’ACR et l’ARM a été motivée par l’utilisation extensive de l’ACR afin d’étudier la diversité bêta. 
Toujours dans une optique explicative, nous définissons une nouvelle procédure appelée l’arbre de régression multivariable en cascade (ARMC) qui permet de construire un modèle tout en imposant un ordre hiérarchique aux hypothèses à l’étude. Cette nouvelle procédure permet d’entreprendre l’étude de l’effet hiérarchisé de deux jeux de variables explicatives, principal et subordonné, puis de calculer leur pouvoir explicatif. L’interprétation du modèle final se fait comme dans une MANOVA hiérarchique. On peut trouver dans les résultats de cette analyse des informations supplémentaires quant aux liens qui existent entre la réponse et les variables explicatives, par exemple des interactions entre les deux jeux explicatifs qui n’étaient pas mises en évidence par l’analyse ARM usuelle. D’autre part, on étudie le pouvoir prédictif des modèles linéaires généralisés en modélisant la biomasse de différentes espèces d’arbres tropicaux en fonction de certaines de leurs mesures allométriques. Plus particulièrement, nous examinons la capacité des structures d’erreur gaussienne et gamma à fournir les prédictions les plus précises. Nous montrons que pour une espèce en particulier, le pouvoir prédictif d’un modèle faisant usage de la structure d’erreur gamma est supérieur. Cette étude s’insère dans un cadre pratique et se veut un exemple pour les gestionnaires voulant estimer précisément la capture du carbone par des plantations d’arbres tropicaux. Nos conclusions pourraient faire partie intégrante d’un programme de réduction des émissions de carbone par les changements d’utilisation des terres.
In ecology, in ecosystem services studies for example, descriptive, explanatory and predictive modelling all have relevance in different situations. Precise circumstances may require one or the other type of modelling; it is important to choose the method properly to ensure that the final model fits the study’s goal. In this thesis, we first explore the explanatory power of the multivariate regression tree (MRT). This modelling technique is based on a recursive bipartitioning algorithm. The tree is fully grown by successive bipartitions and then it is pruned by resampling in order to reveal the tree providing the best predictions. This asymmetric analysis of two tables produces homogeneous groups in terms of the response that are constrained by splitting levels in the values of some of the most important explanatory variables. We show that to calculate the explanatory power of an MRT, an appropriate adjusted coefficient of determination must include an estimation of the degrees of freedom of the MRT model through an algorithm. This estimation of the population coefficient of determination is practically unbiased. Since MRT is based upon discontinuity premises whereas canonical redundancy analysis (RDA) models continuous linear gradients, the comparison of their explanatory powers enables one to distinguish between those two patterns of species distributions along the explanatory variables. The extensive use of RDA for the study of beta diversity motivated the comparison between its explanatory power and that of MRT. Again from an explanatory perspective, we define a new procedure called a cascade of multivariate regression trees (CMRT). This procedure provides the possibility of computing an MRT model where an order is imposed on nested explanatory hypotheses. CMRT provides a framework to study the exclusive effect of a main and a subordinate set of explanatory variables by calculating their explanatory powers.
The interpretation of the final model is done as in nested MANOVA. New information may arise from this analysis about the relationship between the response and the explanatory variables, for example interaction effects between the two explanatory data sets that were not evidenced by the usual MRT model. On the other hand, we study the predictive power of generalized linear models (GLM) to predict individual tropical tree biomass as a function of allometric shape variables. In particular, we examine the capacity of Gaussian and gamma error structures to provide the most precise predictions. We show that for a particular species, the gamma error structure is superior in terms of predictive power. This study is part of a practical framework; it is meant to be used as a tool for managers who need to precisely estimate the amount of carbon captured by tropical tree plantations. Our conclusions could be integrated into a program for reducing carbon emissions through land-use change.