Dissertations / Theses on the topic 'Applied Statistics'

To see the other types of publications on this topic, follow the link: Applied Statistics.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Applied Statistics.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Binter, Roman. "Applied probabilistic forecasting." Thesis, London School of Economics and Political Science (University of London), 2012. http://etheses.lse.ac.uk/559/.

Full text
Abstract:
In any actual forecast, the future evolution of the system is uncertain and the forecasting model is mathematically imperfect. Both ontic uncertainty about the future (due to true stochasticity) and epistemic uncertainty of the model (reflecting structural imperfections) complicate the construction and evaluation of probabilistic forecasts. In almost all nonlinear forecast models, the evolution of uncertainty in time is not tractable analytically, and Monte Carlo approaches ("ensemble forecasting") are widely used. This thesis advances our understanding of the construction of forecast densities from ensembles, the evolution of the resulting probability forecasts and methods of establishing skill (benchmarks). A novel method of partially correcting the model error is introduced and shown to outperform a competitive approach. The properties of kernel dressing, a method of transforming ensembles into probability density functions, are investigated and the convergence of the approach is illustrated. A connection between forecasting and information theory is examined by demonstrating that kernel dressing via minimization of Ignorance implicitly leads to minimization of the Kullback-Leibler divergence. The Ignorance score is critically examined in the context of other information-theoretic measures. The method of Dynamic Climatology is introduced as a new approach to establishing skill (benchmarking). Dynamic Climatology is a new, relatively simple, nearest-neighbor-based model shown to be of value in benchmarking the global circulation models of the ENSEMBLES project. ENSEMBLES is a project funded by the European Union bringing together all major European weather forecasting institutions in order to develop and test state-of-the-art seasonal weather forecasting models. By benchmarking the seasonal forecasts of the ENSEMBLES models we demonstrate that Dynamic Climatology can help us better understand the value and forecasting performance of large-scale circulation models. Lastly, a new approach to correcting (improving) an imperfect model is presented, an idea inspired by [63]. The main idea is based on a two-stage procedure where a second-stage 'corrective' model iteratively corrects systematic parts of forecasting errors produced by a first-stage 'core' model. The corrector is of an iterative nature, so that at a given time t the core model forecast is corrected and then used as an input into the next iteration of the core model to generate a time t + 1 forecast. Using two nonlinear systems we demonstrate that the iterative corrector is superior to alternative approaches based on direct (non-iterative) forecasts. While the choice of the corrector model class is flexible, we use radial basis functions. Radial basis functions are frequently used in statistical learning and/or surface approximations and involve a number of computational aspects which we discuss in some detail.
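As a rough illustration of the kernel dressing idea in this abstract, the sketch below (Python, with synthetic data and a single bandwidth parameter; the thesis's actual parameterization is richer) selects the kernel width by minimizing the Ignorance score over an archive of (ensemble, outcome) pairs.

```python
# Hypothetical sketch of kernel dressing: an ensemble of m member forecasts is
# turned into a predictive density by placing a Gaussian kernel on each member;
# the kernel width sigma is chosen by minimizing the Ignorance score
# (mean negative log2 predictive density) over a training archive.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def ignorance(sigma, ensembles, outcomes):
    """Mean negative log2 predictive density of the verifying outcomes."""
    dens = norm.pdf(outcomes[:, None], loc=ensembles, scale=sigma).mean(axis=1)
    return -np.mean(np.log2(dens + 1e-300))

rng = np.random.default_rng(0)
ensembles = rng.normal(0.0, 1.0, size=(500, 24))   # 500 forecasts, 24 members each
outcomes = rng.normal(0.0, 1.2, size=500)           # verifying observations

res = minimize_scalar(ignorance, bounds=(1e-3, 5.0), args=(ensembles, outcomes),
                      method="bounded")
print("fitted kernel width:", res.x, "Ignorance:", res.fun)
```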
APA, Harvard, Vancouver, ISO, and other styles
2

Zhang, Bo. "Machine Learning on Statistical Manifold." Scholarship @ Claremont, 2017. http://scholarship.claremont.edu/hmc_theses/110.

Full text
Abstract:
This senior thesis project explores and generalizes some fundamental machine learning algorithms from the Euclidean space to the statistical manifold, an abstract space in which each point is a probability distribution. In this thesis, we adapt the optimal separating hyperplane, the k-means clustering method, and the hierarchical clustering method for classifying and clustering probability distributions. In these modifications, we use statistical distances as a measure of the dissimilarity between objects. We describe a situation where the clustering of probability distributions is needed and useful. We present many interesting and promising empirical clustering results, which demonstrate that the statistical-distance-based clustering algorithms often outperform the same algorithms with the Euclidean distance in many complex scenarios. In particular, we apply our statistical-distance-based hierarchical and k-means clustering algorithms to univariate normal distributions with k = 2 and k = 3 clusters, bivariate normal distributions with diagonal covariance matrices and k = 3 clusters, and discrete Poisson distributions with k = 3 clusters. Finally, we prove that the k-means clustering algorithm applied to discrete distributions with the Hellinger distance converges not only to a partial optimal solution but also to a local minimum.
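To make the statistical-distance-based clustering concrete, here is a minimal sketch (toy Poisson data and our own helper names, not the thesis code) of a k-means-style algorithm that uses the Hellinger distance between discrete probability mass functions.

```python
# Illustrative sketch: k-means-style clustering of discrete probability
# distributions using the Hellinger distance as the dissimilarity measure.
import numpy as np
from scipy.stats import poisson

def hellinger(p, q):
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def kmeans_distributions(P, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([hellinger(p, c) for c in centers]) for p in P])
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
                centers[j] /= centers[j].sum()          # keep each center a distribution
    return labels, centers

# Toy data: truncated Poisson pmfs with three different rates
support = np.arange(20)
rates = np.concatenate([np.full(30, 2.0), np.full(30, 5.0), np.full(30, 9.0)])
P = np.vstack([poisson.pmf(support, mu) for mu in rates])
P /= P.sum(axis=1, keepdims=True)
labels, _ = kmeans_distributions(P, k=3)
print(labels)
```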
APA, Harvard, Vancouver, ISO, and other styles
3

Bynum, Lucius. "Modeling Subset Behavior: Prescriptive Analytics for Professional Basketball Data." Scholarship @ Claremont, 2018. https://scholarship.claremont.edu/hmc_theses/117.

Full text
Abstract:
Sports analytics problems have become increasingly prominent in the past decade. Modern image processing capabilities allow coaching staff to easily capture detailed game-time statistics on their players, opponents, team configurations, and plays. The challenge is to turn that data into meaningful insights for team managers and coaches. This project uses descriptive and predictive techniques on publicly available NBA basketball data to identify powerful combinations of players and predict how they will perform against other teams.
APA, Harvard, Vancouver, ISO, and other styles
4

Dodson, Huey D. "Applied statistics experience & certification in quality assurance /." Click here to view, 2010. http://digitalcommons.calpoly.edu/statsp/3/.

Full text
Abstract:
Thesis (B.S.)--California Polytechnic State University, 2010.
Project advisor: Heather Smith. Title from PDF title page; viewed on Apr. 20, 2010. Includes bibliographical references. Also available on microfiche.
APA, Harvard, Vancouver, ISO, and other styles
5

Lochner, Michelle Aileen Anne. "New applications of statistics in astronomy and cosmology." Doctoral thesis, University of Cape Town, 2014. http://hdl.handle.net/11427/12864.

Full text
Abstract:
Includes bibliographical references.
Over the last few decades, astronomy and cosmology have become data-driven fields. The parallel increase in computational power has naturally led to the adoption of more sophisticated statistical techniques for data analysis in these fields, and in particular, Bayesian methods. As the next generation of instruments comes online, this trend should continue, since previously ignored effects must be considered rigorously in order to avoid biases and incorrect scientific conclusions being drawn from the ever-improving data. In the context of supernova cosmology, an example of this is the challenge of contamination, as supernova datasets will become too large to spectroscopically confirm the types of all objects. The technique known as BEAMS (Bayesian Estimation Applied to Multiple Species) handles this contamination with a fully Bayesian mixture model approach, which allows unbiased estimates of the cosmological parameters. Here, we extend the original BEAMS formalism to deal with correlated systematics in supernova data, which we test extensively on thousands of simulated datasets using numerical marginalization and Markov Chain Monte Carlo (MCMC) sampling over the unknown type of the supernova, showing that it recovers unbiased cosmological parameters with good coverage. We then apply Bayesian statistics to the field of radio interferometry. This is particularly relevant in light of the SKA telescope, where the data will be of such high quantity and quality that current techniques will not be adequate to fully exploit them. We show that the current approach to deconvolution of radio interferometric data is susceptible to biases induced by ignored and unknown instrumental effects such as pointing errors, which in general are correlated with the science parameters. We develop an alternative approach - Bayesian Inference for Radio Observations (BIRO) - which is able to determine the joint posterior for all scientific and instrumental parameters. We test BIRO on several simulated datasets and show that it is superior to the standard CLEAN and source extraction algorithms. BIRO fits all parameters simultaneously while providing unbiased estimates - and errors - for the noise, beam width, pointing errors and the fluxes and shapes of the sources.
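The mixture-model idea behind BEAMS can be sketched in a few lines. The snippet below is only an illustration under assumed Gaussian population densities and made-up parameter values; it is not the extended formalism with correlated systematics developed in the thesis.

```python
# Hedged sketch of the BEAMS idea: each supernova i has a probability p_i of being
# type Ia; the likelihood mixes the Ia and non-Ia population densities so that
# parameters can be fit without spectroscopic confirmation of every object.
import numpy as np
from scipy.stats import norm

def beams_loglike(residuals, p_ia, sigma_ia=0.15, sigma_non=0.5, shift_non=2.0):
    """Mixture log-likelihood over Hubble-diagram residuals (illustrative values)."""
    l_ia = norm.pdf(residuals, loc=0.0, scale=sigma_ia)
    l_non = norm.pdf(residuals, loc=shift_non, scale=sigma_non)
    return np.sum(np.log(p_ia * l_ia + (1 - p_ia) * l_non))

rng = np.random.default_rng(0)
n = 300
is_ia = rng.random(n) < 0.8
resid = np.where(is_ia, rng.normal(0.0, 0.15, n), rng.normal(2.0, 0.5, n))
p_ia = np.clip(np.where(is_ia, 0.9, 0.2) + rng.normal(0, 0.05, n), 0.01, 0.99)
print(beams_loglike(resid, p_ia))
```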
APA, Harvard, Vancouver, ISO, and other styles
6

Tiani, John P. "Using applied statistics to study a pharmaceutical manufacturing process." Worcester, Mass. : Worcester Polytechnic Institute, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0430104-125344/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Fitzgerald, Damon. "Household Preferences for Financing Hurricane Risk Mitigation: A Survey Based Empirical Analysis." FIU Digital Commons, 2014. http://digitalcommons.fiu.edu/etd/1725.

Full text
Abstract:
After a series of major storms over the last 20 years, the state of financing for U.S. natural disaster insurance has undergone substantial disruptions, causing many federal and state-backed programs against residential property damage to become severely underfunded. In order to regain actuarial soundness, policy makers have proposed a shift to a system that reflects risk-based pricing for property insurance. We examine survey responses from 1,394 single-family homeowners in the state of Florida for support of several natural disaster mitigation policy reforms. Utilizing a partial proportional odds model, we test for effects of location, risk perception, socio-economic and housing characteristics on support for policy reforms. Our findings suggest residents across the state, not just risk-prone homeowners, support the current subsidized model. We also examine several other policy questions from the survey to verify our initial results. Finally, the implications of our findings are discussed to provide inputs to policymakers.
APA, Harvard, Vancouver, ISO, and other styles
8

Liu, Xiang. "A Multi-Indexed Logistic Model for Time Series." Digital Commons @ East Tennessee State University, 2016. https://dc.etsu.edu/etd/3140.

Full text
Abstract:
In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis given to its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances also produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We first review the well-studied SLR and its application in the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare their performance using forecast accuracy and an area under the curve score on simulated sine waves with various intensities of Gaussian noise and on Standard & Poor's 500 historical data. Overall, the finding that MILR outperforms SLR is validated on both realistic and simulated data. Finally, some possible future directions of research are discussed.
APA, Harvard, Vancouver, ISO, and other styles
9

Brännström, Anton. "A Comparison of Three Methods of Estimation Applied to Contaminated Circular Data." Thesis, Umeå universitet, Statistik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-149426.

Full text
Abstract:
This study compares the performance of the Maximum Likelihood estimator (MLE), estimators based on spacings called Generalized Maximum Spacing estimators (GSEs), and the One Step Minimum Hellinger Distance estimator (OSMHD), on data originating from a circular distribution. The purpose of the study is to investigate the different estimators' performance on directional data. More specifically, we compare the estimators' ability to estimate parameters of the von Mises distribution, which is determined by a location parameter and a scale parameter. For this study, we only look at the scenario in which one of the parameters is unknown. The main part of the study is concerned with estimating the parameters under conditions in which the data contain outliers, but a small part is also dedicated to estimation at the true model. When estimating the location parameter under contaminated conditions, the results indicate that some versions of the GSEs tend to outperform the other estimators. It should be noted that these seemingly more robust estimators appear comparatively less optimal at the true model, but this is a tradeoff that must be made on a case-by-case basis. Under the same contaminated conditions, all included estimators appear to have greater difficulty estimating the scale parameter. However, for this case, some of the GSEs are able to handle the contamination a bit better than the rest. In addition, there might exist other versions of GSEs, not included in this study, which perform better.
APA, Harvard, Vancouver, ISO, and other styles
10

Brody-Moore, Peter. "Bayesian Hierarchical Meta-Analysis of Asymptomatic Ebola Seroprevalence." Scholarship @ Claremont, 2019. https://scholarship.claremont.edu/cmc_theses/2228.

Full text
Abstract:
The continued study of asymptomatic Ebolavirus infection is necessary to develop a more complete understanding of Ebola transmission dynamics. This paper conducts a meta-analysis of eight studies that measure seroprevalence (the number of subjects that test positive for anti-Ebolavirus antibodies in their blood) in subjects with household exposure or known case-contact with Ebola, but that have shown no symptoms. In our two random effects Bayesian hierarchical models, we find estimated seroprevalences of 8.76% and 9.72%, significantly higher than the 3.3% found by a previous meta-analysis of these eight studies. We also produce a variation of this meta-analysis where we exclude two of the eight studies. In this model, we find an estimated seroprevalence of 4.4%, much lower than our first two Bayesian hierarchical models. We believe a random effects model more accurately reflects the heterogeneity between studies and thus asymptomatic Ebola is more seroprevalent than previously believed among subjects with household exposure or known case-contact. However, a strong conclusion cannot be reached on the seriousness of asymptomatic Ebola without an international testing standard and more data collection using this adopted standard.
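For readers who want a feel for random-effects pooling, the sketch below uses the frequentist DerSimonian-Laird estimator on made-up study counts; it is an analogue for illustration only, not the Bayesian hierarchical models fitted in the thesis.

```python
# Frequentist random-effects pooling of study-level proportions, shown only to
# illustrate how between-study heterogeneity (tau^2) changes the pooled estimate.
import numpy as np

def dersimonian_laird(events, totals):
    p = events / totals
    var = p * (1 - p) / totals                  # within-study variance (normal approx.)
    w = 1 / var
    p_fixed = np.sum(w * p) / np.sum(w)
    q = np.sum(w * (p - p_fixed) ** 2)
    df = len(p) - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1 / (var + tau2)                   # random-effects weights
    return np.sum(w_star * p) / np.sum(w_star), tau2

# Hypothetical study counts (not the eight studies analysed in the thesis)
events = np.array([5, 12, 3, 20, 8, 15, 2, 9])
totals = np.array([60, 150, 45, 210, 100, 180, 30, 120])
print(dersimonian_laird(events, totals))
```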
APA, Harvard, Vancouver, ISO, and other styles
11

Lesser, Elizabeth Rochelle. "A New Right Tailed Test of the Ratio of Variances." UNF Digital Commons, 2016. http://digitalcommons.unf.edu/etd/719.

Full text
Abstract:
It is important to be able to compare variances efficiently and accurately regardless of the parent populations. This study proposes a new right-tailed test for the ratio of two variances using the Edgeworth expansion. To study the Type I error rate and power performance, simulations were performed on the new test with various combinations of symmetric and skewed distributions. It is found to have more controlled Type I error rates than the existing tests. Additionally, it has sufficient power. Therefore, the newly derived test provides a good robust alternative to the already existing methods.
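For context, the classical right-tailed F-test for a variance ratio, one of the existing tests such a proposal is compared against, looks like this (illustrative sketch, not the thesis's Edgeworth-expansion-based statistic):

```python
# Classical right-tailed F-test for H0: var(x) <= var(y) vs H1: var(x) > var(y).
import numpy as np
from scipy.stats import f

def f_test_right(x, y):
    s1, s2 = np.var(x, ddof=1), np.var(y, ddof=1)
    stat = s1 / s2
    pval = f.sf(stat, len(x) - 1, len(y) - 1)   # right-tail probability
    return stat, pval

rng = np.random.default_rng(1)
print(f_test_right(rng.normal(0, 1.5, 40), rng.normal(0, 1.0, 50)))
```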
APA, Harvard, Vancouver, ISO, and other styles
12

Andersson, Carl. "Deep learning applied to system identification : A probabilistic approach." Licentiate thesis, Uppsala universitet, Avdelningen för systemteknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-397563.

Full text
Abstract:
Machine learning has been applied to sequential data for a long time in the field of system identification. As deep learning grew during the late 2000s, machine learning was again applied to sequential data, but from a new angle, not utilizing much of the knowledge from system identification. Likewise, the field of system identification has yet to adopt many of the recent advancements in deep learning. This thesis is a response to that. It introduces the field of deep learning in a probabilistic machine learning setting for problems known from system identification. Our goal for sequential modeling within the scope of this thesis is to obtain a model with good predictive and/or generative capabilities. The motivation behind this is that such a model can then be used in other areas, such as control or reinforcement learning. The model could also be used as a stepping stone for machine learning problems or for pure recreational purposes. Paper I and Paper II focus on how to apply deep learning to common system identification problems. Paper I introduces a novel way of regularizing the impulse response estimator for a system. In contrast to previous methods using Gaussian processes for this regularization, we propose to parameterize the regularization with a neural network and train this using a large dataset. Paper II introduces deep learning and many of its core concepts for a system identification audience. In the paper we also evaluate several contemporary deep learning models on standard system identification benchmarks. Paper III is the odd fish in the collection in that it focuses on the mathematical formulation and evaluation of calibration in classification, especially for deep neural networks. The paper proposes a new formalized notation for calibration and some novel ideas for evaluation of calibration. It also provides some experimental results on calibration evaluation.
APA, Harvard, Vancouver, ISO, and other styles
13

Theisen, Benjamin. "Predicting Turnover Cognition in Applied Behavior Analysis Supervisors." Thesis, The Chicago School of Professional Psychology, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10934198.

Full text
Abstract:

The study looked for predictors of turnover cognition among program supervisors at various applied behavior analysis organizations. A hierarchical regression model (n = 248) tested whether burnout moderated effects of training hours on turnover cognition, and whether burnout moderated effects of training procedures on turnover cognition. The best model (R² = .719, ΔF(2, 239) = 3.22, p = .042) did not detect burnout. Results were interpreted using the Conservation of Resources theory. Recommendations for researchers and organizations planning supervisor retention programs were provided.

APA, Harvard, Vancouver, ISO, and other styles
14

Melbourne, Davayne A. "A New method for Testing Normality based upon a Characterization of the Normal Distribution." FIU Digital Commons, 2014. http://digitalcommons.fiu.edu/etd/1248.

Full text
Abstract:
The purposes of the thesis were to review some of the existing methods for testing normality and to investigate the use of generated data combined with observed data to test for normality. This approach to testing for normality is in contrast to the existing methods, which are derived from observed data only. The test of normality proposed follows a characterization theorem by Bernstein (1941) and uses a test statistic D*, which is the average of Hoeffding's D-statistic between linear combinations of the observed and generated data, to test for normality. Overall, the proposed method showed considerable potential and achieved adequate power for many of the alternative distributions investigated. The simulation results revealed that the power of the test was comparable to some of the most commonly used methods of testing for normality. The test is performed with the use of a computer-based statistical package and in general takes a longer time to run than some of the existing methods of testing for normality.
APA, Harvard, Vancouver, ISO, and other styles
15

Saket, Munther Musa. "Cost-significance applied to estimating and control of construction projects." Thesis, University of Dundee, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.276578.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Rosa, Joao Miguel Feu. "Mathematical programming applied to diet problems in a Brazilian region." Thesis, Lancaster University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.332375.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Chun, So Yeon. "Hybrid is good: stochastic optimization and applied statistics for OR." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44717.

Full text
Abstract:
In the first part of this thesis, we study revenue management in resource exchange alliances. We first show that without an alliance the sellers will tend to price their products too high and sell too little, thereby foregoing potential profit, especially when capacity is large. This provides an economic motivation for interest in alliances, because the hope may be that some of the foregone profit may be captured under an alliance. We then consider a resource exchange alliance, including the effect of the alliance on competition among alliance members. We show that the foregone profit may indeed be captured under such an alliance. The problem of determining the optimal amounts of resources to exchange is formulated as a stochastic mathematical program with equilibrium constraints. We demonstrate how to determine whether there exists a unique equilibrium after resource exchange, how to compute the equilibrium, and how to compute the optimal resource exchange. In the second part of this thesis, we study the estimation of risk measures in risk management. In the financial industry, sell-side analysts periodically publish recommendations of underlying securities with target prices. However, this type of analysis does not provide risk measures associated with underlying companies. In this study, we discuss linear regression approaches to the estimation of law invariant conditional risk measures. Two estimation procedures are considered and compared; one is based on residual analysis of the standard least squares method and the other is in the spirit of the M-estimation approach used in robust statistics. In particular, Value-at-Risk and Average Value-at-Risk measures are discussed in detail. Large sample statistical inference of the estimators is derived. Furthermore, finite sample properties of the proposed estimators are investigated and compared with theoretical derivations in an extensive Monte Carlo study. Empirical results on the real data (different financial asset classes) are also provided to illustrate the performance of the estimators.
APA, Harvard, Vancouver, ISO, and other styles
18

Risk, James Kenneth. "Three Applications of Gaussian Process Modeling in Evaluation of Longevity Risk Management." Thesis, University of California, Santa Barbara, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10620897.

Full text
Abstract:

Longevity risk, the risk associated with people living too long, is an emerging issue in financial markets. Two major factors related to this are mortality modeling and the pricing of life insurance instruments. We propose the use of Gaussian process regression, a technique recently popularized in machine learning, to aid in both of these problems. In particular, we present three works using Gaussian processes in longevity risk applications. The first is related to pricing, where Gaussian processes can serve as a surrogate for the conditional expectation needed for Monte Carlo simulations. Second, we investigate value-at-risk calculations in a related framework, introducing a sequential algorithm allowing Gaussian processes to search for the quantile. Lastly, we use Gaussian processes as a spatial model to model mortality rates and improvement.
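A minimal sketch of the surrogate idea in the first application, assuming scikit-learn and toy one-dimensional data: a Gaussian process regression replaces the inner conditional expectation that would otherwise require nested Monte Carlo.

```python
# Gaussian process surrogate fitted to noisy inner Monte Carlo estimates at a
# handful of design points; predictions (with uncertainty) stand in for the
# conditional expectation at new points. Toy problem, not the thesis's pricing model.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x_design = rng.uniform(0.0, 1.0, size=(40, 1))                # design points (e.g. risk factors)
inner = np.array([np.mean(np.sin(3 * x) + 0.1 * rng.standard_normal(200))
                  for x in x_design.ravel()])                 # noisy inner simulation estimates

gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(0.01), normalize_y=True)
gp.fit(x_design, inner)
x_new = np.linspace(0, 1, 5).reshape(-1, 1)
mean, sd = gp.predict(x_new, return_std=True)                 # surrogate value and uncertainty
print(mean, sd)
```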

APA, Harvard, Vancouver, ISO, and other styles
19

Shafie, H. Khalil. "The geometry of Gaussian rotation space random fields /." Thesis, McGill University, 1998. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=35614.

Full text
Abstract:
In recent years, very detailed images of the brain, produced by modern sensor technologies, have given the neuroscientist the opportunity to study the functional activation of the brain under different conditions. The main statistical problem is to locate the isolated regions of the brain where activation has occurred (the signal), and separate them from the rest of the brain where no activation can be detected (the noise). To do this the images are often spatially smoothed before analysis by convolution with a filter f(t) to enhance the signal-to-noise ratio, where t is a location vector in N-dimensional space. The motivation for this comes from the Matched Filter Theorem of signal processing, which states that signal added to white noise is best detected by smoothing with a filter whose shape matches that of the signal. The problem is that the scale of the signal is usually unknown. It is natural to consider searching over filter scale as well as location, that is, to use a filter s^(-N/2) f(t/s) with scale s varying over a predetermined interval [s1, s2]. This adds an extra dimension to the search space, called scale space (see Poline and Mazoyer, 1994). Siegmund and Worsley (1995) establish the relation between searching over scale space and the problem of testing for a signal with unknown location and scale, and find the approximate P-value of the maximum of the scale-space filtered image using the expected Euler characteristic of the excursion set. In this thesis we study the extension of the scale space result to rotating filters of the form |S|^(-1/4) f(S^(-1/2) t), where S is now an N x N positive definite symmetric matrix that rotates and scales the axes of the filter.
APA, Harvard, Vancouver, ISO, and other styles
20

Zhou, Xiaojie. "Optimal designs for change point problems." Thesis, McGill University, 1997. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=35667.

Full text
Abstract:
This thesis investigates Bayesian optimal design for change-point problems. While there is a large optimal design literature for linear and some non-linear problems, this is the first time optimality has been addressed for change-point problems.
In designing a longitudinal study, the decision as to when to collect data can have a large impact on the quality of the final inferences. If a change may occur in the distribution of one or more variables under study, the timing of observations can greatly influence the chances of detecting any effects.
Two classes of problems are considered. First, optimal design for the mixture of densities is investigated. Here, a finite sequence of random variables is available for observation. Each observation may come from one of two distributions with a given probability, which may differ from observation to observation. Such a problem may also be regarded as an application of the multi-path change point problem. Assume subjects may each undergo a single change at random change points with common before and after change point distributions, and at any instant a known proportion of the ensemble of paths will have changed. In either case, the goal is to select which data points to observe, in order to provide the most accurate estimates of the means of both distributions.
Second, we study optimal designs for more classical change point problems. We consider three cases: (i) when only the means of the before and after change point distributions are of interest, (ii) when only the location of the change point is of interest, and (iii) when both the change point and the means of the before and after change point distribution are of interest.
In addressing these problems, both analytic closed-form solutions and modern statistical computing algorithms such as Monte Carlo integration and simulated annealing are used to find the optimal designs. Examples that concern human growth patterns and changes in CFC-12 concentrations in the atmosphere are used to illustrate the methods.
APA, Harvard, Vancouver, ISO, and other styles
21

Peng, Yuanyuan. "On Singular Values of Random Matrices." Kent State University / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=kent1438253068.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Yin, Kai. "Bayesian Uncertainty Quantification for Differential Equation Models Related to Financial Volatility and Disease Transmission." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1623863667837324.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Elkadry, Alaa. "Statistical Analyses of "Randomly Sourced Data"." Thesis, Oakland University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10262484.

Full text
Abstract:

Warner in 1965 introduced randomized response, and since then many extensions and improvements to the Warner model have been made. In this study, a randomized response model applicable to continuous data, which considers a mixture of two normal distributions, is considered and analyzed. This includes a study of the efficiency, estimation of some unknown parameters, and a discussion of contaminated-data issues; an application of this method to the problem of estimating Oakland University student income is presented and discussed. Also, this study includes inference for two or more populations of the same structure as the randomized response model introduced.

The impact of this randomized response model on ranking and selection method is quantified for an indifference-zone procedure and a subset selection procedure. A study on how to choose the best population between k distinct populations using an indifference-zone procedure is presented and some tables for the required sample size needed to have a probability of correct selection higher than some specified value in the preference zone for the randomized response model considered are provided. An application of the subset selection procedure on the considered randomized response model is discussed. The subset selection study is provided for 2 configurations, the slippage configuration and the equi-spaced configuration, and tables are provided for both configurations.

Finally, we discuss the use of the data obtained from the Bayesian Improved Surname Geocoding (BISG) tool in hypothesis testing for disparity between different populations. Two approaches are provided on how to use the information arising from the BISG.
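A small simulation can illustrate the continuous randomized response setting described above; all parameter values and the moment-based recovery step below are our own assumptions, not the thesis's estimators.

```python
# Each respondent reports the true sensitive value with probability p, and a draw
# from a known masking normal distribution with probability 1 - p, so responses
# follow a two-component normal mixture; the true mean is recovered by moment matching.
import numpy as np

rng = np.random.default_rng(0)
p = 0.7
true_income = rng.normal(30_000, 8_000, size=5_000)        # sensitive variable
mask = rng.normal(50_000, 10_000, size=5_000)               # known masking distribution
report_true = rng.random(5_000) < p
responses = np.where(report_true, true_income, mask)

# E[response] = p * mu_true + (1 - p) * mu_mask  =>  solve for mu_true
mu_hat = (responses.mean() - (1 - p) * 50_000) / p
print("estimated mean income:", round(mu_hat, 1))
```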

APA, Harvard, Vancouver, ISO, and other styles
24

Bahuguna, Manoj. "Analytics of Asymmetry and Transformation to Multivariate Normality Through Copula Functions with Applications in Biomedical Sciences and Finance." Thesis, Oakland University, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10263461.

Full text
Abstract:

In this work, we study and develop certain aspects of the analytics of asymmetry for univariate and multivariate data. Accordingly, the above work consists of three separate parts.

In the first part of our work, we introduce a new approach to measuring univariate and multivariate skewness based on quantiles and the properties of odd and even functions. We illustrate through numerous examples and simulations that in the multivariate case Mardia's measure of skewness fails to provide consistent and meaningful interpretations. However, our new measure appears to provide an index which is more reasonable.

In the second part of our work, our emphasis is to moderate or eliminate asymmetry of multivariate data when the interest is in the study of dependence. Copula transformation has been used as an all-purpose transformation to introduce multivariate normality. Using this approach, even though information about marginal distributions is lost, we are still able to study dependence based modeling problems for asymmetric data using the technique developed for multivariate normal data. We illustrate a variety of applications in areas such as multiple regression, principal component, factor analysis, partial least squares and structural equation models. The results are promising in that our approach shows improvement over results obtained when asymmetry is ignored.

The last part of this work is based on the applications of our copula transformation to financial data. Specifically, we consider the problem of estimation of “beta risk” associated with a particular financial asset. Taking S&P500 index as a proxy for market, we suggest three versions of “beta estimates” which are useful in situations when the returns of the assets and market proxy do not have the most ideal probability distribution, namely, bivariate normal or when data may contain some very extreme (high or low) returns. Using the copula based methods, developed earlier in this dissertation, and winsorization, we obtain the estimates which in high skewness scenarios perform better than the traditional least square estimate of market beta.
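One common way to realize the copula-based transformation described in the second and third parts is a normal-scores (Gaussian copula style) transform of each margin; the sketch below is an illustration with simulated lognormal data, not the dissertation's procedure.

```python
# Each margin is mapped through its empirical CDF and then the standard normal
# quantile function, removing marginal skewness while preserving the dependence
# (copula) structure among the variables.
import numpy as np
from scipy.stats import rankdata, norm, skew

def normal_scores(X):
    n = X.shape[0]
    U = (rankdata(X, axis=0) - 0.5) / n          # empirical CDF values in (0, 1)
    return norm.ppf(U)

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 3))   # strongly right-skewed margins
Z = normal_scores(X)
print("skewness before:", skew(X, axis=0), "after:", skew(Z, axis=0))
```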

APA, Harvard, Vancouver, ISO, and other styles
25

Rialland, P. C. R. P. "Three essays in applied microeconomics." Thesis, University of Essex, 2018. http://repository.essex.ac.uk/23688/.

Full text
Abstract:
This thesis focuses on three vulnerable groups in Europe that have recently been highlighted both in the media and in the economics literature, and that are policy priorities. Chapter 1 is joint work with Giovanni Mastrobuoni which focuses on prisoners and peer effects in prison. Studies that estimate criminal peer effects need to define the reference group. Researchers usually use the amount of time inmates overlap in prison, sometimes in combination with nationality, to define such groups. Yet, there is often little discussion about such assumptions, which could potentially have important effects on the estimates of peer effects. We show that the date of rearrest of inmates who spend time together in prison signals co-offending with some error, and can thus be used to measure reference groups. Exploiting recidivism data on inmates released after a mass pardon with a simple econometric model which adjusts the estimates for the misclassification errors, we document homophily in peer group formation with regards to age, nationality, and degrees of deterrence. There is no evidence of homophily with respect to education and employment status. Chapter 2 evaluates a policy in the English county of Essex that aims to reduce domestic abuse through informing high-risk suspects that they will be put under higher surveillance, hence increasing their probability of being caught in case of recidivism, and encouraging their victims to report. Using a Regression Discontinuity Design (RDD), it underlines that suspects that are targeted by the policy are 9% more likely to be reported again for domestic abuse. Although increasing reporting is widely seen as essential to identify and protect victims, this paper shows that policies to increase reporting will deter crime only if they give rise to a legal response. Moreover, results highlight that increasing the reporting of events that do not lead to criminal charges may create escalation and be more detrimental to the victim in the long run. Chapter 3 investigates how migrants in the United Kingdom respond to natural disasters in their home countries. Combining a household panel survey of migrants in the United Kingdom and natural disasters data, this paper first shows, in the UK context, that male migrants are more likely to remit in the wake of natural disasters. Then, it underlines that to fund remittances male migrants also increase labour supply and decrease monthly savings and leisure. By showing how migrants in the UK adjust their economic behaviours in response to unexpected shocks, i.e. natural disasters, this paper demonstrates both how UK migrants may fund remittances and that they have the capacity to adjust their economic behaviours to increase remittances.
APA, Harvard, Vancouver, ISO, and other styles
26

McIntosh, Alasdair. "Interpretable models of genetic drift applied especially to human populations." Thesis, University of Glasgow, 2018. http://theses.gla.ac.uk/30690/.

Full text
Abstract:
This thesis aims to develop and implement population genetic models that are directly interpretable in terms of events such as population fission and admixture. Two competing methods of approximating the Wright--Fisher model of genetic drift are critically examined, one due to Balding and Nichols and another to Nicholson and colleagues. The model of population structure consisting of all present-day subpopulations arising from a common ancestral population at a single fission event (first described by Nicholson et al.) is reimplemented and applied to single-nucleotide polymorphism data from the HapMap project. This Bayesian hierarchical model is then elaborated to allow general phylogenetic representations of the genetic heritage of present-day subpopulations and the performance of this model is assessed on simulated and HapMap data. The drift model of Balding and Nichols is found to be problematic for use in this context as the need for allele fixation to be modelled becomes apparent. The model is then further developed to allow the inclusion of admixture events. This new model is, again, demonstrated using HapMap data and its performance compared to that of the TreeMix model of Pickrell and Pritchard, which is also critically evaluated.
APA, Harvard, Vancouver, ISO, and other styles
27

Donnelly, James P. "NFL Betting Market: Using Adjusted Statistics to Test Market Efficiency and Build a Betting Model." Scholarship @ Claremont, 2013. http://scholarship.claremont.edu/cmc_theses/721.

Full text
Abstract:
The use of statistical analysis has been prevalent in the sports gambling industry for years. More recently, we have seen the emergence of "adjusted statistics", a more sophisticated way to examine each play and each result (further explanation below). And while adjusted statistics have become commonplace for professional and recreational bettors alike, little research has been done to justify their use. In this paper the effectiveness of this data is tested on the most heavily wagered sport in the world – the National Football League (NFL). The results are studied with two central questions in mind: Does the market account for the information provided by adjusted statistics? And, can this data be interpreted to create a profitable betting strategy? First, the Efficient Market Hypothesis is introduced and tested using these new variables. Then, a betting model is built and tested.
APA, Harvard, Vancouver, ISO, and other styles
28

Wardrop, Daniel M. "Optimality criteria applied to certain response surface designs." Diss., Virginia Polytechnic Institute and State University, 1985. http://hdl.handle.net/10919/49960.

Full text
Abstract:
The estimation of a particular matrix of coefficients of a second-order polynomial model was shown to be important in Response Surface Methodology (RSM). This led naturally to designing RSM experiments for best estimation of these coefficients as a primary goal. A design criterion, DS-optimality, was applied to several classes of RSM designs to find optimal choices of design parameters. Further, previous results on D-optimal RSM designs were extended. The designs resulting from the use of the two criteria were compared. Two other design criteria were also studied. These were IV, the prediction variance of ŷ integrated over a region R, and IV*, sum of the variances of ∂ŷ/∂α again integrated over R. Three different choices of the region R were used. The object of the study was not only to identify optimal choices of design parameters, but also to compare the resulting designs with those obtained using the determinantal criteria. An extension of a method for constructing D-optimal designs was used to construct DS-optimal central composite designs. This involved viewing the design points as having continuous weights. DS-best central composite designs were constructed either analytically or numerically for a fixed axial point distance. The results of previous work by other authors were extended for D-optimality by varying the axial point distance. Other design classes studied were Box-Behnken, equiradial, and some small composite designs. The novel study of IV and the extended IV, called IV*, was done for each of the four design classes mentioned previously. The results of the study were presented graphically, or tabularly. The best designs according to IV and IV* were compared with the DS-best designs. Composite designs performed well in all criteria, with the central composite designs performing best. The Box-Behnken and equiradial seemed to suffer from a lack of flexibility. The DS-best designs agreed well with the designs suggested by the IV* criteria.
Ph. D.
APA, Harvard, Vancouver, ISO, and other styles
29

Ay, Belit, and Nabiel Efrem. "Benford’s law applied to sale prices on the Swedish housing market." Thesis, Stockholms universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-194865.

Full text
Abstract:
Benford's law is based on the observation that certain digits occur more often than others in a set of numbers. This has led researchers to apply the law in different areas, including identifying digit patterns and manipulated data. To our knowledge, this has not yet been tested in the Swedish housing market. The purpose of this thesis is to examine whether the sale prices for 171 643 tenant-owned apartments in Stockholm, Gothenburg and Malmö follow Benford's law. Numerous researchers have used this law for testing various types of data, but based solely on the first-digit distribution of their data. This study additionally tests the second digit and the first two digits of our data. The tests used to evaluate our data's conformity to Benford's law include the Kolmogorov-Smirnov test and the mean absolute deviation (MAD) test. We found that the second digit of sale prices followed Benford's law, while the first digit and the first two digits did not. The results show that Benford's law is a good method for identifying certain digit patterns; further research is needed before concluding that sale prices do not follow Benford's law, as certain limitations in our data were identified.
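The first-digit comparison and the MAD measure mentioned above can be illustrated in a few lines; the prices below are simulated, not the 171 643 Swedish transactions.

```python
# Compare the observed first-digit distribution of (toy) sale prices with the
# Benford distribution using the mean absolute deviation (MAD).
import numpy as np

benford = np.log10(1 + 1 / np.arange(1, 10))                 # P(first digit = d)

def first_digit(x):
    return (x / 10 ** np.floor(np.log10(x))).astype(int)

rng = np.random.default_rng(0)
prices = rng.lognormal(mean=14.5, sigma=0.6, size=10_000)    # toy sale prices
digits = first_digit(prices)
observed = np.array([(digits == d).mean() for d in range(1, 10)])
mad = np.mean(np.abs(observed - benford))
print("observed:", observed.round(3), "MAD:", round(mad, 4))
```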
APA, Harvard, Vancouver, ISO, and other styles
30

Howard, Marylesa Marie. "Computational methods for support vector machine classification and large-scale Kalman filtering." Thesis, University of Montana, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3568109.

Full text
Abstract:

The first half of this dissertation focuses on computational methods for solving the constrained quadratic program (QP) within the support vector machine (SVM) classifier. One of the SVM formulations requires the solution of bound and equality constrained QPs. We begin by describing an augmented Lagrangian approach which incorporates the equality constraint into the objective function, resulting in a bound constrained QP. Furthermore, all constraints may be incorporated into the objective function to yield an unconstrained quadratic program, allowing us to apply the conjugate gradient (CG) method. Lastly, we adapt the scaled gradient projection method to the SVM QP and compare the performance of these methods with the state-of-the-art sequential minimal optimization algorithm and MATLAB's built in constrained QP solver, quadprog. The augmented Lagrangian method outperforms other state-of-the-art methods on three image test cases.

The second half of this dissertation focuses on computational methods for large-scale Kalman filtering applications. The Kalman filter (KF) is a method for solving a dynamic, coupled system of equations. While these methods require only linear algebra, standard KF is often infeasible in large-scale implementations due to the storage requirements and inverse calculations of large, dense covariance matrices. We introduce the use of the CG and Lanczos methods into various forms of the Kalman filter for low-rank approximations of the covariance matrices, with low-storage requirements. We also use CG for efficient Gaussian sampling within the ensemble Kalman filter method. The CG-based KF methods perform similarly in root-mean-square error when compared to the standard KF methods, when the standard implementations are feasible, and outperform the limited-memory Broyden-Fletcher-Goldfarb-Shanno approximation method.
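Both halves of the dissertation lean on matrix-free iterative solvers; a minimal conjugate gradient sketch of the kind referred to above solves A x = b using only matrix-vector products, without forming or inverting A explicitly.

```python
# Conjugate gradient for a symmetric positive definite system, written against a
# matrix-vector product callback so that A never has to be stored densely.
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(lambda v: A @ v, b))   # approx. [0.0909, 0.6364]
```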

APA, Harvard, Vancouver, ISO, and other styles
31

Raissi, Maziar. "Multi-fidelity Stochastic Collocation." Thesis, George Mason University, 2013. http://pqdtopen.proquest.com/#viewpdf?dispub=3591697.

Full text
Abstract:

Over the last few years there have been dramatic advances in our understanding of mathematical and computational models of complex systems in the presence of uncertainty. This has led to a growth in the area of uncertainty quantification as well as the need to develop efficient, scalable, stable and convergent computational methods for solving differential equations with random inputs. Stochastic Galerkin methods based on polynomial chaos expansions have shown superiority to other non-sampling and many sampling techniques. However, for complicated governing equations numerical implementations of stochastic Galerkin methods can become non-trivial. On the other hand, Monte Carlo and other traditional sampling methods, are straightforward to implement. However, they do not offer as fast convergence rates as stochastic Galerkin. Other numerical approaches are the stochastic collocation (SC) methods, which inherit both, the ease of implementation of Monte Carlo and the robustness of stochastic Galerkin to a great deal. However, stochastic collocation and its powerful extensions, e.g. sparse grid stochastic collocation, can simply fail to handle more levels of complication. The seemingly innocent Burgers equation driven by Brownian motion is such an example. In this work we propose a novel enhancement to stochastic collocation methods using deterministic model reduction techniques that can handle this pathological example and hopefully other more complicated equations like Stochastic Navier Stokes. Our numerical results show the efficiency of the proposed technique. We also perform a mathematically rigorous study of linear parabolic partial differential equations with random forcing terms. Justified by the truncated Karhunen-Loève expansions, the input data are assumed to be represented by a finite number of random variables. A rigorous convergence analysis of our method applied to parabolic partial differential equations with random forcing terms, supported by numerical results, shows that the proposed technique is not only reliable and robust but also very efficient.

APA, Harvard, Vancouver, ISO, and other styles
32

Ruffin, Michael. "User retention and classification in a mobile gaming environment." Thesis, California State University, Long Beach, 2014. http://pqdtopen.proquest.com/#viewpdf?dispub=1527021.

Full text
Abstract:

Game analytics is a fast growing field where game studios are allocating valuable resources to develop sophisticated statistical models to understand user behavior and monetization habits to optimize game play and performance. Game developers' ability to understand user retention allows for game features that will generate high engagement leading to stronger overall monetization and increased lifetimes of players.

One important industry-adopted metric is the percentage of users who log back into the game one day after installation, otherwise known as one-day retention. Although this is an important metric, game studios typically allocate few resources to determining which of the transactions users conduct on the day of installation drive one-day retention.

In this project, we first conduct a cluster analysis in an attempt to uncover meaningful subgroups based on players' transaction history on their first day of installation. Secondly, we use various classification methods including decision trees, logistic regression, and k-Nearest Neighbor algorithm to determine which behaviors are important in identifying whether a new user will return the following day.
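A hedged sketch of the classification comparison described above, using synthetic features in place of the proprietary game logs:

```python
# Compare decision trees, logistic regression and k-nearest neighbours for
# predicting a binary retention label from first-day features (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=10, n_informative=5, random_state=0)
for name, clf in [("tree", DecisionTreeClassifier(max_depth=4)),
                  ("logit", LogisticRegression(max_iter=1000)),
                  ("knn", KNeighborsClassifier(n_neighbors=15))]:
    print(name, cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean().round(3))
```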

APA, Harvard, Vancouver, ISO, and other styles
33

Eiland, E. Earl. "A Coherent Classifier/Prediction/Diagnostic Problem Framework and Relevant Summary Statistics." Thesis, New Mexico Institute of Mining and Technology, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10617960.

Full text
Abstract:

Classification is a ubiquitous decision activity. Regardless of whether it is predicting the future, e.g., a weather forecast, determining an existing state, e.g., a medical diagnosis, or some other activity, classifier outputs drive future actions. Because of their importance, classifier research and development is an active field.

Regardless of whether one is a classifier developer or an end user, evaluating and comparing classifier output quality is important. Intuitively, classifier evaluation may seem simple, however, it is not. There is a plethora of classifier summary statistics and new summary statistics seem to surface regularly. Summary statistic users appear not to be satisfied with the existing summary statistics. For end users, many existing summary statistics do not provide actionable information. This dissertation addresses the end user's quandary.

The work consists of four parts: 1. Considering eight summary statistics with regard to their purpose (what questions they quantitatively answer) and efficacy (as defined by measurement theory). 2. Characterizing the classification problem from the end user's perspective and identifying four axioms for end-user-efficacious classifier evaluation summary statistics. 3. Applying the axioms and measurement theory to evaluate eight summary statistics and create two compliant (end-user-efficacious) summary statistics. 4. Using the compliant summary statistics to show the actionable information they generate.

By applying the recommendations in this dissertation, both end users and researchers benefit. Researchers have summary statistic selection and classifier evaluation protocols that generate the most usable information. End users can also generate information that facilitates tool selection and optimal deployment, if classifier test reports provide the necessary information.

APA, Harvard, Vancouver, ISO, and other styles
34

McCants, Michael. "Efficacy of robust regression applied to fractional factorial treatment structures." Thesis, Kansas State University, 2011. http://hdl.handle.net/2097/9260.

Full text
Abstract:
Master of Science
Department of Statistics
James J. Higgins
Completely randomized and randomized block designs involving n factors at each of two levels are used to screen for the effects of a large number of factors. With such designs it may not be possible, either because of cost or because of time, to run each treatment combination more than once. In some cases, only a fraction of all the treatments may be run. With a large number of factors and limited observations, even one outlier can adversely affect the results. Robust regression methods are designed to down-weight the adverse effects of outliers. However, to our knowledge practitioners do not routinely apply robust regression methods in the context of fractional replication of 2^n factorial treatment structures. The purpose of this report is to examine how robust regression methods perform in this context.
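The kind of comparison the report describes can be sketched as follows: ordinary least squares versus Huber M-estimation (one standard robust regression method) on a two-level factorial design with a single planted outlier; the data and design size are our own toy choices.

```python
# OLS vs. robust (Huber) regression on a 2^4 factorial with one gross outlier.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
# 2^4 full factorial in coded -1/+1 units
levels = np.array([[1 if (i >> b) & 1 else -1 for b in range(4)] for i in range(16)], float)
X = sm.add_constant(levels)
beta = np.array([10.0, 3.0, -2.0, 0.0, 1.5])
y = X @ beta + rng.normal(0, 0.5, 16)
y[5] += 15.0                                   # one gross outlier

ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("OLS:", ols.params.round(2))
print("RLM:", rlm.params.round(2))
```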
APA, Harvard, Vancouver, ISO, and other styles
35

Jacobson, Daniel A. "Networks and multivariate statistics as applied to biological datasets and wine-related omics." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019.1/85630.

Full text
Abstract:
Thesis (PhD)--Stellenbosch University, 2013.
ENGLISH ABSTRACT: Introduction: Wine production is a complex biotechnological process aiming at productively coordinating the interactions and outputs of several biological systems, including grapevine and many microorganisms such as wine yeast and wine bacteria. High-throughput data generating tools in the fields of genomics, transcriptomics, proteomics, metabolomics and microbiomics are being applied both locally and globally in order to better understand complex biological systems. As such, the datasets available for analysis and mining include de novo datasets created by collaborators as well as publicly available datasets which one can use to get further insight into the systems under study. In order to model the complexity inherent in and across these datasets it is necessary to develop methods and approaches based on network theory and multivariate data analysis as well as to explore the intersections between these two approaches to data modelling, mining and interpretation.
Networks: The traditional reductionist paradigm of analysing single components of a biological system has not provided tools with which to adequately analyse data sets that are attempting to capture systems-level information. Network theory has recently emerged as a new discipline with which to model and analyse complex systems and has arisen from the study of real and often quite large networks derived empirically from the large volumes of data that have been collected from communications, internet, financial and biological systems. This is in stark contrast to previous theoretical approaches to understanding complex systems such as complexity theory, synergetics, chaos theory, self-organised criticality, and fractals, which were all sweeping theoretical constructs based on small toy models that proved unable to address the complexity of real-world systems.
Multivariate Data Analysis: Principal component analysis (PCA) and Partial Least Squares (PLS) regression are commonly used to reduce the dimensionality of a matrix (and amongst matrices in the case of PLS) in which there are a considerable number of potentially related variables. PCA and PLS are variance-focused approaches where components are ranked by the amount of variance they each explain. Components are, by definition, orthogonal to one another and as such, uncorrelated.
Aims: This thesis explores the development of Computational Biology tools that are essential to fully exploit the large data sets that are being generated by systems-based approaches in order to gain a better understanding of wine-related organisms such as grapevine (and tobacco as a laboratory-based plant model), plant pathogens, microbes and their interactions. The broad aim of this thesis is therefore to develop computational methods that can be used in an integrated systems-based approach to model and describe different aspects of the wine making process from a biological perspective. To achieve this aim, computational methods have been developed and applied in the areas of transcriptomics, phylogenomics, chemiomics and microbiomics.
Summary: The primary approaches taken in this thesis have been the use of networks and multivariate data analysis methods to analyse highly dimensional data sets. Furthermore, several of the approaches have started to explore the intersection between networks and multivariate data analysis.
This would seem to be a logical progression as both networks and multivariate data analysis are focused on matrix-based data modelling and therefore have many of their roots in linear algebra.
AFRIKAANSE OPSOMMING: Introduction: Wine production is a complex biotechnological process that aims at the productive coordination of the many interactions and outputs of several biological systems. These systems include the grapevine, which is of particular interest, as well as wine yeast and wine bacteria. High-throughput data generation is currently being applied both globally and locally in the fields of genomics, transcriptomics, proteomics, metabolomics and microbiomics. As such, these types of datasets are available for analysis, mining and exploration. The datasets can be generated de novo, with the help of collaborators, or they can be obtained from public databases, where such datasets are often made available so that further insight can be gained into the system under study. The high-throughput datasets under discussion contain a high degree of inherent complexity, both within themselves and across different datasets. In order to model these datasets and their inherent complexity, it is necessary to develop methods and approaches rooted in network theory and multivariate statistics. Furthermore, it is also necessary to explore the intersections between network theory and multivariate statistics in order to improve the modelling, mining, exploration and interpretation of data. Networks: The traditional reductionist paradigm, in which single components of a biological system are analysed, has thus far not delivered adequate methods and tools with which to analyse datasets that aim to capture systems-level information. Network theory has emerged as a new discipline that can be applied to the modelling and analysis of complex systems. It stems from the study of real, often large networks derived empirically from the large volumes of data now emerging from communication, internet, financial and biological systems. This is in stark contrast to earlier theoretical approaches to understanding complex systems, such as complexity theory, synergetics, chaos theory, self-organised criticality and fractals. All of the above were broad theoretical constructs, based on relatively small-scale models, which proved unable to offer solutions to the complexity of real-world systems. Multivariate Data Analysis: Principal component analysis (PCA) and partial least squares (PLS) regression are often used to reduce the dimensionality of a matrix (and between matrices in the case of PLS). These matrices often contain a considerably large number of potentially related variables. PCA and PLS are variance-driven methods in which components are ranked by the amount of variance each component explains. Components are by definition orthogonal to one another and, as such, uncorrelated. Aims: This thesis explores the development of several computational biology methods that are essential to fully exploit the large-scale datasets currently being generated by systems-based approaches. The goal is to gain better understanding and knowledge of wine-related organisms; these organisms include the grapevine (with tobacco as a laboratory-based plant model), plant pathogens and microbes, as well as their interactions. 
The broad aim of this thesis is therefore to develop computational methods that can be used in an integrated systems-based approach to model and describe different aspects of the winemaking process from a biological perspective. To achieve this aim, computational methods were developed and applied in the fields of transcriptomics, phylogenomics, chemiomics and microbiomics. Summary: The primary approach taken in this thesis is the use of networks and multivariate data analysis methods to analyse high-dimensional datasets. Furthermore, several of the methods begin to explore the common ground between networks and multivariate data analysis. This appears to be a logical progression, since both networks and multivariate data analysis focus on matrix-based data modelling and are thus rooted in linear algebra.
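The variance-focused dimensionality reduction described in this abstract can be illustrated with a short sketch. The snippet below is not from the thesis; it uses synthetic "omics-like" data (all dimensions and variable counts are arbitrary assumptions) and scikit-learn's PCA and PLSRegression to show components ranked by explained variance and mutually orthogonal PLS latent scores.

```python
# Minimal sketch (synthetic data, not the thesis's datasets) of variance-focused
# dimensionality reduction with PCA and PLS regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                      # 60 samples, 500 variables
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=60)

pca = PCA(n_components=5).fit(X)
print("variance explained per component:", pca.explained_variance_ratio_)

pls = PLSRegression(n_components=5).fit(X, y)
scores = pls.transform(X)                           # PLS latent scores are orthogonal
print("PLS score matrix shape:", scores.shape)
```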
APA, Harvard, Vancouver, ISO, and other styles
36

Araújo, Daniel Costa. "Channel estimation techniques applied to massive MIMO systems using sparsity and statistics approaches." Repositório Institucional da UFC, 2016. http://www.repositorio.ufc.br/handle/riufc/23478.

Full text
Abstract:
ARAÚJO, D. C. Channel estimation techniques applied to massive MIMO systems using sparsity and statistics approaches. 2016. 124 f. Tese (Doutorado em Engenharia de Teleinformática)–Centro de Tecnologia, Universidade Federal do Ceará, Fortaleza, 2016.
Massive MIMO has the potential to greatly increase system spectral efficiency by employing many individually steerable antenna elements at the base station (BS). This potential can only be achieved if the BS has sufficient channel state information (CSI). The way of acquiring it depends on the duplexing mode employed by the communication system. Currently, frequency division duplexing (FDD) is the mode most widely used in wireless communication systems. However, the amount of overhead necessary to estimate the channel scales with the number of antennas, which poses a major challenge for implementing massive MIMO systems with the FDD protocol. To enable the two to operate together, this thesis tackles the channel estimation problem by proposing methods that exploit a compressed version of the massive MIMO channel. Two main approaches are used to achieve such compression: sparsity and second-order statistics. To derive the sparsity-based techniques, this thesis uses a compressive sensing (CS) framework to extract a sparse representation of the channel. This is investigated initially for a flat channel and afterwards for a frequency-selective one. In the former, we show that the Cramér-Rao lower bound (CRLB) for the problem is a function of pilot sequences that lead to a Grassmannian matrix. In the frequency-selective case, a novel estimator that combines CS and tensor analysis is derived. This new method uses the measurements obtained from the pilot subcarriers to estimate a sparse tensor channel representation. Assuming a Tucker3 model, the proposed solution maps the estimated sparse tensor to a full one that describes the spatial-frequency channel response. Furthermore, this thesis investigates the problem of updating the sparse basis that arises when the user is moving. In this study, an algorithm is proposed to track the arrival and departure directions using very few pilots. Besides the sparsity-based techniques, this thesis investigates the channel estimation performance using a statistical approach. In that case, a new hybrid beamforming (HB) architecture is proposed to spatially multiplex the pilot sequences and to reduce the overhead. More specifically, the new solution creates a set of beams that is jointly calculated with the channel estimator and the pilot power allocation using the minimum mean square error (MMSE) criterion. We show that this provides enhanced performance for the estimation process in low signal-to-noise ratio (SNR) scenarios.
Research on massive MIMO (multiple-input multiple-output) systems has attracted great attention from the scientific community because of its potential to increase the spectral efficiency of wireless communication systems by using hundreds of antenna elements at the base station (BS). However, this potential can only be realised if the BS has sufficient channel state information. The way of acquiring it depends on how the time-frequency communication resources are employed. Currently, the solution most used in wireless communication systems is frequency division duplexing (FDD) of the pilots. The great challenge in implementing this type of solution is that the number of pilot tones required to estimate the channel grows with the number of antennas, which erodes the spectral efficiency promised by the massive system. This thesis presents channel estimation methods that require a reduced number of pilot tones while maintaining high estimation accuracy. This reduction is possible because the proposed estimators exploit the structure of the channel to reduce its dimensionality. Essentially two approaches are used to achieve this dimensionality reduction: one through sparsity and the other through second-order statistics. To derive the sparsity-based solutions, the channel estimator is obtained using compressive sensing (CS) theory to extract a sparse representation of the channel. The theory is applied first to the estimation of frequency-flat channels and then to frequency-selective ones. In the first case, it is shown that the Cramér-Rao lower bound (CRLB) is a function of the pilot sequences that form a Grassmannian matrix. In the second case, CS and tensor analysis are combined to derive a new estimation algorithm based on a sparse tensor decomposition for frequency-selective channels. Using a Tucker3 model, the proposed solution maps the sparse tensor to a full tensor that describes the channel response in space and frequency. In addition, the thesis investigates the optimisation of the sparse representation basis by proposing a method to estimate and track variations in the arrival and departure angles caused by user mobility. Beyond the sparsity-based techniques, the thesis investigates techniques that use statistical knowledge of the channel. In this case, a new hybrid beamforming architecture is proposed to multiplex the pilot sequences. The new solution consists of creating a set of beams that are computed jointly with the channel estimator and the pilot power allocation, using the minimum mean square error criterion. It is shown that this solution shortens the pilot sequences and performs well in low signal-to-noise ratio (SNR) scenarios.
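As a rough illustration of the sparsity-based idea only (not the estimators derived in the thesis), the sketch below recovers a flat-fading channel that is sparse in an angular DFT dictionary from a small number of pilot measurements using orthogonal matching pursuit. The antenna count, pilot length, sparsity level and noise level are all arbitrary assumptions.

```python
# Minimal sketch (assumed flat-fading model and DFT angular dictionary) of sparse
# channel recovery via orthogonal matching pursuit; not the thesis's estimator.
import numpy as np

def omp(A, y, k):
    """Recover a k-sparse x from y = A x via orthogonal matching pursuit."""
    residual, support = y.copy(), []
    for _ in range(k):
        idx = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(idx)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = x_s
    return x

np.random.seed(0)
n_ant, n_pilots, sparsity = 64, 16, 3
dictionary = np.fft.fft(np.eye(n_ant)) / np.sqrt(n_ant)      # angular (DFT) basis
pilots = (np.random.randn(n_pilots, n_ant) + 1j * np.random.randn(n_pilots, n_ant)) / np.sqrt(2)
gains = np.zeros(n_ant, dtype=complex)
gains[np.random.choice(n_ant, sparsity, replace=False)] = np.random.randn(sparsity) + 1j * np.random.randn(sparsity)
h = dictionary @ gains                                        # true channel vector
y = pilots @ h + 0.01 * (np.random.randn(n_pilots) + 1j * np.random.randn(n_pilots))
h_hat = dictionary @ omp(pilots @ dictionary, y, sparsity)    # sparse recovery
print("relative error:", np.linalg.norm(h - h_hat) / np.linalg.norm(h))
```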
APA, Harvard, Vancouver, ISO, and other styles
37

Gard, Rikard. "Design-based and Model-assisted estimators using Machine learning methods : Exploring the k-Nearest Neighbor metod applied to data from the Recreational Fishing Survey." Thesis, Örebro universitet, Handelshögskolan vid Örebro Universitet, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-72488.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Chernoff, Parker. "Sabermetrics - Statistical Modeling of Run Creation and Prevention in Baseball." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3663.

Full text
Abstract:
The focus of this thesis was to investigate which baseball metrics are most conducive to run creation and prevention. Stepwise regression and Liu estimation were used to formulate two models for the dependent variables and were also used for cross-validation. Finally, the predicted values were fed into the Pythagorean Expectation formula to predict a team's most important goal: winning. Each model fit strongly, and collinearity amongst offensive predictors was assessed using variance inflation factors. Hits, walks, and home runs allowed, infield putouts, errors, defense-independent earned run average ratio, defensive efficiency ratio, saves, runners left on base, shutouts, and walks per nine innings were significant defensive predictors. Doubles, home runs, walks, batting average, and runners left on base were significant offensive regressors. Both models produced error rates below 3% for run prediction, and together they did an excellent job of estimating a team's per-season win ratio.
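The Pythagorean Expectation referenced in the abstract maps runs scored and runs allowed to an expected win ratio. A minimal version is shown below; the classic form uses an exponent of 2 (other exponents, such as 1.83, are common refinements), and the example numbers are made up.

```python
# Minimal sketch of the Pythagorean Expectation: expected win ratio from runs
# scored and runs allowed, with the exponent left as a parameter.
def pythagorean_expectation(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# Example: a team scoring 800 runs and allowing 700 over a season.
print(round(pythagorean_expectation(800, 700), 3))   # ~0.566 expected win ratio
```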
APA, Harvard, Vancouver, ISO, and other styles
39

Book, Emil, and Linus Ekelöf. "A Multiple Linear Regression Model To Assess The Effects of Macroeconomic Factors On Small and Medium-Sized Enterprises." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254298.

Full text
Abstract:
Small and medium-sized enterprises (SMEs) have long been considered the backbone of any country's economy for their contribution to growth and prosperity. It is therefore of great importance that governments and legislators adopt policies that optimise the success of SMEs. Recent concerns about an impending recession have made this topic even more relevant, since small companies will have greater difficulty withstanding such an event. This thesis focuses on the effects of macroeconomic factors on SMEs in Sweden, using multiple linear regression. Data were collected monthly over a 10-year period, from 2009 to 2019. The end result was a five-variable model with a coefficient of determination of 98%.
Small and medium-sized enterprises (SMEs) have long been regarded as one of the most important components of a country's economy, chiefly for their contribution to growth and prosperity. It is therefore very important that governments and legislators pursue policies that promote the optimal growth of SMEs. Several years of economic expansion and concern about a coming recession have made this topic highly relevant, since small companies are the ones that will be hit hardest by a harsher economic climate. This report uses multiple linear regression to evaluate the effects of various macroeconomic factors on SMEs in Sweden. Data were collected monthly over a 10-year period from 2009 to 2019. The result was a five-variable model with a coefficient of determination of 98%.
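A minimal sketch of the modelling approach is shown below. It is not the thesis's analysis: the data are synthetic, the regressor names are hypothetical, and statsmodels OLS is used simply to illustrate fitting a multiple linear regression and reading off the coefficient of determination.

```python
# Minimal sketch (synthetic data, hypothetical macro regressors) of a multiple
# linear regression fit with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 120                                           # e.g. monthly data over 10 years
df = pd.DataFrame({
    "gdp_growth": rng.normal(2, 1, n),
    "interest_rate": rng.normal(1, 0.5, n),
    "inflation": rng.normal(2, 0.8, n),
    "exchange_rate": rng.normal(10, 1, n),
    "unemployment": rng.normal(7, 1, n),
})
y = 1.5 * df["gdp_growth"] - 2.0 * df["interest_rate"] + rng.normal(0, 0.5, n)

X = sm.add_constant(df)                           # add intercept term
model = sm.OLS(y, X).fit()
print(model.rsquared)                             # coefficient of determination
print(model.params)                               # fitted coefficients
```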
APA, Harvard, Vancouver, ISO, and other styles
40

Zhi, Tianchen. "Maximum Likelihood Estimation of Parameters in Exponential Power Distribution with Upper Record Values." FIU Digital Commons, 2017. http://digitalcommons.fiu.edu/etd/3211.

Full text
Abstract:
The exponential power (EP) distribution is an important distribution used in survival analysis and related to the asymmetric EP distribution. Many researchers have discussed statistical inference about the parameters of the EP distribution using i.i.d. random samples. However, available data sometimes contain only record values, or it is more convenient for researchers to collect record values. We aim to address this setting. We estimated the two parameters of the EP distribution by maximum likelihood using upper record values. In a simulation study, we used the bias and MSE of the estimators to assess the efficiency of the proposed estimation method. We then discussed prediction of the next upper record value from known upper record values. The study concluded that MLEs of the EP distribution parameters based on upper record values perform satisfactorily, and that prediction of the next upper record value also performed well.
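A sketch of the record-value likelihood idea is given below. It is not the thesis's code, and it substitutes a Weibull model for the EP distribution, since EP parameterisations vary across the literature; the sample size and true parameters are made up. The joint density of the first m upper records R_1 < ... < R_m is f(r_m) times the hazards f(r_i)/(1 - F(r_i)) for i < m, and the MLE is found numerically.

```python
# Minimal sketch (Weibull stand-in, not the EP distribution from the thesis) of
# maximum likelihood estimation from upper record values.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def upper_records(sample):
    """Extract the sequence of upper record values from a sample."""
    records, current_max = [], -np.inf
    for x in sample:
        if x > current_max:
            records.append(x)
            current_max = x
    return np.array(records)

def neg_log_lik(params, records):
    # log L = sum_i log f(r_i) - sum_{i<m} log(1 - F(r_i))
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    ll = weibull_min.logpdf(records, shape, scale=scale).sum()
    ll -= weibull_min.logsf(records[:-1], shape, scale=scale).sum()
    return -ll

sample = weibull_min.rvs(1.5, scale=2.0, size=500, random_state=42)
records = upper_records(sample)
fit = minimize(neg_log_lik, x0=[1.0, 1.0], args=(records,), method="Nelder-Mead")
print("upper records used:", records)
print("MLE (shape, scale):", fit.x)
```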
APA, Harvard, Vancouver, ISO, and other styles
41

Williams, Ulyana P. "On Some Ridge Regression Estimators for Logistic Regression Models." FIU Digital Commons, 2018. https://digitalcommons.fiu.edu/etd/3667.

Full text
Abstract:
The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As performance criteria, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study was executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of independent variables, and log odds ratio were varied in the design of the experiment. Simulation results show that, under certain conditions, the ridge regression estimators outperform the maximum likelihood estimator. Moreover, an empirical data analysis supports the main findings of this study. This thesis proposes and recommends some good ridge regression estimators of the logistic regression model for practitioners in the health, physical and social sciences.
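The sketch below is not one of the thesis's proposed estimators; it simply contrasts the maximum likelihood fit of a logistic model with a generic L2-penalised (ridge-type) fit under a highly correlated simulated design, using coefficient MSE as the comparison criterion. The sample size, correlation level, true coefficients and penalty strength are all assumptions.

```python
# Minimal sketch: MLE vs a generic ridge-penalised logistic fit on correlated data.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p, rho = 200, 4, 0.9
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)     # highly correlated design
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta_true = np.array([1.0, -1.0, 0.5, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta_mle = sm.Logit(y, X).fit(disp=False).params
beta_ridge = LogisticRegression(penalty="l2", C=0.5, fit_intercept=False).fit(X, y).coef_.ravel()

mse = lambda b: np.mean((b - beta_true) ** 2)
print("MSE(MLE):  ", mse(beta_mle))
print("MSE(ridge):", mse(beta_ridge))
```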
APA, Harvard, Vancouver, ISO, and other styles
42

Chu, Chi-Yang. "Applied Nonparametric Density and Regression Estimation with Discrete Data| Plug-In Bandwidth Selection and Non-Geometric Kernel Functions." Thesis, The University of Alabama, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10262364.

Full text
Abstract:

Bandwidth selection plays an important role in kernel density estimation. Least-squares cross-validation and plug-in methods are commonly used as bandwidth selectors in the continuous data setting. The former is a data-driven approach and the latter requires a priori assumptions about the unknown distribution of the data. A benefit of the plug-in method is its relatively quick computation, and hence it is often used for preliminary analysis. However, much less is known about the plug-in method in the discrete data setting, and this motivates us to propose a plug-in bandwidth selector. A related issue is undersmoothing in kernel density estimation. Least-squares cross-validation is a popular bandwidth selector, but in many applied situations it tends to select a relatively small bandwidth, or undersmooths. The literature suggests several methods to address this problem, but most of them are modifications of existing error criteria for continuous variables. Here we discuss this problem in the discrete data setting and propose non-geometric discrete kernel functions as a possible solution. This issue also occurs in kernel regression estimation. Our proposed bandwidth selector and kernel functions perform well on simulated and real data.
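As a toy illustration only (not the thesis's plug-in selector or its proposed non-geometric kernels), the sketch below estimates a discrete probability mass function with the standard Aitchison-Aitken kernel and selects the bandwidth by a crude least-squares cross-validation over a grid; the category count and sample are made up.

```python
# Minimal sketch of discrete kernel density estimation (Aitchison-Aitken kernel)
# with least-squares cross-validated bandwidth; not the thesis's method.
import numpy as np

def aitchison_aitken(x, xi, lam, c):
    """Unordered discrete kernel: 1-lam on the matching category, lam/(c-1) elsewhere."""
    return np.where(x == xi, 1 - lam, lam / (c - 1))

def kde_discrete(points, data, lam, c):
    return np.array([aitchison_aitken(x, data, lam, c).mean() for x in points])

rng = np.random.default_rng(4)
c = 5
data = rng.choice(c, size=300, p=[0.4, 0.3, 0.15, 0.1, 0.05])
support = np.arange(c)

def ls_cv(lam):
    # CV(lam) = sum_x fhat(x)^2 - (2/n) * sum_i fhat_{-i}(x_i)
    fhat = kde_discrete(support, data, lam, c)
    loo = np.array([aitchison_aitken(xi, np.delete(data, i), lam, c).mean()
                    for i, xi in enumerate(data)])
    return np.sum(fhat ** 2) - 2 * loo.mean()

grid = np.linspace(0.0, (c - 1) / c, 40)          # valid bandwidth range
best = grid[np.argmin([ls_cv(l) for l in grid])]
print("CV-selected bandwidth:", best)
print("estimated pmf:", kde_discrete(support, data, best, c))
```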

APA, Harvard, Vancouver, ISO, and other styles
43

Olsen, Jessica Lyn. "An Applied Investigation of Gaussian Markov Random Fields." BYU ScholarsArchive, 2012. https://scholarsarchive.byu.edu/etd/3273.

Full text
Abstract:
Recently, Bayesian methods have become central to modern statistics, particularly through the ability to incorporate hierarchical models. In particular, correlated data, such as the data found in spatial and temporal applications, have benefited greatly from the development and application of Bayesian statistics. One particular application of Bayesian modeling is the Gaussian Markov random field (GMRF). These methods have proven very useful in providing a framework for correlated data. I will demonstrate the power of GMRFs by applying this method to two sets of data: a set of temporal data involving car accidents in the UK and a set of spatial data involving Provo area apartment complexes. For the first set of data, I will examine how including a seatbelt covariate affects our estimates of the number of car accidents. In the second set of data, we will scrutinize the effect of BYU approval on apartment complexes. In both applications we will investigate Laplace approximations when normal distribution assumptions do not hold.
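A minimal sketch of a GMRF in action, assuming a simple first-order random-walk prior on a regular time grid rather than the models fitted in the thesis: the posterior mean of the latent field given noisy Gaussian observations is obtained by solving a system built from the prior precision matrix. The signal, noise level and precisions are made up.

```python
# Minimal sketch (toy RW1 prior, not the thesis's models) of GMRF smoothing.
import numpy as np

T = 100
t = np.arange(T)
rng = np.random.default_rng(5)
truth = np.sin(t / 10)
y = truth + rng.normal(scale=0.4, size=T)          # noisy observations

# RW1 structure matrix: Q[i, i] = 2 (1 at the ends), Q[i, i-1] = Q[i, i+1] = -1
Q = 2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
Q[0, 0] = Q[-1, -1] = 1

kappa, tau = 50.0, 1 / 0.4 ** 2                    # prior and observation precisions
post_prec = kappa * Q + tau * np.eye(T)
post_mean = np.linalg.solve(post_prec, tau * y)    # conditional mean of the field given y
print("RMSE raw vs smoothed:",
      np.sqrt(np.mean((y - truth) ** 2)),
      np.sqrt(np.mean((post_mean - truth) ** 2)))
```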
APA, Harvard, Vancouver, ISO, and other styles
44

Albaqshi, Amani Mohammed H. "Generalized Partial Least Squares Approach for Nominal Multinomial Logit Regression Models with a Functional Covariate." Thesis, University of Northern Colorado, 2017. http://pqdtopen.proquest.com/#viewpdf?dispub=10599676.

Full text
Abstract:

Functional Data Analysis (FDA) has attracted substantial attention over the last two decades. Within FDA, classifying curves into two or more categories is consistently of interest to scientists, but multi-class prediction within FDA remains challenging because most classification tools have been limited to binary response applications. The functional logistic regression (FLR) model was developed to forecast a binary response variable in the functional case. In this study, a functional nominal multinomial logit regression (F-NM-LR) model was developed that extends the FLR model into a multiple logit model. However, the model generates inaccurate parameter function estimates due to multicollinearity in the design matrix. A generalized partial least squares (GPLS) approach with cubic B-spline basis expansions was developed to address the multicollinearity and high dimensionality problems that preclude accurate estimates and curve discrimination with the F-NM-LR model. The GPLS method extends partial least squares (PLS) and improves upon current methodology by introducing a component selection criterion that reconstructs the parameter function with fewer predictors. The GPLS regression estimates are derived via Iteratively ReWeighted Partial Least Squares (IRWPLS), defining a set of uncorrelated latent variables to use as predictors for the F-GPLS-NM-LR model. This methodology was compared to the classic alternative estimation method of principal component regression (PCR) in a simulation study. The performance of the proposed methodology was tested via simulations and an application to a spectrometric dataset. The results indicate that the GPLS method performs well in multi-class prediction with respect to the F-NM-LR model. The main difference between the two approaches was that PCR usually requires more components than GPLS to achieve similar accuracy in the parameter function estimates of the F-GPLS-NM-LR model. The results of this research imply that the GPLS method is preferable to the F-NM-LR model and is a useful contribution to FDA techniques. This method may be particularly appropriate for practical situations where accurate prediction of a response variable with fewer components is a priority.

APA, Harvard, Vancouver, ISO, and other styles
45

Paciencia, Todd J. "Improving non-linear approaches to anomaly detection, class separation, and visualization." Thesis, Air Force Institute of Technology, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=3667806.

Full text
Abstract:

Linear approaches for multivariate data analysis are popular due to their lower complexity, reduced computational time, and easier interpretation. In many cases, linear approaches produce adequate results; however, non-linear methods may generate more robust transformations, features, and decision boundaries. Of course, these non-linear methods present their own unique challenges that often inhibit their use.

In this research, improvements to existing non-linear techniques are investigated for the purposes of providing better, timely class separation and improved anomaly detection on various multivariate datasets, culminating in application to anomaly detection in hyperspectral imagery. Primarily, kernel-based methods are investigated, with some consideration towards other methods. Improvements to existing linear-based algorithms are also explored. Here, it is assumed that classes in the data have minimal overlap in the originating space or can be made to have minimal overlap in a transformed space, and that class information is unknown a priori. Further, improvements are demonstrated for global anomaly detection on a variety of hyperspectral imagery, utilizing fusion of spatial and spectral information, factor analysis, clustering, and screening. Additionally, new approaches for n-dimensional visualization of data and decision boundaries are developed.

APA, Harvard, Vancouver, ISO, and other styles
46

Fregosi, Anna. "Calibration of Thermal Soil Properties in the Shallow Subsurface." Thesis, North Carolina State University, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10110538.

Full text
Abstract:

We use nonlinear least squares methods and Bayesian inference to calibrate soil properties using models for heat and groundwater transport in the shallow subsurface. We first assume a constant saturation in our domain and use the analytic solution to the heat equation as a model for heat transport. We compare our results to those using the finite element code, Adaptive Hydrology (ADH). We then use ADH to simulate heat and groundwater transport in an unsaturated domain. We use the Model-Independent Parameter Estimation (PEST) software to solve the least squares problem with ADH as our model. In using Bayesian inference, we employ the Delayed Rejection Adaptive Metropolis (DRAM) Markov chain Monte Carlo algorithm to sample from the posterior densities of parameters in both models. We find our results are consistent with those found using soil samples with empirical methods.
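A toy version of the first calibration step is sketched below. It assumes the classical analytic solution of the heat equation under sinusoidal surface forcing in a semi-infinite domain, T(z, t) = T0 + A exp(-z/d) sin(wt - z/d) with damping depth d = sqrt(2*alpha/w), and synthetic sensor data; it is not the ADH/PEST/DRAM workflow used in the thesis, and all depths, amplitudes and the true diffusivity are made up.

```python
# Minimal sketch: calibrate a thermal diffusivity alpha from synthetic subsurface
# temperatures via scipy's nonlinear least squares.
import numpy as np
from scipy.optimize import least_squares

w = 2 * np.pi / 86400.0                     # daily forcing frequency (rad/s)
T0, A, alpha_true = 15.0, 8.0, 5e-7         # mean temp (C), amplitude (C), diffusivity (m^2/s)

def model(alpha, z, t):
    d = np.sqrt(2 * alpha / w)              # damping depth
    return T0 + A * np.exp(-z / d) * np.sin(w * t - z / d)

rng = np.random.default_rng(6)
z = np.array([0.05, 0.10, 0.20])            # sensor depths (m)
t = np.linspace(0, 3 * 86400, 200)          # three days of readings (s)
Z, Tm = np.meshgrid(z, t)
data = model(alpha_true, Z, Tm) + rng.normal(scale=0.2, size=Z.shape)

fit = least_squares(lambda a: (model(a[0], Z, Tm) - data).ravel(),
                    x0=[1e-7], bounds=([1e-9], [1e-5]))
print("calibrated diffusivity:", fit.x[0], " true:", alpha_true)
```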

APA, Harvard, Vancouver, ISO, and other styles
47

Pipher, Brandon. "Comparison of Regression Methods with Non-Convex Penalties." Kent State University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=kent1573056251025985.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Crk, Vladimir 1958. "Component and system reliability assessment from degradation data." Diss., The University of Arizona, 1998. http://hdl.handle.net/10150/282820.

Full text
Abstract:
Reliability estimation of highly reliable components, subsystems and systems has become very difficult using traditional accelerated life tests. Therefore, there is a need to develop new models that will determine the reliability of such components, systems or subsystems, one of which is the modeling of long-term performance degradation. The proposed method is more general than any of the existing ones. It can be applied to any system, subsystem or component whose degradation over time can be identified and measured. It is assumed that the performance degradation is caused by a number, d, of independent degradation mechanisms, and each of them is separately modeled by a unique nonlinear, monotonically increasing or decreasing curve as a function of time. The parameters of a degradation model are partitioned into a subset of parameters that are constant for all units and a subset of parameters that vary among units, i.e., a subset of random parameters. To accelerate the degradation processes, random samples of identical units are exposed to stress levels that are higher than use stress levels. To capture the variability among units exposed to the same stress level, the parameters of the degradation model for each unit are estimated first, and then the population parameters for a given stress level are estimated. The random parameters are assumed to be multivariate normally distributed, correlated and stress dependent. Multivariate multiple linear regression is applied to the stress-dependent parameters, and the parameter values at use stress levels are determined. The times to failure are then obtained from the degradation model for the given degradation mechanisms by extrapolation to the critical level of degradation at which the system, subsystem, or component is considered to be in a failure state. Since the reliability function cannot be obtained in closed form, a bootstrap simulation methodology is applied to estimate the system's reliability and mean life for single and multiple degradation mechanisms. Two algorithms are developed to obtain point estimates and confidence intervals for the system's reliability and mean life.
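A stripped-down sketch of the final step is given below, assuming a single linear degradation mechanism and made-up per-unit slopes rather than the dissertation's multi-mechanism accelerated model: each unit's failure time is obtained by extrapolating its degradation to a critical threshold, and the bootstrap yields interval estimates for the mean life and the reliability at a chosen time.

```python
# Minimal sketch: bootstrap confidence intervals for reliability and mean life
# from failure times extrapolated out of unit-level degradation slopes.
import numpy as np

rng = np.random.default_rng(7)
n_units, threshold = 30, 10.0
slopes = rng.lognormal(mean=-1.0, sigma=0.3, size=n_units)   # fitted per-unit degradation rates
failure_times = threshold / slopes                           # time to cross the critical level

def reliability_at(t, times):
    return np.mean(times > t)

B, t_star = 2000, 25.0
boot_mean, boot_rel = [], []
for _ in range(B):
    resample = rng.choice(failure_times, size=n_units, replace=True)
    boot_mean.append(resample.mean())
    boot_rel.append(reliability_at(t_star, resample))

print("mean life 95% CI:", np.percentile(boot_mean, [2.5, 97.5]))
print(f"R({t_star}) 95% CI:", np.percentile(boot_rel, [2.5, 97.5]))
```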
APA, Harvard, Vancouver, ISO, and other styles
49

Smith, Laura. "A Numerical Simulation and Statistical Modeling of High Intensity Radiated Fields Experiment Data." W&M ScholarWorks, 2001. https://scholarworks.wm.edu/etd/1539626330.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Zhao, Yang, and Min Zhang. "The Ising Model on a Heavy Gravity Portfolio Applied to Default Contagion." Thesis, Högskolan i Halmstad, Tillämpad matematik och fysik (MPE-lab), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-16459.

Full text
Abstract:
In this paper we introduce a model of default contagion in the financial market. The structure of the companies is represented by a Heavy Gravity Portfolio, where we assume there are N sectors in the market and, in each sector i, one big trader and n_i supply companies. The supply companies in each sector are directly influenced by the big trader, and the big traders also interact pairwise with each other. This development of the Ising model is called the Heavy Gravity Portfolio and, based on it, the relation between the expectation and correlation of company defaults is derived by means of simulations utilising the Gibbs sampler. Finally, methods for maximum likelihood estimation and for a likelihood ratio test of the interaction parameter in the model are derived.
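A minimal sketch of the simulation idea, with made-up sector counts and couplings rather than the paper's fitted parameters: a Gibbs sampler sweeps over a two-level Ising structure in which the big traders are pairwise coupled and each supply company couples only to its sector's big trader, and default frequencies and correlations are read off the retained samples.

```python
# Minimal sketch of a Gibbs sampler for a two-level "heavy gravity" Ising structure;
# spin -1 is read as default. Not the paper's model or estimation code.
import numpy as np

rng = np.random.default_rng(8)
N, n_supply = 4, 5                  # sectors, supply companies per sector (assumed)
J_big, J_link = 0.3, 0.8            # big-big and big-supply couplings (assumed)

big = rng.choice([-1, 1], size=N)
supply = rng.choice([-1, 1], size=(N, n_supply))

def gibbs_sweep(big, supply):
    for i in range(N):
        # local field on big trader i: other big traders plus its supply companies
        field = J_big * (big.sum() - big[i]) + J_link * supply[i].sum()
        big[i] = 1 if rng.random() < 1 / (1 + np.exp(-2 * field)) else -1
        for j in range(n_supply):
            field_s = J_link * big[i]          # supply firm couples only to its big trader
            supply[i, j] = 1 if rng.random() < 1 / (1 + np.exp(-2 * field_s)) else -1
    return big, supply

samples = []
for sweep in range(5000):
    big, supply = gibbs_sweep(big, supply)
    if sweep > 500:                            # discard burn-in
        samples.append(np.concatenate([big, supply.ravel()]))
samples = np.array(samples)
print("default (spin = -1) frequencies, big traders:", (samples == -1).mean(axis=0)[:N])
print("big-trader correlation matrix:\n", np.corrcoef(samples[:, :N], rowvar=False))
```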
APA, Harvard, Vancouver, ISO, and other styles