Dissertations / Theses on the topic 'Empirical Bayes methods'


Consult the top 26 dissertations / theses for your research on the topic 'Empirical Bayes methods.'


1

Benhaddou, Rida. "Nonparametric and Empirical Bayes Estimation Methods." Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5765.

Full text
Abstract:
In the present dissertation, we investigate two different nonparametric models: the empirical Bayes model and the functional deconvolution model. For nonparametric empirical Bayes estimation, we carry out a complete minimax study. In particular, we derive minimax lower bounds for the risk of the nonparametric empirical Bayes estimator for a general conditional distribution, a result that had not been obtained previously. In order to attain optimal convergence rates, we use a wavelet-series-based empirical Bayes estimator constructed in Pensky and Alotaibi (2005). We propose an adaptive version of this estimator using Lepski's method and show that it attains optimal convergence rates. The theory is supplemented by numerous examples. Our study of the functional deconvolution model expands results of Pensky and Sapatinas (2009, 2010, 2011) to the cases of estimating an (r+1)-dimensional function and of dependent errors. In both cases, we derive minimax lower bounds for the integrated square risk over a wide set of Besov balls and construct adaptive wavelet estimators that attain those optimal convergence rates. In particular, in the case of estimating a periodic (r+1)-dimensional function, we show that by choosing Besov balls of mixed smoothness we can avoid the "curse of dimensionality" and hence obtain higher-than-usual convergence rates when r is large. The study of deconvolution of a multivariate function is motivated by seismic inversion, which can be reduced to the solution of noisy two-dimensional convolution equations that allow one to draw inference on underground layer structures along chosen profiles. The common practice in seismology is to recover layer structures separately for each profile and then to combine the derived estimates into a two-dimensional function. By studying the two-dimensional version of the model, we demonstrate that this strategy usually leads to estimators that are less accurate than those obtained as two-dimensional functional deconvolutions. Finally, we consider a multichannel deconvolution model with long-range dependent Gaussian errors. We do not limit our consideration to a specific type of long-range dependence; rather, we assume that the eigenvalues of the covariance matrix of the errors are bounded above and below. We show that the convergence rates of the estimators depend on a balance between the smoothness parameters of the response function, the smoothness of the blurring function, the long-memory parameters of the errors, and how the total number of observations is distributed among the channels.
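For orientation, the nonparametric empirical Bayes problem can be phrased in standard textbook notation as follows (a generic formulation for the reader's convenience, not necessarily the thesis's exact setup):

```latex
% Unknown prior G, conditional distribution q(x | theta), squared-error loss:
\theta_i \overset{\text{iid}}{\sim} G, \qquad
X_i \mid \theta_i \sim q(x \mid \theta_i), \qquad
t_G(x) \;=\; \mathbb{E}[\theta \mid X = x]
       \;=\; \frac{\int \theta \, q(x \mid \theta)\, dG(\theta)}
                  {\int q(x \mid \theta)\, dG(\theta)} .
```

An empirical Bayes estimator mimics the Bayes rule t_G by replacing the unknown integrals with estimates built from the observations X_1, ..., X_n; in this thesis those estimates are wavelet-series based.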
Ph.D., Mathematics, Sciences.
2

Brandel, John. "Empirical Bayes methods for missing data analysis." Thesis, Uppsala University, Department of Mathematics, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-121408.

Full text
3

Lönnstedt, Ingrid. "Empirical Bayes Methods for DNA Microarray Data." Doctoral thesis, Uppsala University, Department of Mathematics, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-5865.

Full text
Abstract:

The cDNA microarray is one of the first high-throughput gene expression technologies to have emerged within molecular biology for the purpose of functional genomics. cDNA microarrays compare gene expression levels between cell samples for thousands of genes simultaneously.

Microarray technology poses new challenges for data analysis, since thousands of genes are examined in parallel but with very few replicates, yielding noisy estimates of gene effects and variances. Even when careful image analysis and normalisation of the data are applied, traditional inferential methods such as the Student t or Fisher's F-statistic fail to work.

In this thesis, four papers on empirical Bayes and full Bayesian methods for two-channel microarray data (such as cDNA) are presented. They demonstrate that empirical Bayes methods are well suited to overcoming these specific data problems: the sample distributions of all the genes involved in a microarray experiment are summarised into prior distributions, which improves the inference for each single gene.

The first part of the thesis provides the biological and statistical background of cDNA microarrays, with an overview of the different steps of two-channel microarray analysis, including experimental design, image analysis, normalisation, cluster analysis, discrimination and hypothesis testing. The second part of the thesis consists of the four papers. Paper I presents the empirical Bayes statistic B, which corresponds to a t-statistic. Paper II is based on a version of B extended to linear model effects. Paper III assesses the performance of empirical Bayes models by comparison with full Bayes methods. Paper IV extends B to the analogue of an F-statistic.
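The variance-moderation idea behind the B-statistic carries over to the widely used moderated t-statistic; a minimal sketch follows, with fixed hyperparameters (in practice d0 and s0_sq are estimated from all genes together; the values below are hypothetical placeholders, and this is the general Lönnstedt-Speed/limma-style construction rather than the thesis's exact statistic):

```python
import numpy as np

def moderated_t(M, d0=4.0, s0_sq=0.05):
    """Empirical Bayes moderated t-statistic for a (genes x replicates)
    matrix M of log-ratios.  Each gene's sample variance is shrunk
    toward the prior value s0_sq with prior degrees of freedom d0."""
    n = M.shape[1]
    mean = M.mean(axis=1)
    s2 = M.var(axis=1, ddof=1)                      # per-gene sample variance
    s2_tilde = (d0 * s0_sq + (n - 1) * s2) / (d0 + n - 1)
    return mean / np.sqrt(s2_tilde / n)
```

Because the denominator borrows strength across all genes, genes with accidentally tiny sample variances no longer dominate the ranking.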

4

Lönnstedt, Ingrid. "Empirical Bayes methods for DNA microarray data /." Uppsala : Matematiska institutionen, Univ. [distributör], 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-5865.

Full text
5

Jakimauskas, Gintautas. "Analysis and application of empirical Bayes methods in data mining." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140423_090853-72998.

Full text
Abstract:
The research objects are empirical Bayes methods and algorithms for data mining, applied in the analysis of large, high-dimensional populations. The aim of the research is to create methods and algorithms for testing nonparametric hypotheses for large populations and for estimating the parameters of data models. The following problems are solved to reach these objectives: 1. To create an efficient data-partitioning algorithm for high-dimensional data. 2. To apply the data-partitioning algorithm in testing nonparametric hypotheses. 3. To apply the empirical Bayes method in testing the independence of components of high-dimensional data vectors. 4. To develop an algorithm for estimating probabilities of rare events in large populations, using the empirical Bayes method and comparing the Poisson-gamma and Poisson-Gaussian mathematical models, by selecting an optimal model and a respective empirical Bayes estimator (see the sketch below). 5. To create an algorithm for logistic regression of rare events using the empirical Bayes method. The results obtained enable us to perform very fast and efficient partitioning of high-dimensional data, to test the independence of selected components of high-dimensional data, and to select the optimal model in the estimation of probabilities of rare events using the Poisson-gamma and Poisson-Gaussian models and empirical Bayes estimators. A nonsingularity condition for the Poisson-gamma model is presented.
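Under the Poisson-gamma model mentioned above, the empirical Bayes rate estimate has a simple closed form; a minimal sketch with method-of-moments hyperparameters (an illustrative generic estimator, not the thesis's exact algorithm):

```python
import numpy as np

def poisson_gamma_eb(counts, exposure):
    """Empirical Bayes rare-event rates under a Poisson-gamma model:
    x_i | lam_i ~ Poisson(lam_i * e_i), lam_i ~ Gamma(a, b)."""
    raw = counts / exposure
    m = raw.mean()
    # Moment estimate of Var(lam): subtract the within-unit Poisson noise.
    v = max(raw.var() - np.mean(m / exposure), 1e-12)
    b = m / v                      # Gamma rate
    a = m * b                      # Gamma shape
    # Posterior mean shrinks each raw rate toward the prior mean a/b.
    return (a + counts) / (b + exposure)
```

The posterior mean (a + x_i)/(b + e_i) pulls each raw rate toward the prior mean, most strongly for units with little exposure.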
6

Everitt, Niklas. "Module identification in dynamic networks: parametric and empirical Bayes methods." Doctoral thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-208920.

Full text
Abstract:
The purpose of system identification is to construct mathematical models of dynamical systems from experimental data. With the dynamical systems encountered in engineering growing ever more complex, an important task is to build models of these systems efficiently. Modelling the complete dynamics of these systems is in general not possible or even desired. Often, however, these systems can be modelled as simpler linear systems interconnected in a dynamic network. The task of estimating the whole network, or a subset of it, can then be broken down into subproblems of estimating one simple system, called a module, embedded within the dynamic network. The prediction error method (PEM) is a benchmark in parametric system identification. The main advantage of PEM is that, for Gaussian noise, it corresponds to the so-called maximum likelihood (ML) estimator and is asymptotically efficient. One drawback is that the cost function is in general nonconvex and a gradient-based search over the parameters has to be carried out, rendering a good starting point crucial. Therefore, other methods such as subspace or instrumental variable methods are required to initialize the search. In this thesis, an alternative method, called model order reduction Steiglitz-McBride (MORSM), is proposed. As MORSM is also motivated by ML arguments, it may be used on its own and will in some cases provide asymptotically efficient estimates. The method is computationally attractive since it is composed of a sequence of least squares steps. It also treats the part of the network of no direct interest nonparametrically, simplifying model order selection for the user. A different approach is taken in the second proposed method for identifying a module embedded in a dynamic network. Here, the impulse response of the part of the network of no direct interest is modelled as a realization of a Gaussian process. The mean and covariance of the Gaussian process are parameterized by a set of parameters, called hyperparameters, that need to be estimated together with the parameters of the module of interest. Using an empirical Bayes approach, all parameters are estimated by maximizing the marginal likelihood of the data. The maximization is carried out with an iterative expectation/conditional-maximization scheme, which alternates so-called expectation steps with a series of conditional-maximization steps. When only the module input and output sensors are used, the expectation step admits an analytical expression. The conditional-maximization steps reduce to solving smaller optimization problems, which either admit a closed-form solution or can be solved efficiently using gradient descent strategies. The overall optimization therefore turns out to be computationally efficient. Using Markov chain Monte Carlo techniques, the method is extended to incorporate additional sensors. Apart from the choice of identification method, the set of signals chosen for the identification determines the covariance of the estimated modules. To choose these signals, well-known expressions for the covariance matrix could, together with signal constraints, be formulated as an optimization problem and solved. However, this approach tells us neither why a certain choice of signals is optimal nor what will happen if some properties change. The expressions developed in this part of the thesis have a different flavor, in that they aim to reformulate the covariance expressions into a form amenable to interpretation. These expressions illustrate how different properties of the identification problem affect the achievable accuracy: in particular, how the power of the input and noise signals, as well as the model structure, affect the covariance.
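The empirical Bayes step described above reduces, in its simplest form, to maximizing the marginal likelihood of a linear-Gaussian model. A minimal sketch is given below; the diagonal decaying kernel and the two hyperparameters are illustrative assumptions standing in for the richer kernels used in this literature:

```python
import numpy as np
from scipy.optimize import minimize

def neg_marginal_loglik(log_hyp, Phi, y, sigma2):
    """Negative log marginal likelihood for y = Phi @ g + e, with a
    Gaussian prior g ~ N(0, K) on the impulse response and noise
    e ~ N(0, sigma2 * I).  Marginally, y ~ N(0, Phi K Phi' + sigma2 I)."""
    lam, beta = np.exp(log_hyp)                   # scale and decay, kept positive
    n = Phi.shape[1]
    K = lam * np.diag(beta ** np.arange(n))       # simple decaying kernel (illustrative)
    S = Phi @ K @ Phi.T + sigma2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (logdet + y @ np.linalg.solve(S, y))

# Empirical Bayes: fit hyperparameters by maximizing the marginal likelihood.
# res = minimize(neg_marginal_loglik, x0=np.log([1.0, 0.8]), args=(Phi, y, 0.1))
```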
7

Duan, Xiuwen. "Revisiting Empirical Bayes Methods and Applications to Special Types of Data." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42340.

Full text
Abstract:
Empirical Bayes methods have been around for a long time and have a wide range of applications. These methods provide a way in which historical data can be aggregated to produce estimates of the posterior mean. This thesis revisits some of the empirical Bayes methods and develops new applications. We first look at a linear empirical Bayes estimator and apply it to ranking and symbolic data. Next, we consider Tweedie's formula and show how it can be applied to analyze a microarray dataset. The application of the formula is simplified with the Pearson system of distributions. Saddlepoint approximations enable us to generalize several results in this direction. The results show that the proposed methods perform well in applications to real data sets.
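Tweedie's formula gives the posterior mean of theta from a normal observation x ~ N(theta, sigma^2) using only the marginal density f of x: E[theta | x] = x + sigma^2 * d/dx log f(x). A minimal sketch follows, with f estimated by a kernel density estimate; the KDE and the numerical derivative are illustrative substitutes for the Pearson-system and saddlepoint machinery the thesis actually develops:

```python
import numpy as np
from scipy.stats import gaussian_kde

def tweedie_posterior_mean(x, sigma=1.0):
    """Tweedie's formula E[theta | x] = x + sigma^2 * d/dx log f(x),
    with the marginal density f estimated by a Gaussian KDE and the
    derivative taken numerically."""
    kde = gaussian_kde(x)
    eps = 1e-3
    dlogf = (np.log(kde(x + eps)) - np.log(kde(x - eps))) / (2 * eps)
    return x + sigma**2 * dlogf
```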
8

Hort, Molly. "A comparison of hypothesis testing procedures for two population proportions." Manhattan, Kan. : Kansas State University, 2008. http://hdl.handle.net/2097/725.

Full text
9

Kisamore, Jennifer L. "Validity Generalization and Transportability: An Investigation of Distributional Assumptions of Random-Effects Meta-Analytic Methods." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000060.

Full text
10

Jakimauskas, Gintautas. "Duomenų tyrybos empirinių Bajeso metodų tyrimas ir taikymas." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140423_090834-67696.

Full text
11

Piaseckienė, Karolina. "The statistical methods in the analysis of the Lithuanian language complexity." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2014. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2014~D_20140922_141231-96020.

Full text
Abstract:
The aim of the work is to apply mathematical and statistical methods to the analysis of the Lithuanian language, identifying and taking into account the peculiarities of the language: its heterogeneity, complexity and variability.
12

Fredette, Marc. "Prediction of recurrent events." Thesis, University of Waterloo, 2004. http://hdl.handle.net/10012/1142.

Full text
Abstract:
In this thesis, we study issues related to prediction problems, with an emphasis on those arising when recurrent events are involved. The first chapter defines the basic concepts of frequentist and Bayesian statistical prediction. In the second chapter, we study frequentist prediction intervals and their associated predictive distributions, and present an approach based on asymptotically uniform pivotals that is shown to dominate the plug-in approach under certain conditions. The following three chapters consider the prediction of recurrent events. The third chapter presents different prediction models when these events can be modeled using homogeneous Poisson processes; amongst these models, those using random effects are shown to possess interesting features. In the fourth chapter, the time-homogeneity assumption is relaxed and we present prediction models for non-homogeneous Poisson processes, studying their behavior for prediction problems with a finite horizon. In the fifth chapter, we apply the concepts discussed previously to a warranty dataset from the automobile industry. Since the number of processes in this dataset is very large, we focus on methods that provide computationally rapid prediction intervals. Finally, we discuss possibilities for future research in the last chapter.
13

Yu, Xue Qin. "Comparing survival from cancer using population-based cancer registry data - methods and applications." Thesis, The University of Sydney, 2007. http://hdl.handle.net/2123/1774.

Full text
Abstract:
Over the past decade, population-based cancer registry data have been used increasingly worldwide to evaluate and improve the quality of cancer care. The utility of the conclusions from such studies relies heavily on the data quality and the methods used to analyse the data. Interpretation of comparative survival from such data, examining either temporal trends or geographical differences, is generally not easy. The observed differences could be due to methodological and statistical approaches or to real effects. For example, geographical differences in cancer survival could be due to a number of real factors, including access to primary health care, the availability of diagnostic and treatment facilities and the treatment actually given, or to artefact, such as lead-time bias, stage migration, sampling error or measurement error. Likewise, a temporal increase in survival could be the result of earlier diagnosis and improved treatment of cancer; it could also be due to artefact after the introduction of screening programs (adding lead time), changes in the definition of cancer, stage migration or several of these factors, producing both real and artefactual trends. In this thesis, I report methods that I modified and applied, some technical issues in the use of such data, and an analysis of data from the State of New South Wales (NSW), Australia, illustrating their use in evaluating and potentially improving the quality of cancer care and showing how data quality might affect the conclusions of such analyses. The thesis describes studies of comparative survival based on population-based cancer registry data, with three published papers and one accepted manuscript (subject to minor revision). In the first paper, I describe a modified method for estimating spatial variation in cancer survival using empirical Bayes methods (published in Cancer Causes and Control, 2004). I demonstrate that the empirical Bayes method is preferable to standard approaches and show how it can be used to identify cancer types where a focus on reducing area differentials in survival might lead to important gains in survival. In the second paper (published in the European Journal of Cancer, 2005), I apply this method to a more complete analysis of spatial variation in survival from colorectal cancer in NSW and show that estimates of spatial variation in colorectal cancer can help to identify subgroups of patients for whom better application of treatment guidelines could improve outcomes. I also show how estimates of the numbers of lives that could be extended might assist in setting priorities for treatment improvement. In the third paper, I examine time trends in survival from 28 cancers in NSW between 1980 and 1996 (published in the International Journal of Cancer, 2006) and conclude that for many cancers, falls in excess deaths in NSW from 1980 to 1996 are unlikely to be attributable to earlier diagnosis or stage migration; thus, advances in cancer treatment have probably contributed to them. In the accepted manuscript, I describe an extension of the work reported in the second paper, investigating the accuracy of staging information recorded in the registry database and assessing the impact of error in its measurement on estimates of spatial variation in survival from colorectal cancer. The results indicate that misclassified registry stage can have an important impact on estimates of spatial variation in stage-specific survival from colorectal cancer. Thus, if cancer registry data are to be used effectively in evaluating and improving cancer care, the quality of stage data might have to be improved. Taken together, the four papers show that creative, informed use of population-based cancer registry data, with appropriate statistical methods and acknowledgement of the limitations of the data, can be a valuable tool for evaluating and possibly improving cancer care. Use of these findings to stimulate evaluation of the quality of cancer care should enhance the value of the investment in cancer registries. They should also stimulate improvement in the quality of cancer registry data, particularly that on stage at diagnosis. The methods developed in this thesis may also be used to improve estimation of geographical variation in other count-based health measures when the available data are sparse.
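Empirical Bayes smoothing of area-level estimates of the kind described here shrinks each area's noisy estimate toward a global mean, with weights driven by how much of the observed spread is real between-area variation. A minimal generic sketch (method-of-moments hyperparameters; this illustrates the standard global-shrinkage idea, not the thesis's modified estimator):

```python
import numpy as np

def eb_shrink(estimates, variances):
    """Shrink area-level estimates toward the overall mean.  The weight
    for each area is tau2 / (tau2 + v_i), where tau2 is the between-area
    variance estimated by method of moments."""
    mu = np.average(estimates, weights=1.0 / variances)
    tau2 = max(np.mean((estimates - mu) ** 2 - variances), 0.0)
    w = tau2 / (tau2 + variances)
    return w * estimates + (1 - w) * mu
```

Areas with large sampling variance are pulled strongly toward the global mean, while precisely estimated areas keep their own values.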
14

Yu, Xue Qin. "Comparing survival from cancer using population-based cancer registry data - methods and applications." University of Sydney, 2007. http://hdl.handle.net/2123/1774.

Full text
15

Rahal, Abbas. "Bayesian Methods Under Unknown Prior Distributions with Applications to The Analysis of Gene Expression Data." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42408.

Full text
Abstract:
The local false discovery rate (LFDR) is one of many existing statistical methods for multiple hypothesis testing. As a Bayesian quantity, the LFDR is based on the prior probability of the null hypothesis and a mixture distribution of the null and non-null hypotheses. In practice, the LFDR is unknown and needs to be estimated, and the empirical Bayes approach can be used to estimate that mixture distribution. Empirical Bayes does not require complete information about the prior and hyper-prior distributions, as hierarchical Bayes does: when we do not have enough information at the prior level, then instead of placing a distribution at the hyper-prior level as in the hierarchical Bayes model, empirical Bayes estimates the prior parameters from the data, often via the marginal distribution. In this research, we developed new Bayesian methods under unknown prior distributions. A set of adequate prior distributions may be defined using Bayesian model checking, by setting a threshold on the posterior predictive p-value, prior predictive p-value, calibrated p-value, Bayes factor, or integrated likelihood. We derive a set of adequate posterior distributions from that set. In order to obtain a single posterior distribution instead of a set of adequate posterior distributions, we used a blended distribution, which minimizes the relative entropy of a set of adequate prior (or posterior) distributions to a "benchmark" prior (or posterior) distribution. We present two approaches to generating a blended posterior distribution, namely updating-before-blending and blending-before-updating. The blended posterior distribution can be used to estimate the LFDR by considering the nonlocal false discovery rate as a benchmark and the different LFDR estimators as an adequate set. The likelihood ratio can often be misleading in multiple testing unless it is supplemented by adjusted p-values or posterior probabilities based on sufficiently strong prior distributions. In the case of unknown prior distributions, they can be estimated by empirical Bayes methods or blended distributions. We propose a general framework for applying the laws of likelihood to problems involving multiple hypotheses by bringing together multiple statistical models. We have applied the proposed framework to data sets from genomics, COVID-19 and other sources.
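Under the two-groups model that underlies the LFDR, the quantity being estimated is lfdr(z) = pi0 * f0(z) / f(z), the posterior probability that a case with statistic z is null. A minimal sketch with a theoretical N(0,1) null, a KDE estimate of the mixture density, and pi0 treated as known (all three are simplifying assumptions; estimating them robustly is exactly what the thesis addresses):

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def lfdr(z, pi0=0.9):
    """Local FDR under the two-groups model: lfdr(z) = pi0 * f0(z) / f(z).
    f0 is the theoretical N(0,1) null; the mixture density f is
    estimated by a KDE; pi0 is assumed known here for illustration."""
    f = gaussian_kde(z)(z)
    f0 = norm.pdf(z)
    return np.clip(pi0 * f0 / f, 0.0, 1.0)
```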
16

Liley, Albert James. "Statistical co-analysis of high-dimensional association studies." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/270628.

Full text
Abstract:
Modern medical practice and science involve complex phenotypic definitions. Understanding patterns of association across this range of phenotypes requires co-analysis of high-dimensional association studies in order to characterise shared and distinct elements. In this thesis I address several problems in this area, with the general linking aim of making more efficient use of available data. The main application of these methods is in the analysis of genome-wide association studies (GWAS) and similar studies. Firstly, I developed methodology for a Bayesian conditional false discovery rate (cFDR) for leveraging GWAS results using summary statistics from a related disease. I extended an existing method to enable a shared-control design, increasing power and applicability, and developed an approximate bound on the false discovery rate (FDR) for the procedure. Using the new method I identified several new variant-disease associations. I then developed a second application of the shared-control design in the context of study replication, enabling an improvement in power at the cost of changing the spectrum of sensitivity to systematic errors in study cohorts; this has application in studies on rare diseases or in between-case analyses. I then developed a method for partially characterising heterogeneity within a disease by modelling the bivariate distribution of case-control and within-case effect sizes. Using an adaptation of a likelihood-ratio test, this allows an assessment of whether disease heterogeneity corresponds to differences in disease pathology. I applied this method to a range of simulated and real datasets, enabling insight into the cause of heterogeneity in autoantibody positivity in type 1 diabetes (T1D). Finally, I investigated the relation of subtypes of juvenile idiopathic arthritis (JIA) to adult diseases, using modified genetic risk scores and linear discriminants in a penalised regression framework. The contribution of this thesis lies in a range of methodological developments for the comparative analysis of high-dimensional association studies. Methods such as these will have wide application in the analysis of GWAS and similar areas, particularly in the development of stratified medicine.
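For context, the empirical cFDR estimators that this line of work builds on score each variant by its p-value in the principal study, conditional on its p-value in a related study. A crude sketch of that baseline estimator (the plain empirical form from earlier cFDR literature, not the Bayesian extension or the FDR bound developed in the thesis):

```python
import numpy as np

def cfdr(p, q):
    """Empirical conditional FDR: cFDR(p_i | q_i) ~ p_i * Pr(Q <= q_i)
    / Pr(P <= p_i, Q <= q_i), with probabilities replaced by empirical
    proportions over all variants.  p: p-values in the principal study;
    q: p-values in the conditioning study."""
    n = len(p)
    out = np.empty(n)
    for i in range(n):
        joint = np.mean((p <= p[i]) & (q <= q[i]))   # never 0: includes i itself
        out[i] = p[i] * np.mean(q <= q[i]) / joint
    return np.minimum(out, 1.0)
```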
17

Devarasetty, Prem Chand. "Safety Improvements on Multilane Arterials: A Before and After Evaluation Using the Empirical Bayes Method." Master's thesis, Orlando, Fla. : University of Central Florida, 2009. http://purl.fcla.edu/fcla/etd/CFE0002723.

Full text
18

Filho, Diógenes Ferreira. "Estudo de expressão gênica em citros utilizando modelos lineares." Universidade de São Paulo, 2010. http://www.teses.usp.br/teses/disponiveis/11/11134/tde-16032010-111945/.

Full text
Abstract:
This work presents a review of the methodology of microarray experiments, covering their set-up and the statistical analysis of the data obtained. The methodology is then applied to the analysis of gene expression data in citrus generated by a macroarray experiment, using fixed-effects linear models that consider the inclusion or exclusion of different effects, with models fitted for each gene separately and for all genes simultaneously. Macroarray experiments are similar to microarray experiments but use a smaller number of genes; in general, they are used because of economic restrictions. Because few arrays were used in the experiment analyzed in this study, an empirical Bayes approach was adopted, which provides more stable variance estimates and takes into account the correlation among replicates of a gene within an array. A nonparametric method was also used to circumvent the problem of non-normality for some genes. The results obtained with each of the described methods of analysis were then compared.
19

Wahl, Jean-Baptiste. "The Reduced basis method applied to aerothermal simulations." Thesis, Strasbourg, 2018. http://www.theses.fr/2018STRAD024/document.

Full text
Abstract:
We present in this thesis our work on model order reduction for aerothermal simulations. We consider the coupling between the incompressible Navier-Stokes equations and an advection-diffusion equation for the temperature. Since the physical parameters induce high Reynolds and Peclet numbers, we have to introduce stabilization operators in the formulation to deal with the well-known numerical stability issues. The chosen stabilization, applied to both the fluid and heat equations, is the usual Streamline-Upwind/Petrov-Galerkin (SUPG) method, which adds artificial diffusivity in the direction of the convection field. We also introduce our order-reduction strategy for this model, based on the Reduced Basis Method (RBM). To recover an affine decomposition, essential for the application of the RBM to this complex model, we implemented a discrete variant of the Empirical Interpolation Method (EIM). This variant allows building an approximate affine decomposition for complex operators such as those arising from SUPG, and we also use it for the nonlinear operators induced by the shock-capturing method. However, the construction of an EIM basis for nonlinear operators involves a potentially huge number of nonlinear finite element solves, depending on the size of the sampling. Even though this basis is built during an offline phase, we usually cannot afford such an expensive computation. We therefore took advantage of the recent development of the Simultaneous EIM-Reduced Basis algorithm (SER) to tackle this issue.
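The discrete EIM referred to above selects, from a basis of operator snapshots, the few entries at which the operator must actually be evaluated online. A compact sketch of the standard greedy index selection (the generic DEIM algorithm, given for orientation rather than as the thesis's exact variant):

```python
import numpy as np

def deim_indices(U):
    """Greedy index selection of the discrete empirical interpolation
    method (DEIM).  U holds the snapshot basis vectors column-wise;
    each new index is where the current basis vector is worst
    approximated by interpolation at the indices chosen so far."""
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, U.shape[1]):
        c = np.linalg.solve(U[idx, :j], U[idx, j])   # match at chosen rows
        r = U[:, j] - U[:, :j] @ c                   # interpolation residual
        idx.append(int(np.argmax(np.abs(r))))
    return idx
```

Online, the full operator is then approximated from its values at the selected indices alone, which restores an affine, cheaply evaluable decomposition.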
20

Laurent, Philippe. "Méthodes d'accéleration pour la résolution numérique en électrolocation et en chimie quantique." Thesis, Nantes, Ecole des Mines, 2015. http://www.theses.fr/2015EMNA0122/document.

Full text
Abstract:
This thesis tackles two different topics. We first design and analyze algorithms related to the electrical sense, for applications in robotics. We consider in particular the method of reflections which, like the Schwarz method, solves linear problems through simpler sub-problems; these are obtained by decomposing the boundaries of the original problem. We give convergence proofs and applications. In order to implement a simulator of the direct electrolocation problem in an autonomous robot, we also build a reduced basis method devoted to electrolocation problems, obtaining algorithms that satisfy the constraints of limited memory and time resources. The second topic is an inverse problem in quantum chemistry, where we want to determine some features of a quantum system. To this end, the system is illuminated by a known, fixed laser field; in this framework, the data of the inverse problem are the states before and after illumination. A local existence result is given, together with numerical solution methods.
21

Yang, L., and Daniel Neagu. "Integration strategies for toxicity data from an empirical perspective." 2014. http://hdl.handle.net/10454/10814.

Full text
Abstract:
Recent developments in information technology, especially state-of-the-art "big data" solutions, enable the extraction, gathering, and processing of large amounts of toxicity information from multiple sources. Facilitated by this technological advance, a framework named integrated testing strategies (ITS) has been proposed in the predictive toxicology domain, in an effort to use multiple heterogeneous toxicity data records jointly and intelligently (through data fusion, grouping, interpolation/extrapolation, etc.) for toxicity assessment. This will ultimately contribute to accelerating the development cycle of chemical products, reducing animal use, and decreasing development costs. Most current work on ITS is based on a group of consensus processes, termed weight of evidence (WoE), which quantitatively integrate all the relevant data instances for the same endpoint into an integrated decision supported by data quality. Several WoE implementations for the particular case of toxicity data fusion have been presented in the literature and are collectively studied in this paper. Noting that these uncertainty-handling methodologies are usually not developed simply from conventional probability theory, owing to the unavailability of big datasets, the paper first investigates the mathematical foundations of these approaches. The investigated data integration models are then applied to a representative case in the predictive toxicology domain, and the experimental results are compared and analysed.
22

Zhang, Pengyue. "Study designs and statistical methods for pharmacogenomics and drug interaction studies." Diss., 2016. http://hdl.handle.net/1805/11300.

Full text
Abstract:
Indiana University-Purdue University Indianapolis (IUPUI)
Adverse drug events (ADEs) are injuries resulting from drug-related medical interventions. ADEs can be induced either by a single drug or by a drug-drug interaction (DDI). In order to prevent unnecessary ADEs, many regulatory agencies in public health maintain pharmacovigilance databases for detecting novel drug-ADE associations. However, pharmacovigilance databases usually contain a significant portion of false associations due to their inherent structure (e.g. false drug-ADE associations caused by co-medications). Besides pharmacovigilance studies, the risks of ADEs can be minimized by understanding their mechanisms, which include abnormal pharmacokinetics/pharmacodynamics due to genetic factors and synergistic effects between drugs. During the past decade, pharmacogenomics studies have successfully identified several predictive markers to reduce ADE risks; however, pharmacogenomics studies are usually limited by sample size and budget. In this dissertation, we develop statistical methods for pharmacovigilance and pharmacogenomics studies. Firstly, we propose an empirical Bayes mixture model to identify significant drug-ADE associations. The proposed approach can be used for both signal generation and ranking, and the proportion of false associations among the detected signals can be well controlled. Secondly, we propose a mixture dose-response model to investigate the functional relationship between the increasing dimensionality of drug combinations and ADE risks. This approach can also be used to identify high-dimensional drug combinations that are associated with escalated ADE risks at significantly low local false discovery rates. Finally, we propose a cost-efficient design for pharmacogenomics studies. In pursuit of further cost-efficiency, the proposed design combines DNA pooling with a two-stage design approach. Compared to a traditional design, the cost under the proposed design is reduced dramatically with an acceptable compromise on statistical power. The proposed methods are examined by extensive simulation studies. Furthermore, the proposed methods for analyzing pharmacovigilance databases are applied to the FDA Adverse Event Reporting System database and a local electronic medical record (EMR) database. For different pharmacogenomics study scenarios, optimized designs to detect a functioning rare allele are given as well.
23

Yang-Yu Cheng and 鄭暘諭. "Estimation of False Discovery Rate Using Empirical Bayes Method." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/78t3ye.

Full text
Abstract:
Master's thesis, National Cheng Kung University, Department of Statistics (academic year 104).
In multiple testing problems, if the individual type I error rate is not adjusted and each test is run at significance level α, then the overall type I error rate across m hypotheses inflates to roughly mα. This study assumes that gene-level statistics follow a mixture of normal distributions whose parameters have prior distributions. We use the Bayesian posterior distribution and the EM algorithm to estimate the proportion of true null hypotheses, and from it the number of true null hypotheses and the FDR. We compare the performance of these estimators for different parameter settings through Monte Carlo simulation. The estimator based on the McNemar test proposed by Ma & Chao (2011) can produce overly large estimation errors when the significance level is set to α = 0.05. The estimator proposed by Benjamini & Hochberg (2000) is unstable when the gene-mutation ratio is random, and the estimator based on the Friedman test proposed by Ma & Tsai (2011) behaves similarly. When the number of genes and the number of patients are both large and the proportion of true null hypotheses is high, the proposed EBay estimator has the smallest RMSE and is hence the most accurate.
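The core computation described here, estimating the proportion π0 of true nulls from a normal mixture via EM, can be sketched compactly. Below is a generic two-component version with the null fixed at N(0,1) and illustrative starting values (the thesis's model and priors are richer; this only shows the E- and M-step mechanics):

```python
import numpy as np
from scipy.stats import norm

def em_pi0(z, n_iter=200):
    """EM for the two-component mixture pi0*N(0,1) + (1-pi0)*N(mu, sd^2).
    Returns the estimated proportion of true nulls pi0 and the
    non-null component's parameters."""
    pi0, mu, sd = 0.8, 2.0, 1.0          # illustrative starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each z_i is null.
        p0 = pi0 * norm.pdf(z)
        p1 = (1 - pi0) * norm.pdf(z, mu, sd)
        w = p0 / (p0 + p1)
        # M-step: update pi0 and the non-null component.
        pi0 = w.mean()
        mu = np.sum((1 - w) * z) / np.sum(1 - w)
        sd = np.sqrt(np.sum((1 - w) * (z - mu) ** 2) / np.sum(1 - w))
    return pi0, mu, sd
```

With the estimated pi0 in hand, the number of true nulls is estimated as m times pi0, which then feeds the FDR estimate.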
24

Lin, I.-Chin, and 林義欽. "Some Applications of Empirical Bayes Method for Selecting Exponential Distributions." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/95597143003863223702.

Full text
25

Lin, Tzu-Yin, and 林姿吟. "An Empirical Bayes Process Monitoring Technique for Categorical Data Utilizing the Likelihood Ratio Method." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/338ub5.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Statistics (academic year 92).
The purpose of this thesis is to develop an empirical Bayes process-monitoring technique for categorical manufacturing data utilizing the likelihood ratio method. First, assuming a normal-binomial or normal-multinomial model, an empirical Bayes inference for categorical manufacturing data is discussed. Next, utilizing the likelihood ratio method, an empirical Bayes process-monitoring technique for such data is proposed. Finally, the average run-length behavior of the proposed monitoring scheme is investigated.
26

Kuo, Pei-Fen. "Examining the Effects of Site-Selection Criteria for Evaluating the Effectiveness of Traffic Safety Improvement Countermeasures." Thesis, 2012. http://hdl.handle.net/1969.1/ETD-TAMU-2012-05-10841.

Full text
Abstract:
The before-after study is still the most popular method used by traffic engineers and transportation safety analysts for evaluating the effects of an intervention. However, this kind of study may be plagued by important methodological limitations that could significantly alter the outcome, including the regression-to-the-mean (RTM) and site-selection effects. So far, most of the research on these biases has focused on the RTM. Hence, the primary objective of this study is to present a method that can reduce the site-selection bias when an entry criterion is used in before-after studies, for both continuous data (e.g. speed, reaction times) and count data (e.g. number of crashes, number of fatalities). The proposed method provides a way to adjust the Naive estimator using the sample data alone, without relying on data collected from a control group, since finding enough appropriate sites for a control group is much harder in traffic-safety analyses. The proposed method, called the Adjusted method, was compared to methods commonly used in before-after studies. The results showed that, among all methods evaluated, the Naive method is the most significantly affected by selection bias. Using the control group (CG), ANCOVA, or empirical Bayes with a control group (EBCG) method can eliminate the site-selection bias, as long as the characteristics of the control group are exactly the same as those of the treatment group. However, control-group data with the same characteristics, based on a truncated distribution or sample, may not be available in practice; moreover, the site-selection bias generated by using a dissimilar control group might be even larger than with the Naive method. The Adjusted method can partially eliminate site-selection bias even when biased estimators of the mean, variance, and correlation coefficient of a truncated normal distribution are used or are not known with certainty. In addition, three actual datasets were used to evaluate the accuracy of the Adjusted method in estimating site-selection biases for various types of data with different mean and sample-size values.
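For reference, the empirical Bayes estimator that before-after studies of this kind compare against combines each site's observed count with a safety-performance-function prediction; a minimal sketch of that classic (Hauer-style) calculation, given for orientation rather than as the thesis's Adjusted estimator:

```python
def eb_expected_crashes(observed, predicted, overdispersion):
    """Classic EB before-period estimate: a weighted average of the
    site's observed crash count and the safety-performance-function
    prediction.  The weight w = 1 / (1 + k * predicted) comes from the
    negative binomial overdispersion parameter k."""
    w = 1.0 / (1.0 + overdispersion * predicted)
    return w * predicted + (1.0 - w) * observed
```

Sites with unusually high observed counts are pulled back toward the prediction, which is exactly what counteracts the regression-to-the-mean effect discussed above.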