Dissertations / Theses on the topic 'Statistical methods'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Statistical methods.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Bai, Yang, and 柏楊. "Statistical analysis for longitudinal data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009. http://hub.hku.hk/bib/B42841756.

2

Young, G. A. "Data-based statistical methods." Thesis, University of Cambridge, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.383307.

3

Bridges, M. "Statistical methods in cosmology." Thesis, University of Cambridge, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.596904.

Abstract:
We outline the application of a new method of evidence calculation called nested sampling (Skilling 2004). We use a clustered ellipsoidal bound to restrict the parameter space sampled, which is generic enough to be used even for complex multimodal posteriors. We demonstrate that our algorithm, COSMOCLUST, makes important savings in computational time compared with previous methods. The study of the primordial power spectrum, which seeded the structure formation observed in both the CMB and large-scale structure, is crucial in unravelling early universe physics. In this thesis we analyse a number of spectral parameterisations based on both physical and observational grounds. Using the evidence we determine the most appropriate model in both WMAP 1-year and WMAP 3-year data (additionally including a selection of high-resolution CMB and large-scale structure data). We conclude that the evidence does currently suggest the need for a tilt in the spectrum; however, the presence of running of the spectral index depends on the inclusion of the Ly-α data specifically. Bayesian analysis in cosmology is computationally demanding. We have succeeded in improving the efficiency of inference for a wide variety of cosmological applications by training neural networks to 'learn' how observables such as the CMB spectrum change with input cosmological parameters. We demonstrate that improvements in speed of several orders of magnitude are possible using our algorithm COSMONET.
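A minimal sketch of nested sampling in the spirit described above, applied to a toy two-dimensional Gaussian likelihood with a uniform prior. The clustered ellipsoidal bounds of COSMOCLUST are not reproduced; new live points are drawn by naive rejection from the prior under the hard likelihood constraint, which is enough to show how the evidence accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(theta):
    # Toy likelihood: isotropic Gaussian centred at (0.5, 0.5) with width 0.1.
    return -0.5 * np.sum(((theta - 0.5) / 0.1) ** 2)

def sample_prior(n=1):
    # Uniform prior on the unit square.
    return rng.uniform(0.0, 1.0, size=(n, 2))

n_live, n_iter = 100, 700
live = sample_prior(n_live)
live_logl = np.array([log_likelihood(t) for t in live])

log_z = -np.inf        # running log-evidence
log_x_prev = 0.0       # log prior volume, X_0 = 1

for i in range(1, n_iter + 1):
    worst = np.argmin(live_logl)
    log_x = -i / n_live                                   # E[log X_i] = -i / n_live
    log_w = np.log(np.exp(log_x_prev) - np.exp(log_x))    # shell width X_{i-1} - X_i
    log_z = np.logaddexp(log_z, live_logl[worst] + log_w)
    log_x_prev = log_x

    # Replace the worst point with a prior draw satisfying L > L_worst.
    # (A real sampler such as COSMOCLUST draws inside ellipsoidal bounds instead.)
    threshold = live_logl[worst]
    while True:
        cand = sample_prior(1)[0]
        cand_logl = log_likelihood(cand)
        if cand_logl > threshold:
            live[worst], live_logl[worst] = cand, cand_logl
            break

# Add the contribution of the remaining live points.
lmax = live_logl.max()
log_z = np.logaddexp(log_z, np.log(np.mean(np.exp(live_logl - lmax))) + lmax + log_x_prev)
print("estimated log-evidence:", round(log_z, 2))   # analytic value is about -2.8
```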
4

Okasha, Mahmoud Khaled Mohamed. "Statistical methods in dendrochronology." Thesis, University of Sheffield, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.295760.

5

Muncey, Harriet Jane. "Statistical methods in metabolomics." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/24877.

Abstract:
Metabolomics lies at the fulcrum of the systems biology 'omics'. Metabolic profiling offers researchers new insight into genetic and environmental interactions, responses to pathophysiological stimuli and novel biomarker discovery. Metabolomics lacks the simplicity of a single data capturing technique; instead, increasingly sophisticated multivariate statistical techniques are required to tease out useful metabolic features from various complex datasets. In this work, two major metabolomics methods are examined: Nuclear Magnetic Resonance (NMR) Spectroscopy and Liquid Chromatography-Mass Spectrometry (LC-MS). MetAssimulo, a 1H-NMR metabolic-profile simulator, was developed in part by this author and is described in Chapter 2. Peak positional variation is a phenomenon occurring in NMR spectra that complicates metabolomic analysis, so Chapter 3 focuses on modelling the effect of pH on peak position. Analysis of LC-MS data is somewhat more complex given its 2-D structure, so I review existing pre-processing and feature detection techniques in Chapter 4 and then attempt to tackle the issue from a Bayesian viewpoint. A Bayesian Partition Model is developed to distinguish chromatographic peaks representing useful features from chemical and instrumental interference and noise. Another of the LC-MS pre-processing problems, data binning, is also explored as part of H-MS: a pre-processing algorithm incorporating wavelet smoothing and novel Gaussian and Exponentially Modified Gaussian peak detection. The performance of H-MS is compared alongside two existing pre-processing packages: apLC-MS and XCMS.
6

Kouba, Pavel. "Možnost zavedení a využívání metody SPC ve výrobě v organizaci s.n.o.p CZ, a.s." Master's thesis, Vysoká škola ekonomická v Praze, 2009. http://www.nusl.cz/ntk/nusl-16319.

Abstract:
This diploma thesis verifies the application of SPC methods and evaluates the statistical stability and capability of a real production process for steel stampings. In the second part, the author attempts to design the optimal form of SPC methods for use in the specified manufacturing process.
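To make the kind of SPC computation concrete, the sketch below derives X-bar/R control limits and the Cp/Cpk capability indices for subgroups of size five. The control-chart constants are the standard tabulated values for n = 5; the measurements and specification limits are simulated, not taken from the s.n.o.p CZ process.

```python
import numpy as np

# Simulated stamping dimension: 25 subgroups of 5 parts each (mm).
rng = np.random.default_rng(1)
data = rng.normal(loc=50.0, scale=0.05, size=(25, 5))

xbar = data.mean(axis=1)                    # subgroup means
r = data.max(axis=1) - data.min(axis=1)     # subgroup ranges
xbarbar, rbar = xbar.mean(), r.mean()

# Shewhart chart constants for subgroup size n = 5.
A2, D3, D4, d2 = 0.577, 0.0, 2.114, 2.326

print(f"X-bar chart: LCL={xbarbar - A2 * rbar:.4f}  CL={xbarbar:.4f}  UCL={xbarbar + A2 * rbar:.4f}")
print(f"R chart:     LCL={D3 * rbar:.4f}  CL={rbar:.4f}  UCL={D4 * rbar:.4f}")

# Process capability against hypothetical specification limits.
LSL, USL = 49.85, 50.15
sigma_hat = rbar / d2                       # within-subgroup sigma estimate
cp = (USL - LSL) / (6 * sigma_hat)
cpk = min(USL - xbarbar, xbarbar - LSL) / (3 * sigma_hat)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```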
7

Postler, Štěpán. "Statistická analýza ve webovém prostředí." Master's thesis, Vysoká škola ekonomická v Praze, 2013. http://www.nusl.cz/ntk/nusl-199226.

Abstract:
The aim of this thesis is to create a web application that allows dataset import and data analysis using statistical methods. The application uses user accounts, which allow multiple people to work with a single dataset as well as interact with each other. Data are stored on a remote server and the application is accessible from any computer connected to the Internet. The application is written in the PHP programming language with the MySQL database system, and the user interface is built in HTML with CSS styles. All parts of the application are stored on an attached CD as text files. In addition to the web application, part of the thesis is also a written output, which contains a theoretical part describing the chosen statistical analysis methods, and a practical part listing the application's functions, describing the data model and demonstrating the data analysis options on specific examples.
8

Corrado, Charles J. "Nonparametric statistical methods in financial market research." Diss., The University of Arizona, 1988. http://hdl.handle.net/10150/184608.

Abstract:
This dissertation explores the use of nonparametric statistical methods based on ranks in financial market research. Applications to event study methodology and the estimation of security systematic risk are analyzed using a simulation methodology with actual daily security return data. The results indicate that procedures based on ranks are more efficient than the normal theory procedures currently in common use.
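A hedged sketch of a rank-based event-study statistic of the kind examined here (often called the Corrado rank test). The returns are simulated and the standardisation follows the usual textbook form rather than any specific variant from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_days = 50, 120             # last column is the event day
returns = rng.normal(0.0, 0.02, size=(n_firms, n_days))
returns[:, -1] += 0.01                # inject a small abnormal return on the event day

# Rank each firm's returns over the combined estimation + event window.
ranks = returns.argsort(axis=1).argsort(axis=1) + 1.0    # ranks 1..n_days
expected = (n_days + 1) / 2.0

# Cross-sectional mean excess rank per day, and its time-series scale.
mean_excess = (ranks - expected).mean(axis=0)
s_k = np.sqrt((mean_excess ** 2).mean())

# Rank statistic on the event day; approximately N(0, 1) under the null.
print("rank test statistic:", round(mean_excess[-1] / s_k, 2))
```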
9

Sezgin, Ozge. "Statistical Methods In Credit Rating." Master's thesis, METU, 2006. http://etd.lib.metu.edu.tr/upload/12607625/index.pdf.

Abstract:
Credit risk is one of the major risks banks and financial institutions are faced with. With the New Basel Capital Accord, banks and financial institutions have the opportunity to improve their risk management process by using the Internal Ratings Based (IRB) approach. In this thesis, we focus on the internal credit rating process. First, a short overview of credit scoring techniques and validation techniques is given. Using a real data set on manufacturing firms obtained from a Turkish bank, default prediction models were built with logistic regression, probit regression, discriminant analysis, and classification and regression trees. To improve the performance of the models, the optimum sample for logistic regression was selected from the data set and used as the model construction sample. Information is also given on how to convert continuous variables to ordered scaled variables to avoid the difference-in-scale problem. After the models were built, their performance on the whole data set, both in sample and out of sample, was evaluated with validation techniques suggested by the Basel Committee. In most cases the classification and regression trees model dominates the other techniques. After the credit scoring models were constructed and evaluated, the cut-off values used to map the probability of default obtained from logistic regression to rating classes were determined by dual-objective optimization. The cut-off values that gave the maximum area under the ROC curve and the minimum mean square error of the regression tree were taken as the optimum thresholds after 1000 simulations.
Keywords: Credit Rating, Classification and Regression Trees, ROC Curve, Pietra Index
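A brief sketch of the default-prediction step with an ROC-based cut-off, on simulated borrower data rather than the Turkish bank data set; scikit-learn is assumed to be available, and Youden's J is used here as a simple stand-in for the dual-objective cut-off search described in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 4))            # e.g. leverage, liquidity, profitability, size
logit = -2.0 + 1.2 * X[:, 0] - 0.8 * X[:, 1] + 0.5 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))      # 1 = default

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pd_hat = model.predict_proba(X_te)[:, 1]               # estimated probability of default

print("out-of-sample AUC:", round(roc_auc_score(y_te, pd_hat), 3))

# One simple cut-off rule: maximise TPR - FPR (Youden's J) along the ROC curve.
fpr, tpr, thr = roc_curve(y_te, pd_hat)
print("suggested PD cut-off:", round(thr[np.argmax(tpr - fpr)], 3))
```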
10

Jung, Min Kyung. "Statistical methods for biological applications." [Bloomington, Ind.] : Indiana University, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3278454.

Abstract:
Thesis (Ph.D.)--Indiana University, Dept. of Mathematics, 2007.
Source: Dissertation Abstracts International, Volume: 68-10, Section: B, page: 6740. Adviser: Elizabeth A. Housworth. Title from dissertation home page (viewed May 20, 2008).
11

Walls, Frederick George 1976. "Topic detection through statistical methods." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80244.

Abstract:
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.
Includes bibliographical references (p. 77-79).
by Frederick George Walls.
M.Eng.
12

Jones, Hywel Bowden. "Statistical methods for genome mapping." Thesis, University of Cambridge, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627287.

13

De, Angelis Daniela. "Statistical methods in AIDS epidemiology." Thesis, University of Cambridge, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.614931.

14

Marshall, Emma Clare. "Statistical methods for institutional comparisons." Thesis, University of Cambridge, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.624324.

15

ZHANG, GE. "STATISTICAL METHODS IN GENETIC ASSOCIATION." University of Cincinnati / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1196099744.

16

Spring, Penny N. "Statistical methods in database marketing." Capelle a/d IJssel : Labyrint Publication, 2001. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=009880745&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

17

Kumphakarm, Ratchaneewan. "Statistical methods for biodiversity assessment." Thesis, University of Kent, 2016. https://kar.kent.ac.uk/60557/.

Abstract:
This thesis focuses on statistical methods for estimating the number of species, a natural index for measuring biodiversity. Both parametric and nonparametric approaches are investigated for this problem. Species abundance models, including homogeneous and heterogeneous models, are explored for species richness estimation. Two new improvements to the Chao estimator are developed using the Good-Turing coverage formula. Although the homogeneous abundance model is the simplest model, in practice species are collected with different probabilities. This leads to overdispersed data, zero inflation and a heavy tail. The Poisson-Tweedie distribution, a mixed-Poisson distribution that includes many special cases such as the negative-binomial, Poisson, Poisson inverse Gaussian and Pólya-Aeppli distributions, is explored for estimating the number of species. The weighted linear regression estimator based on the ratio of successive frequencies is applied to data generated from the Poisson-Tweedie distribution. Sparse data may give zero frequencies for species seen i times, which prevents the weighted linear regression from working, so a smoothing technique is considered for improving the performance of the weighted linear regression estimator. Both simulated data and some real data sets are used to study the performance of parametric and nonparametric estimators in this thesis. Finally, the distribution of the number of distinct species found in a sample is hard to compute. Many approximations, including the Poisson, normal, COM-Poisson binomial, Altham's multiplicative and additive binomial, and Pólya distributions, are used for approximating the distribution of distinct species. Under various abundance models, Altham's multiplicative-binomial approximation performs well. Building on other recent work, the maximum likelihood and the maximum pseudo-likelihood estimators are applied with Altham's multiplicative-binomial approximation and compared with other estimators.
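The nonparametric starting point of this problem can be illustrated with a short sketch of the classical Chao1 estimator and the Good-Turing coverage estimate computed from species abundance counts; the two improved Chao estimators developed in the thesis are not reproduced here, and the counts are invented.

```python
import numpy as np

def chao1(abundances):
    """Classical Chao1 lower bound for species richness from abundance counts."""
    abundances = np.asarray(abundances)
    s_obs = np.sum(abundances > 0)      # observed species
    f1 = np.sum(abundances == 1)        # singletons
    f2 = np.sum(abundances == 2)        # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0  # bias-corrected form when f2 = 0

def good_turing_coverage(abundances):
    """Good-Turing estimate of sample coverage: 1 - f1 / n."""
    abundances = np.asarray(abundances)
    return 1.0 - np.sum(abundances == 1) / abundances.sum()

counts = np.array([12, 7, 5, 3, 2, 2, 1, 1, 1, 1])   # toy abundance data
print("observed species:", int(np.sum(counts > 0)))
print("Chao1 estimate:", round(float(chao1(counts)), 1))
print("sample coverage:", round(float(good_turing_coverage(counts)), 3))
```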
18

Susto, Gian Antonio. "Statistical Methods for Semiconductor Manufacturing." Doctoral thesis, Università degli studi di Padova, 2013. http://hdl.handle.net/11577/3422625.

Abstract:
In this thesis, techniques for non-parametric modeling, machine learning, filtering and prediction, and run-to-run control for semiconductor manufacturing are described. In particular, algorithms have been developed for two major application areas:
- Virtual Metrology (VM) systems;
- Predictive Maintenance (PdM) systems.
Both technologies have proliferated in recent years in semiconductor fabrication plants (fabs) in order to increase productivity and decrease costs. VM systems aim at predicting quantities on the wafer, the main and basic product of the semiconductor industry, that may or may not be physically measurable. These quantities are usually 'costly' to measure in economic or temporal terms: the prediction is based on process variables and/or logistic information on the production that, instead, are always available and that can be used for modeling without further costs. PdM systems, on the other hand, aim at predicting when a maintenance action has to be performed. This approach to maintenance management, based like VM on statistical methods and on the availability of process/logistic data, is in contrast with other classical approaches:
- Run-to-Failure (R2F), where no intervention is performed on the machine/process until a breakdown or specification violation occurs in production;
- Preventive Maintenance (PvM), where maintenance is scheduled in advance based on time intervals or on production iterations.
Neither of these approaches is optimal, because they do not ensure that breakdowns and wafer waste will not happen and, in the case of PvM, they may lead to unnecessary maintenance without fully exploiting the lifetime of the machine or of the process. The main goal of this thesis is to prove, through several applications and feasibility studies, that the use of statistical modeling algorithms and control systems can improve the efficiency, yield and profits of a manufacturing environment like the semiconductor one, where lots of data are recorded and can be employed to build mathematical models. We present several original contributions, both in the form of applications and methods. The introduction of this thesis gives an overview of the semiconductor fabrication process: the most common practices in Advanced Process Control (APC) systems and the major issues for engineers and statisticians working in this area are presented. Furthermore, we illustrate the methods and mathematical models used in the applications. We then discuss in detail the following applications:
- A VM system for the estimation of the thickness deposited on the wafer by the Chemical Vapor Deposition (CVD) process, exploiting Fault Detection and Classification (FDC) data. In this tool a new clustering algorithm based on Information Theory (IT) elements is proposed. In addition, the Least Angle Regression (LARS) algorithm has been applied for the first time to VM problems.
- A new VM module for a multi-step (CVD, Etching and Lithography) line, where Multi-Task Learning techniques have been employed.
- A new Machine Learning algorithm based on Kernel Methods for the estimation of scalar outputs from time series inputs.
- Run-to-Run control algorithms that exploit both physical measurements and statistical ones (coming from a VM system); this tool is based on IT elements.
- A PdM module based on filtering and prediction techniques (Kalman filter, Monte Carlo methods) for the prediction of maintenance interventions in the Epitaxy process.
- A PdM system based on Elastic Nets for maintenance predictions in the Ion Implantation tool.
Several of the aforementioned works have been developed in collaboration with major European semiconductor companies in the framework of the European project UE FP7 IMPROVE (Implementing Manufacturing science solutions to increase equiPment pROductiVity and fab pErformance); such collaborations will be specified during the thesis, underlining the practical aspects of implementing the proposed technologies in a real industrial environment.
19

Chinyamakobvu, Mutsa Carole. "Eliciting and combining expert opinion : an overview and comparison of methods." Thesis, Rhodes University, 2015. http://hdl.handle.net/10962/d1017827.

Abstract:
Decision makers have long relied on experts to inform their decision making. Expert judgment analysis is a way to elicit and combine the opinions of a group of experts to facilitate decision making. The use of expert judgment is most appropriate when there is a lack of data for obtaining reasonable statistical results. The experts are asked for advice by one or more decision makers who face a specific real decision problem. The decision makers are outside the group of experts and are jointly responsible and accountable for the decision and committed to finding solutions that everyone can live with. The emphasis is on the decision makers learning from the experts. The focus of this thesis is an overview and comparison of the various elicitation and combination methods available. These include the traditional committee method, the Delphi method, the paired comparisons method, the negative exponential model, Cooke’s classical model, the histogram technique, using the Dirichlet distribution in the case of a set of uncertain proportions which must sum to one, and the employment of overfitting. The supra Bayes approach, the determination of weights for the experts, and combining the opinions of experts where each opinion is associated with a confidence level that represents the expert’s conviction of his own judgment are also considered.
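One of the simplest combination rules covered by such comparisons, the weighted linear opinion pool, can be sketched in a few lines; the expert distributions and weights below are invented for illustration and do not come from any elicitation in the thesis.

```python
import numpy as np

# Three experts each give a probability distribution over the same four outcomes.
expert_pdfs = np.array([
    [0.10, 0.20, 0.50, 0.20],
    [0.05, 0.25, 0.45, 0.25],
    [0.20, 0.30, 0.30, 0.20],
])

# Decision-maker weights (e.g. from a calibration exercise); they must sum to one.
weights = np.array([0.5, 0.3, 0.2])

# Linear opinion pool: the combined distribution is the weighted average.
combined = weights @ expert_pdfs
assert np.isclose(combined.sum(), 1.0)
print("combined distribution:", np.round(combined, 3))
```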
20

Ruedin, Laurent. "Statistical mechanical methods and continued fractions /." [S.l.] : [s.n.], 1994. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=10796.

21

Marco, Almagro Lluís. "Statistical methods in Kansei engineering studies." Doctoral thesis, Universitat Politècnica de Catalunya, 2011. http://hdl.handle.net/10803/85059.

Abstract:
This PhD thesis deals with Kansei Engineering (KE), a technique for translating emotions elicited by products into technical parameters, and statistical methods that can benefit the discipline. The basic purpose of KE is discovering in which way some properties of a product convey certain emotions in its users. It is a quantitative method, and data are typically collected using questionnaires. Conclusions are reached when analyzing the collected data, normally using some kind of regression analysis. Kansei Engineering can be placed under the more general area of research of emotional design. The thesis starts justifying the importance of emotional design. As the range of techniques used under the name of Kansei Engineering is rather vast and not very clear, the thesis develops a detailed definition of KE that serves the purpose of delimiting its scope. A model for conducting KE studies is then suggested. The model includes spanning the semantic space – the whole range of emotions the product can elicit – and the space of properties – the technical variables that can be modified in the design phase. After the data collection, the synthesis phase links both spaces; that is, discovers how several properties of the product elicit certain emotions. Each step of the model is explained in detail using a KE study specially performed for this thesis: the fruit juice experiment. The initial model is progressively improved during the thesis and data from the experiment are reanalyzed using the new proposals. Many practical concerns arise when looking at the above mentioned model for KE studies (among many others, how many participants are used and how the data collection session is conducted). An extensive literature review is done with the aim of answering these and other questions. The most common applications of KE are also depicted, together with comments on particular interesting ideas from several papers. The literature review also serves to list which are the most common tools used in the synthesis phase. The central part of the thesis focuses precisely in tools for the synthesis phase. Statistical tools such as quantification theory type I and ordinal logistic regression are studied in detail, and several improvements are suggested. In particular, a new graphical way to represent results from an ordinal logistic regression is proposed. An automatic learning technique, rough sets, is introduced and a discussion is included on its adequacy for KE studies. Several sets of simulated data are used to assess the behavior of the suggested statistical techniques, leading to some useful recommendations. No matter the analysis tools used in the synthesis phase, conclusions are likely to be flawed when the design matrix is not appropriate. A method to evaluate the suitability of design matrices used in KE studies is proposed, based on the use of two new indicators: an orthogonality index and a confusion index. The commonly forgotten role of interactions in KE studies is studied and a method to include an interaction in KE studies is suggested, together with a way to represent it graphically. Finally, the untreated topic of variability in KE studies is tackled in the last part of the thesis. A method (based in cluster analysis) for finding segments among subjects according to their emotional responses and a way to rank subjects based on their coherence when rating products (using an intraclass correlation coefficient) are proposed. 
As many users of Kansei Engineering are not specialists in the interpretation of the numerical output from statistical techniques, visual representations for these two new proposals are included to aid understanding.
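Quantification theory type I, one of the synthesis tools examined in the thesis, amounts to a linear regression on dummy-coded product properties. The sketch below fits such a model to invented Kansei ratings; the property names, levels and scores are made up, and the statsmodels formula interface is assumed to be available.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: each juice package has two design properties, and participants
# rated how "refreshing" it felt on a 1-9 scale.
df = pd.DataFrame({
    "colour": ["green", "green", "orange", "orange", "red", "red", "green", "red"],
    "shape":  ["round", "square", "round", "square", "round", "square", "square", "round"],
    "rating": [7, 6, 5, 4, 6, 3, 6, 5],
})

# Quantification theory type I: OLS on dummy-coded categorical properties.
model = smf.ols("rating ~ C(colour) + C(shape)", data=df).fit()
print(model.params)      # category scores relative to the baseline levels
print(model.rsquared)    # share of the emotional response explained by the properties
```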
22

Lau, Ho-yin Eric. "Statistical methods for analyzing epidemiological data." Click to view the E-thesis via HKUTO, 2005. http://sunzi.lib.hku.hk/hkuto/record/B34829969.

23

Aston, John Alexander David. "Statistical methods for functional neuroimaging data." Thesis, Imperial College London, 2002. http://hdl.handle.net/10044/1/7185.

24

Liu, Yang. "Statistical methods for big tracking data." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/60916.

Abstract:
Recent advances in technology have led to large sets of tracking data, which brings new challenges in statistical modeling and prediction. Building on recent developments in Gaussian process modeling for spatio-temporal data and stochastic differential equations (SDEs), we develop a sequence of new models and corresponding inferential methods to meet these challenges. We first propose Bayesian Melding (BM) and downscaling frameworks to combine observations from different sources. To use BM for big tracking data, we exploit the properties of the processes along with approximations to the likelihood to break a high dimensional problem into a series of lower dimensional problems. To implement the downscaling approach, we apply the integrated nested Laplace approximation (INLA) to fit a linear mixed effect model that connects the two sources of observations. We apply these two approaches in a case study involving the tracking of marine mammals. Both of our frameworks have superior predictive performance compared with traditional approaches in both cross-validation and simulation studies. We further develop the BM frameworks with stochastic processes that can reflect the time varying features of the tracks. We first develop a conditional heterogeneous Gaussian Process (CHGP), but certain properties of this process make it extremely difficult to perform model selection. We also propose a linear SDE with splines as its coefficients, which we refer to as a generalized Ornstein-Uhlenbeck (GOU) process. The GOU achieves flexible modeling of the tracks in both mean and covariance with a reasonably parsimonious parameterization. Inference and prediction for this process can be computed via the Kalman filter and smoother. BM with the GOU achieves a smaller prediction error and better credibility intervals in cross-validation comparisons with the basic BM and downscaling models. Following the success with the GOU, we further study a special class of SDEs called the potential field (PF) models, which formulate the drift term as the gradient of another function. We apply the PF approach to modeling of tracks of marine mammals as well as basketball players, and demonstrate its potential in learning, visualizing, and interpreting the trends in the paths.
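A one-dimensional sketch of the kind of computation involved: the exact discretisation of an Ornstein-Uhlenbeck SDE yields a linear Gaussian state-space model, so filtering and prediction reduce to Kalman recursions. The parameters, observation noise and track below are simulated, and the spline-varying coefficients of the GOU process are not reproduced.

```python
import numpy as np

# OU process dX = theta * (mu - X) dt + sigma dW, observed with Gaussian noise.
theta, mu, sigma, obs_sd, dt = 0.8, 0.0, 1.0, 0.3, 0.1
rng = np.random.default_rng(4)

a = np.exp(-theta * dt)                      # exact AR(1) coefficient
q = sigma ** 2 / (2 * theta) * (1 - a ** 2)  # exact transition noise variance

# Simulate a latent track and noisy observations.
n = 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = mu + a * (x[t - 1] - mu) + np.sqrt(q) * rng.normal()
y = x + obs_sd * rng.normal(size=n)

# Kalman filter with the OU transition.
m, p = 0.0, 1.0                              # prior mean and variance
filtered = np.empty(n)
for t in range(n):
    m = mu + a * (m - mu)                    # predict
    p = a ** 2 * p + q
    k = p / (p + obs_sd ** 2)                # update with y[t]
    m = m + k * (y[t] - m)
    p = (1 - k) * p
    filtered[t] = m

print("RMSE raw observations:", round(float(np.sqrt(np.mean((y - x) ** 2))), 3))
print("RMSE filtered state:  ", round(float(np.sqrt(np.mean((filtered - x) ** 2))), 3))
```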
25

Lo, Chi Ho. "Statistical methods for high throughput genomics." Thesis, University of British Columbia, 2009. http://hdl.handle.net/2429/13762.

Abstract:
The advancement of biotechnologies has led to indispensable high-throughput techniques for biological and medical research. Microarray is applied to monitor the expression levels of thousands of genes simultaneously, while flow cytometry (FCM) offers rapid quantification of multi-parametric properties for millions of cells. In this thesis, we develop approaches based on mixture modeling to deal with the statistical issues arising from both high-throughput biological data sources. Inference about differential expression is a typical objective in analysis of gene expression data. The use of Bayesian hierarchical gamma-gamma and lognormal-normal models is popular for this type of problem. Some unrealistic assumptions, however, have been made in these frameworks. In view of this, we propose flexible forms of mixture models based on an empirical Bayes approach to extend both frameworks so as to release the unrealistic assumptions, and develop EM-type algorithms for parameter estimation. The extended frameworks have been shown to significantly reduce the false positive rate whilst maintaining a high sensitivity, and are more robust to model misspecification. FCM analysis currently relies on the sequential application of a series of manually defined 1D or 2D data filters to identify cell populations of interest. This process is time-consuming and ignores the high-dimensionality of FCM data. We reframe this as a clustering problem, and propose a robust model-based clustering approach based on t mixture models with the Box-Cox transformation for identifying cell populations. We describe an EM algorithm to simultaneously handle parameter estimation along with transformation selection and outlier identification, issues of mutual influence. Empirical studies have shown that this approach is well adapted to FCM data, in which a high abundance of outliers and asymmetric cell populations are frequently observed. Finally, in recognition of concern for an efficient automated FCM analysis platform, we have developed an R package called flowClust to automate the gating analysis with the proposed methodology. Focus during package development has been put on the computational efficiency and convenience of use at users' end. The package offers a wealth of tools to summarize and visualize features of the clustering results, and is well integrated with other FCM packages.
26

Shimakura, Silvia Emiko. "Statistical methods for spatial survival data." Thesis, Lancaster University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.418824.

27

Huang, Liping. "STATISTICAL METHODS IN MICROARRAY DATA ANALYSIS." UKnowledge, 2009. http://uknowledge.uky.edu/gradschool_diss/795.

Abstract:
This dissertation includes three topics. First topic: Regularized estimation in the AFT model with high dimensional covariates. Second topic: A novel application of quantile regression for identification of biomarkers exemplified by equine cartilage microarray data. Third topic: Normalization and analysis of cDNA microarray using linear contrasts.
28

Lau, Ho-yin Eric, and 劉浩然. "Statistical methods for analyzing epidemiological data." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B34829969.

29

Valeri, Linda. "Statistical Methods for Causal Mediation Analysis." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10690.

Abstract:
Mediation analysis is a popular approach in the social and biomedical sciences to examine the extent to which the effect of an exposure on an outcome operates through an intermediate variable (mediator) and the extent to which the effect is direct. We first develop statistical methods and software for the estimation of direct and indirect causal effects in generalized linear models when exposure-mediator interaction may be present. We then study the bias of direct and indirect effect estimators that arises in this context when a continuous mediator is measured with error or a binary mediator is misclassified. We develop methods of correction for measurement error and misclassification, coupled with sensitivity analyses for which no auxiliary information on the mediator measured with error is needed. The proposed methods are applied to a lung cancer study to evaluate the effect of genetic variants mediated through smoking on lung cancer risk, and to a perinatal epidemiological study on the determinants of preterm birth.
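The regression-based estimators this work builds on have closed forms when a continuous mediator and a continuous outcome are modelled linearly with an exposure-mediator interaction. The sketch below follows that standard parameterisation on simulated data; it is not the lung cancer or preterm birth analysis, and statsmodels is assumed to be available.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 5000
a = rng.binomial(1, 0.5, n)                                        # exposure
m = 0.5 + 0.8 * a + rng.normal(0, 1, n)                            # mediator model
y = 1.0 + 0.4 * a + 0.6 * m + 0.3 * a * m + rng.normal(0, 1, n)    # outcome model
df = pd.DataFrame({"a": a, "m": m, "y": y})

med = smf.ols("m ~ a", data=df).fit()      # beta0 + beta1 * a
out = smf.ols("y ~ a * m", data=df).fit()  # theta0 + theta1*a + theta2*m + theta3*a*m

b0, b1 = med.params["Intercept"], med.params["a"]
t1, t2, t3 = out.params["a"], out.params["m"], out.params["a:m"]

a1, a0 = 1, 0   # compare exposed with unexposed
nde = (t1 + t3 * (b0 + b1 * a0)) * (a1 - a0)       # natural direct effect
nie = (t2 * b1 + t3 * b1 * a1) * (a1 - a0)         # natural indirect effect
print(f"NDE = {nde:.3f}, NIE = {nie:.3f}, total = {nde + nie:.3f}")
```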
30

Miller, Elizabeth Caitlin. "Tracking Atlantic Hurricanes Using Statistical Methods." Scholar Commons, 2013. http://scholarcommons.usf.edu/etd/4730.

Abstract:
Creating an accurate hurricane location forecasting model is of the utmost importance because of the safety measures that need to occur in the days and hours leading up to a storm's landfall. Hurricanes can be incredibly deadly and costly, but if people are given adequate warning, many lives can be spared. This thesis seeks to develop an accurate model for predicting storm location based on previous location, previous wind speed, and previous pressure. The models are developed using hurricane data from 1980-2009.
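A minimal sketch of the kind of model the abstract describes: a linear regression predicting the next position from the previous position, wind speed and pressure. The tracks below are simulated six-hourly paths with invented drift coefficients, not the 1980-2009 hurricane records.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)

def simulate_track(n_steps=20):
    # Position drifts north-west, modulated slightly by wind speed and pressure.
    lat, lon = 15.0 + rng.normal(0, 2), -45.0 + rng.normal(0, 2)
    rows = []
    for _ in range(n_steps):
        wind = rng.uniform(40, 130)           # knots
        pressure = rng.uniform(940, 1005)     # hPa
        new_lat = lat + 0.5 + 0.002 * wind + rng.normal(0, 0.1)
        new_lon = lon - 0.6 - 0.001 * (1005 - pressure) + rng.normal(0, 0.1)
        rows.append([lat, lon, wind, pressure, new_lat, new_lon])
        lat, lon = new_lat, new_lon
    return np.array(rows)

tracks = np.vstack([simulate_track() for _ in range(100)])
X, y = tracks[:, :4], tracks[:, 4:]           # previous state -> next (lat, lon)

model = LinearRegression().fit(X, y)
next_pos = model.predict([[25.3, -71.2, 95.0, 958.0]])
print("predicted next position (lat, lon):", np.round(next_pos, 2))
```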
31

Er, Fikret. "Robust methods in statistical shape analysis." Thesis, University of Leeds, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.342394.

32

Baharith, Lamya Abdulbasit. "Statistical methods for cytotoxic assays data." Thesis, Edinburgh Napier University, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.429827.

33

Golya, David Andrew. "Statistical methods for maxima and means." Thesis, University of Sheffield, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.389758.

34

Lunt, Mark. "Statistical methods of detecting vertebral fractures." Thesis, University of Liverpool, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275052.

35

Alshahrani, Mohammed Nasser D. "Statistical methods for rare variant association." Thesis, University of Leeds, 2018. http://etheses.whiterose.ac.uk/22436/.

Abstract:
Deoxyribonucleic acid (DNA) sequencing allows researchers to conduct more complete assessments of low-frequency and rare genetic variants. In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating associations between complex traits and rare variants (RVs). In contrast to association studies of common variants (CVs), due to the low frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent development of several new tests that analyze RVs, most of which are based on the idea of pooling/collapsing RVs. Genome-wide association studies (GWAS) based on common SNPs have gained attention in the last few years and have been regularly used to examine the complex genetic composition of diseases and quantitative traits. GWASs have not, however, discovered everything associated with diseases and genetic variation, and recent empirical evidence has demonstrated that low-frequency and rare variants are, in fact, connected to complex diseases. This thesis focuses on the study of rare variant association. Aggregation tests, where multiple rare variants are analyzed jointly, have incorporated weighting schemes on variants; however, their power depends strongly on the weighting scheme. I address three topics in this thesis: the definition of rare variants and their variant call format (VCF) files; a description of the methods that have been used in rare variant analysis; and the challenges involved in the analysis of rare variants, for which I propose different weighting schemes. Since the efficiency of rare variant studies might be considerably improved by the application of an appropriate weighting scheme, choosing the proper weighting scheme is the topic of the thesis. In the following chapters, I propose different weighting schemes, where weights are applied at the level of the variant, the individual or the cell (i.e. the individual genotype call), as well as a weighting scheme that can incorporate quality measures for variants (i.e. a quality score for variant calls) and cells (i.e. genotype quality).
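One concrete variant-level weighting scheme of the kind compared in this literature is the Madsen-Browning style weight w_j = 1 / sqrt(n p_j (1 - p_j)), used here inside a simple weighted burden test. The genotypes and phenotypes are simulated, and the thesis's own individual-, cell- and quality-based weighting schemes are not reproduced.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, n_variants = 1000, 20

# Simulated rare-variant genotypes (0/1/2 copies) with small minor allele frequencies.
mafs = rng.uniform(0.001, 0.02, n_variants)
geno = rng.binomial(2, mafs, size=(n, n_variants))

# A few causal variants raise disease risk.
risk = -2.5 + 1.5 * geno[:, :5].sum(axis=1)
pheno = rng.binomial(1, 1.0 / (1.0 + np.exp(-risk)))

# Madsen-Browning style weights: the rarest variants get the largest weights.
p_hat = (geno.sum(axis=0) + 1) / (2 * n + 2)          # smoothed allele frequency
weights = 1.0 / np.sqrt(n * p_hat * (1 - p_hat))

# Collapse to one burden score per individual and test it by logistic regression.
burden = geno @ weights
fit = sm.Logit(pheno, sm.add_constant(burden)).fit(disp=0)
print("burden coefficient p-value:", fit.pvalues[1])
```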
36

Fasiolo, Matteo. "Statistical methods for complex population dynamics." Thesis, University of Bath, 2016. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687376.

37

Maas, Luis C. (Luis Carlos). "Statistical methods in ultrasonic tissue characterization." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/36456.

Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (p. 88-93).
by Luis Carlos Maas III.
M.S.
38

Tucker, George Jay. "Statistical methods to infer biological interactions." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/89874.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Mathematics, 2014.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 153-170).
Biological systems are extremely complex, and our ability to experimentally measure interactions in these systems is limited by inherent noise. Technological advances have allowed us to collect unprecedented amounts of raw data, increasing the need for computational methods to disentangle true interactions from noise. In this thesis, we focus on statistical methods to infer two classes of important biological interactions: protein-protein interactions and the link between genotypes and phenotypes. In the first part of the thesis, we introduce methods to infer protein-protein interactions from affinity purification mass spectrometry (AP-MS) and from luminescence-based mammalian interactome mapping (LUMIER). Our work reveals novel context dependent interactions in the MAPK signaling pathway and insights into the protein homeostasis machinery. In the second part, we focus on methods to understand the link between genotypes and phenotypes. First, we characterize the effects of related individuals on standard association statistics for genome-wide association studies (GWAS) and introduce a new statistic that corrects for relatedness. Then, we introduce a statistically powerful association testing framework that corrects for confounding from population structure in large scale GWAS. Lastly, we investigate regularized regression for phenotype prediction from genetic data.
by George Jay Tucker.
Ph. D.
39

Molaro, Mark Christopher. "Computational statistical methods in chemical engineering." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/111286.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Chemical Engineering, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 175-182).
Recent advances in theory and practice have introduced a wide variety of tools from machine learning that can be applied to data-intensive chemical engineering problems. This thesis covers applications of statistical learning spanning a range of relative importance of data versus existing detailed theory. In each application, the quantity and quality of data available from experimental systems are used in conjunction with an understanding of the theoretical physical laws governing system behavior, to the extent they are available. A detailed generative parametric model for optical spectra of multicomponent mixtures is introduced. The application of interest is the quantification of uncertainty associated with estimating the relative abundance of mixtures of carbon nanotubes in solution. This work describes a detailed analysis of sources of uncertainty in estimating the relative abundance of chemical species in solution from optical spectroscopy. In particular, the quantification of uncertainty in mixtures with parametric uncertainty in the pure component spectra is addressed. Markov Chain Monte Carlo methods are utilized to quantify uncertainty in these situations, and the inaccuracy and potential for error of simpler methods is demonstrated. Strategies to improve estimation accuracy and reduce uncertainty in practical experimental situations are developed, including when multiple measurements are available and with sequential data. The utilization of computational Bayesian inference in chemometric problems shows great promise in a wide variety of practical experimental applications. A related deconvolution problem is addressed in which a detailed physical model is not available, but the objective of analysis is to map from a measured vector-valued signal to a sum of an unknown number of discrete contributions. The data analyzed in this application are electrical signals generated from a free-surface electrospinning apparatus. In this information-poor system, MAP estimation is used to reduce the variance in estimates of the physical parameters of interest. The formulation of the estimation problem in a probabilistic context allows for the introduction of prior knowledge to compensate for a high-dimensional, ill-conditioned inverse problem. The estimates from this work are used to develop a productivity model, expanding on previous work and showing how the uncertainty from estimation impacts system understanding. A new machine learning based method for monitoring for anomalous behavior in production oil wells is reported. The method entails a transformation of the available time series of measurements into a high-dimensional feature space representation. This transformation yields results which can be treated as static independent measurements. A new method for feature selection in one-class classification problems is developed based on approximate knowledge of the state of the system. An extension of feature space transformation methods on time series data is introduced to handle multivariate data in large, computationally burdensome domains by using sparse feature extraction methods. As a whole, these projects demonstrate the application of modern statistical modeling methods to achieve superior results in data-driven chemical engineering challenges.
by Mark Christopher Molaro.
Ph. D.
40

Thomson, Blaise Roger Marie. "Statistical methods for spoken dialogue management." Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.609054.

41

Allchin, Lorraine Doreen May. "Statistical methods for mapping complex traits." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:65f392ba-1b64-4b00-8871-7cee98809ce1.

Abstract:
The first section of this thesis addresses the problem of simultaneously identifying multiple loci that are associated with a trait, using a Bayesian Markov Chain Monte Carlo method. It is applicable to both case/control and quantitative data. I present simulations comparing the method to standard frequentist methods in human case/control and mouse QTL datasets, and show that in the case/control simulations the standard frequentist method outperforms my model for all but the highest-effect simulations, and that for the mouse QTL simulations my method performs as well as the frequentist method in some cases and worse in others. I also present analysis of real data and simulations applying my method to a simulated epistasis data set. The next section was inspired by the challenges involved in applying a Markov Chain Monte Carlo method to genetic data. It is an investigation into the performance and benefits of the Matlab parallel computing toolbox, specifically its implementation of the CUDA programming language within Matlab's higher-level language. CUDA is a language which allows computational calculations to be carried out on the computer's graphics processing unit (GPU) rather than its central processing unit (CPU). The appeal of this toolbox is its ease of use, as few code adaptations are needed. The final project of this thesis was to develop an HMM for reconstructing the founders of sparsely sequenced inbred populations. The motivation here is that, whilst sequencing costs are rapidly decreasing, it is still prohibitively expensive to fully sequence a large number of individuals. It was proposed that, for populations descended from a known number of founders, it would be possible to sequence these individuals with very low coverage, use a hidden Markov model (HMM) to represent the chromosomes as mosaics of the founders, and then use these states to impute the missing data. For this I developed a Viterbi algorithm with a transition probability matrix based on recombination rate which changes for each observed state.
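The founder-reconstruction step lends itself to a compact illustration: a log-space Viterbi decoder over founder states, with a constant switch probability standing in for the recombination-based transition matrix. The per-site error-rate emission model and the tiny two-founder example are invented for the sketch and are much simpler than the read-level model developed in the thesis.

```python
import numpy as np

def viterbi(obs, founders, switch_prob=0.01, error_rate=0.05):
    """Most likely founder mosaic for one sparsely observed chromosome.

    obs      : observed allele per site (0/1, with -1 meaning no data at that site)
    founders : (n_founders, n_sites) array of known founder alleles
    """
    n_f, n_sites = founders.shape
    log_stay = np.log(1 - switch_prob)
    log_switch = np.log(switch_prob / (n_f - 1))

    def log_emit(site):
        if obs[site] < 0:                       # missing site: uninformative
            return np.zeros(n_f)
        match = founders[:, site] == obs[site]
        return np.where(match, np.log(1 - error_rate), np.log(error_rate))

    delta = np.log(np.full(n_f, 1.0 / n_f)) + log_emit(0)
    back = np.zeros((n_sites, n_f), dtype=int)

    for t in range(1, n_sites):
        trans = np.full((n_f, n_f), log_switch)
        np.fill_diagonal(trans, log_stay)
        scores = delta[:, None] + trans         # scores[i, j]: previous state i -> state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit(t)

    path = np.empty(n_sites, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_sites - 2, -1, -1):        # backtrace
        path[t] = back[t + 1, path[t + 1]]
    return path

founders = np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                     [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
obs = np.array([0, -1, 0, -1, 1, -1, 0, -1, -1, 0])
print("decoded founder per site:", viterbi(obs, founders))
```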
42

Kacprzak, T. "Statistical methods in weak gravitational lensing." Thesis, University College London (University of London), 2015. http://discovery.ucl.ac.uk/1462150/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This thesis studies several topics in the area of weak gravitational lensing and addresses some key statistical problems within this subject. A large part of the thesis concerns the measurement of galaxy shapes for weak gravitational lensing and the systematics they introduce. I focused on studying two key effects, typical for model-fitting shape measurement methods. The first is noise bias, which arises due to pixel noise on astronomical images. I measure noise bias as a function of key galaxy and image parameters and find that the results are in good agreement with theoretical predictions. I find that if the statistical power of a survey is to be fully utilised, noise bias effects have to be calibrated. The second effect is called model bias, which stems from using simple models to fit galaxy images, which can have more complicated morphologies. I also investigate the interaction of these two systematics. I found model bias to be small for ground-based surveys, rarely exceeding 1%. Its interaction with noise bias was found to be negligible. These results suggest that for ongoing weak lensing surveys, noise bias is the dominant effect. Chapter 5 describes my search for a weak lensing signal from dark matter filaments in CFHTLenS fields. It presents a novel, model-fitting approach to modelling the mass distribution and combining measurements from multiple filaments. We find that CFHTLenS data does provide very good evidence for dark matter filaments, with a detection significance of 3.9σ for the filament density parameter relative to the mean halo density of connected halos at their R200. For 19 pairs of the most massive halos, the integrated density contrast of filaments was found at a level of 1 × 10^13 M⊙/h. The appendices present my contribution to three other papers. They describe practical applications of the calibration of noise bias in the GREAT08 challenge and the Dark Energy Survey. I also present the results of the validation of reconvolution and image rendering using FFTs in the GalSim toolkit.
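Noise-bias calibration of the kind discussed above is commonly summarised by a multiplicative bias m and an additive bias c. The sketch below fits these from simulated shear measurements only; a real pipeline would obtain the observed shears by fitting galaxy models to noisy images, which is not reproduced here, so the bias values and scatter are purely hypothetical.

import numpy as np

rng = np.random.default_rng(1)
g_true = np.linspace(-0.05, 0.05, 11)                  # input shears
m_true, c_true, scatter = 0.02, 1e-3, 0.01             # hypothetical biases and per-object scatter

# Mean measured shear over many noisy realisations at each input shear.
g_obs = np.array([(1 + m_true) * g + c_true + rng.normal(0, scatter, 10000).mean()
                  for g in g_true])

slope, intercept = np.polyfit(g_true, g_obs, 1)
print("m =", slope - 1, " c =", intercept)             # calibration numbers in the m/c convention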
43

Thorpe, Matthew. "Variational methods for geometric statistical inference." Thesis, University of Warwick, 2015. http://wrap.warwick.ac.uk/74241/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Estimating multiple geometric shapes such as tracks or surfaces creates significant mathematical challenges, particularly in the presence of unknown data association. Problems of this type have two major challenges. The first is that the object of interest is typically infinite dimensional whilst the data are finite dimensional; as a result the inverse problem is ill-posed without regularization. The second is that the data association makes the likelihood function highly oscillatory. The focus of this thesis is on techniques to validate approaches to estimation problems in geometric statistical inference. We use convergence in the large data limit as an indicator of the robustness of the methodology. One particular advantage of our approach is that we can prove convergence under modest conditions on the data generating process. This allows one to apply the theory where very little is known about the data, indicating robustness in applications to real world problems. The results of this thesis therefore concern the asymptotics for a selection of statistical inference problems. We construct our estimates as the minimizer of an appropriate functional and look at what happens in the large data limit. In each case we show our estimates converge to a minimizer of a limiting functional, and in certain cases we also provide rates of convergence. The emphasis is on problems which contain a data association or classification component. More precisely, we study a generalized version of the k-means method which is suitable for estimating multiple trajectories from unlabelled data and which combines data association with spline smoothing. Another problem considered is a graphical approach to estimating the labelling of data points; our approach uses minimizers of the Ginzburg-Landau functional on a suitably defined graph. In order to study these problems we use variational techniques, in particular Γ-convergence, which is the natural framework for studying sequences of minimization problems. A key advantage of this approach is that it allows us to deal with infinite dimensional and highly oscillatory functionals.
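A minimal version of the generalised k-means idea for multiple trajectories can be written as an alternation between data association and spline smoothing. The sketch below uses two synthetic tracks and scipy's UnivariateSpline; the fixed number of trajectories, the median-based initialisation, and the smoothing level are assumptions made only for illustration.

import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0, 1, 400))
truth = np.where(rng.uniform(size=400) < 0.5, np.sin(2 * np.pi * t), 1 - t)  # two unlabelled tracks
y = truth + rng.normal(0, 0.05, 400)

labels = (y > np.median(y)).astype(int)     # crude initialisation to break symmetry
for _ in range(10):
    splines = []
    for k in range(2):
        mask = labels == k
        # smoothing spline per trajectory; s chosen roughly as n * noise_variance
        splines.append(UnivariateSpline(t[mask], y[mask], s=mask.sum() * 0.05 ** 2))
    resid = np.stack([(y - s(t)) ** 2 for s in splines])
    labels = resid.argmin(axis=0)           # data association step
print(np.bincount(labels))                  # points assigned to each estimated trajectory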
44

Ruan, Da. "Statistical methods for comparing labelled graphs." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/24963.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Due to the availability of the vast amount of graph-structured data generated in various experiment settings (e.g., biological processes, social connections), the need to rapidly identify network structural differences is becoming increasingly prevalent. In many fields, such as bioinformatics, social network analysis and neuroscience, graphs estimated from the same experimental settings are always defined on a fixed set of objects. We formalize such a problem as a labelled graph comparison problem. The main issue in this area, i.e. measuring the distance between graphs, has been extensively studied over the past few decades. Although a large distance value constitutes evidence of difference between graphs, we are more interested in the issue of inferentially justifying whether a distance value as large or larger than the observed distance could have been obtained simply by chance. However, little work has been done to provide the procedures of statistical inference necessary to formally answer this question. Permutation-based inference has been proposed as a theoretically sound approach and a natural way of tackling such a problem. However, the common permutation procedure is computationally expensive, especially for large graphs. This thesis contributes to the labelled graph comparison problem by addressing three different topics. Firstly, we analyse two labelled graphs by inferentially justifying their independence. A permutation-based testing procedure based on Generalized Hamming Distance (GHD) is proposed. We show rigorously that the permutation distribution is approximately normal for a large network, under three graph models with two different types of edge weights. The statistical significance can be evaluated without the need to resort to computationally expensive permutation procedures. Numerical results suggest the validity of this approximation. With the Topological Overlap edge weight, we suggest that the GHD test is a more powerful test to identify network differences. Secondly, we tackle the problem of comparing two large complex networks in which only localized topological differences are assumed. By applying the normal approximation for the GHD test, we propose an algorithm that can effectively detect localised changes in the network structure from two large complex networks. This algorithm is quickly and easily implemented. Simulations and applications suggest that it is a useful tool to detect subtle differences in complex network structures. Finally, we address the problem of comparing multiple graphs. For this topic, we analyse two different problems that can be interpreted as corresponding to two distinct null hypotheses: (i) a set of graphs are mutually independent; (ii) graphs in one set are independent of graphs in another set. Applications for the multiple graphs problem are commonly found in social network analysis (i) or neuroscience (ii). However, little work has been done to inferentially address the problem of comparing multiple networks. We propose two different statistical testing procedures for (i) and (ii), by again using a normality approximation for GHD. We extend the normality of GHD for the two graphs case to multiple cases, for hypotheses (i) and (ii), with two different permutation strategies. We further build a link between the test of group independence to an existing method, namely the Multivariate Exponential Random Graph Permutation model (MERGP). 
We show that by applying asymptotic normality, the maximum likelihood estimate of MERGP can be analytically derived. Therefore, the original, computationally expensive, inferential procedure of MERGP can be abandoned.
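The permutation logic behind the GHD-based test described above can be sketched as follows. Here the distance is a simple sum of squared edge-weight differences standing in for the thesis's Generalized Hamming Distance, the graphs are simulated, and the normal approximation is read off from the mean and standard deviation of the permutation null.

import numpy as np

rng = np.random.default_rng(3)
n = 50
A = rng.normal(size=(n, n)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
B = A + rng.normal(scale=0.5, size=(n, n)); B = (B + B.T) / 2; np.fill_diagonal(B, 0)

def dist(X, Y):
    return np.sum((X - Y) ** 2) / 2         # each unordered node pair counted once

observed = dist(A, B)
null = []
for _ in range(2000):
    p = rng.permutation(n)                  # relabel one graph uniformly at random
    null.append(dist(A, B[np.ix_(p, p)]))
null = np.array(null)

z = (observed - null.mean()) / null.std()   # normal approximation to the permutation law
p_value = (np.sum(null <= observed) + 1) / (len(null) + 1)  # small distance => dependence
print(z, p_value)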
45

Al-Kenani, Ali J. Kadhim. "Some statistical methods for dimension reduction." Thesis, Brunel University, 2013. http://bura.brunel.ac.uk/handle/2438/7727.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The aim of the work in this thesis is to carry out dimension reduction (DR) for high dimensional (HD) data by using statistical methods for variable selection, feature extraction and a combination of the two. In Chapter 2, DR is carried out through robust feature extraction, and robust canonical correlation (RCCA) methods are proposed. In the correlation matrix of canonical correlation analysis (CCA), we suggest that the Pearson correlation should be substituted by robust correlation measures in order to obtain robust correlation matrices; these matrices are then employed to produce RCCA. Moreover, the classical covariance matrix is substituted by robust estimators of multivariate location and dispersion, again yielding RCCA. In Chapters 3 and 4, DR is carried out by combining the ideas of variable selection using regularisation methods with feature extraction, through the minimum average variance estimator (MAVE) and single index quantile regression (SIQ) methods, respectively. In particular, we extend the sparse MAVE (SMAVE) reported in Wang and Yin (2008) by combining the MAVE loss function with different regularisation penalties in Chapter 3. An extension of the SIQ of Wu et al. (2010), considering different regularisation penalties, is proposed in Chapter 4. In Chapter 5, DR is done through variable selection under a Bayesian framework. A flexible Bayesian framework for regularisation in the quantile regression (QR) model is proposed. This work differs from Bayesian Lasso quantile regression (BLQR), which employs the asymmetric Laplace error distribution (ALD); here the error distribution is assumed to be an infinite mixture of Gaussian (IMG) densities.
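One way to realise the robust CCA idea of Chapter 2 is to substitute a rank-based correlation for the Pearson correlation before solving the usual CCA eigenproblem. The sketch below uses Spearman correlation on simulated data purely as an illustration; the thesis considers several robust correlation measures and robust covariance estimators, none of which are reproduced here.

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
n, p, q = 300, 3, 2
X = rng.normal(size=(n, p))
Y = 0.5 * X[:, :q] + rng.normal(size=(n, q))            # second block depends on the first

R, _ = spearmanr(np.hstack([X, Y]))                     # rank-based correlation of all variables
Rxx, Ryy, Rxy = R[:p, :p], R[p:, p:], R[:p, p:]

# Squared canonical correlations are eigenvalues of Rxx^{-1} Rxy Ryy^{-1} Ryx.
M = np.linalg.solve(Rxx, Rxy) @ np.linalg.solve(Ryy, Rxy.T)
eigs = np.clip(np.sort(np.linalg.eigvals(M).real)[::-1][:q], 0, None)
print(np.sqrt(eigs))                                    # robustified canonical correlations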
46

Shar, Nisar Ahmed. "Statistical methods for predicting genetic regulation." Thesis, University of Leeds, 2016. http://etheses.whiterose.ac.uk/16729/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Transcriptional regulation of gene expression is essential for cellular differentiation and function, and defects in the process are associated with cancer. Transcription is regulated by cis-acting regulatory regions and trans-acting regulatory elements. Transcription factors bind to enhancers and repressors and form complexes by interacting with each other to control the expression of genes. Understanding the regulation of genes would help us to understand the biological system and can be helpful in identifying therapeutic targets for diseases such as cancer. The ENCODE project has mapped the binding sites of many TFs in some important cell types, and it has also mapped DNase I hypersensitivity sites across cell types. Predicting the mutual interactions of transcription factors would help us find potential transcription regulatory networks. Here, we have developed two methods for predicting the mutual interactions of transcription factors from ENCODE ChIP-seq data; both methods generated similar results, supporting their accuracy. It is known that functional regions of the genome are conserved, and here we show that shared/overlapping transcription factor binding sites, both across multiple cell types and within transcription factor pairs, are more conserved than their respective non-shared/non-overlapping binding sites. We also studied how co-binding sites influence the expression level of genes: most genes mapped to transcription factor co-binding sites have significantly higher levels of expression than genes mapped to sites bound by a single transcription factor. The ENCODE data suggest a very large number of potential regulatory sites across the complete genome in many cell types, and methods are needed to identify those that are most relevant and to connect them to the genes that they control. A penalized regression method, LASSO, was used to build correlative models, choose two regulatory regions that are predictive of gene expression, and link them to their respective gene. Here, we show that our identified regulatory regions accumulate a significant number of somatic mutations that occur in cancer cells, suggesting that their effects may drive cancer initiation and development. The harboring of somatic mutations in these regulatory regions is an indication of positive selection, which has also been observed in cancer-related genes.
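The LASSO step that links regulatory regions to gene expression can be illustrated with simulated data, as in the sketch below; the region activity matrix, the two truly predictive regions, and the use of scikit-learn's LassoCV are assumptions made for the example, not the thesis's pipeline.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n_samples, n_regions = 100, 50
activity = rng.normal(size=(n_samples, n_regions))      # e.g. regulatory-region signal per sample
expression = 2.0 * activity[:, 4] - 1.5 * activity[:, 17] + rng.normal(0, 0.5, n_samples)

fit = LassoCV(cv=5).fit(activity, expression)            # penalty chosen by cross-validation
selected = np.argsort(np.abs(fit.coef_))[::-1][:2]       # the two most predictive regions
print(selected, fit.coef_[selected])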
47

Doan, Thi Ngoc Canh. "Statistical Methods for Digital Image Forensics." Thesis, Troyes, 2018. http://www.theses.fr/2018TROY0036.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The explosion of digital imaging technology has raised tremendous concerns for information security. With the support of low-cost image editing tools, the ubiquity of tampered images has become an unavoidable reality. This situation highlights the need to improve and extend current research in the field of digital forensics to restore trust in digital images. Since each stage of an image's history leaves a specific trace on the data, we propose to extract this digital fingerprint as evidence of tampering. Two important problems are addressed in this thesis: quality factor estimation for a given JPEG image and image forgery authentication. For the first problem, a likelihood ratio test has been constructed, relying on a spatial-domain model of the variance of 8 × 8 blocks of JPEG images. In the second part of the thesis, robust forensic detectors have been designed for different types of tampering within the framework of hypothesis testing theory, based on a parametric model that characterizes the statistical properties of natural images. The construction of this model is performed by studying the image processing pipeline of a digital camera. Statistical estimation of the unknown parameters is employed, allowing these tests to be applied in practice. This approach allows the design of the most powerful test capable of warranting a prescribed false alarm probability while ensuring high detection performance. Numerical experiments on simulated and real images have highlighted the relevance of the proposed approach.
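The design constraint described above, a detector warranting a prescribed false-alarm probability, is the Neyman-Pearson setting. The sketch below illustrates it on a Gaussian toy model rather than the thesis's natural-image model: the threshold on the log-likelihood ratio is fixed from the null distribution, and the empirical false-alarm rate and detection power are checked by simulation; all distribution parameters are assumptions for the example.

import numpy as np
from scipy.stats import norm

alpha = 0.05                                 # prescribed false-alarm probability
mu0, mu1, sigma = 0.0, 1.0, 1.0              # hypothetical statistic under H0 and H1

def llr(x):                                  # log-likelihood ratio for one observation
    return norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)

# The LLR is monotone in x here, so the threshold follows from the H0 quantile.
tau = llr(norm.ppf(1 - alpha, mu0, sigma))

rng = np.random.default_rng(6)
x_h0 = rng.normal(mu0, sigma, 100000)
x_h1 = rng.normal(mu1, sigma, 100000)
print("false alarms:", np.mean(llr(x_h0) > tau),
      "detection power:", np.mean(llr(x_h1) > tau))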
48

Huang, Zijian. "Statistical methods for blood pressure prediction." HKBU Institutional Repository, 2020. https://repository.hkbu.edu.hk/etd_oa/801.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Blood pressure is one of the most important indicators of human health. The symptoms of many cardiovascular diseases, such as stroke, atrial fibrillation, and acute myocardial infarction, are usually indicated by abnormal variation of blood pressure. Severe symptoms of diseases such as coronary syndrome, rheumatic heart disease, arterial aneurysm, and endocarditis also usually appear along with variation in blood pressure. Most current blood pressure measurements rely on the Korotkoff sounds method, which provides one-time readings but cannot monitor blood pressure continuously, and so cannot effectively detect disease or alert patients. Previous research indicating a relationship between the photoplethysmogram (PPG) signal and blood pressure opened a new research direction for blood pressure measurement. Ideally, with continuous monitoring of the PPG signal, the blood pressure of a subject could be measured longitudinally, which better matches the current requirements for blood pressure measurement as an indicator of human health. However, the relationship between blood pressure and the PPG signal is complex and depends on personal and environmental status, which has challenged many previous works that tried to map the PPG signal to blood pressure without considering other factors. In this thesis, we propose two statistical methods that model the comprehensive relationships among blood pressure, PPG signals, and other factors for blood pressure prediction. We also describe the modelling and prediction process for a real data set and provide accurate prediction results that achieve the international blood pressure measurement standard. In the first part, we propose the Independent Variance Components Mixed-model (IVCM), which introduces variance components to describe the relationships among observations. Relationship indicators are collected as information to divide observations into different groups. The latent impacts of group properties are estimated and used to predict multiple responses. The Stochastic Approximation Minorization-maximization (SAM) algorithm is used for IVCM parameter estimation. As an extension of the Minorization-maximization (MM) algorithm, SAM provides estimates of comparable quality with faster computing speed and less computational cost. We also provide a subsampling prediction method for the IVCM model that predicts multiple response variables using the conditional expectation of the model's random effects. The prediction speed of the subsampling method is as fast as the SAM algorithm for parameter estimation, with very small loss of accuracy. Because the SAM algorithm and the subsampling prediction method require tuning parameters, extensive simulation results are provided for tuning parameter selection. In the second part, we propose the Groupwise Reweighted Mixed-model (GRM) to describe the variation of random effects as well as the potential components of mixture distributions. In this model, we combine the properties of mixed models and mixture models to capture the comprehensive relationships among observations as well as between the predictive variables and the response variables. We introduce the Groupwise Expectation Minorization-maximization (GEM) algorithm for model parameter estimation. Developed from the MM algorithm and the Expectation Maximization (EM) algorithm, GEM estimates parameters quickly and accurately by exploiting the properties of a block-diagonal matrix. The corresponding prediction method for the GRM model is provided, along with simulations for selecting the number of components. In the third part, we apply the IVCM and GRM models to real data for blood pressure prediction. We establish a database for modelling blood pressure with PPG signals and personal characteristics, extract PPG features from the PPG signal waves, and analyse the comprehensive relationship between the PPG signal and blood pressure with the IVCM and GRM models. Blood pressure prediction results from the different models are provided and compared. The best prediction results not only achieve the international blood pressure measurement standard but also show strong performance in predicting high blood pressure.
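A much simpler relative of the IVCM/GRM models described above, a random-intercept mixed model with one variance component per subject, can be fitted as in the sketch below; the simulated PPG feature, the subject-effect scale, and the use of statsmodels' MixedLM are illustrative assumptions only, not the thesis's models or data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_subjects, n_obs = 30, 20
subject = np.repeat(np.arange(n_subjects), n_obs)
subject_effect = rng.normal(0, 8, n_subjects)[subject]   # latent per-subject offset (mmHg)
ppg_feature = rng.normal(size=subject.size)               # e.g. a pulse-shape feature, standardised
sbp = 120 + 5 * ppg_feature + subject_effect + rng.normal(0, 4, subject.size)

data = pd.DataFrame({"sbp": sbp, "ppg": ppg_feature, "subject": subject})
model = smf.mixedlm("sbp ~ ppg", data, groups=data["subject"]).fit()
print(model.summary())                                     # fixed effect of PPG and group variance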
49

Burch, Mark G. "Statistical Methods for Network Epidemic Models." The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1471613656.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Özardıç, Onur Püskülcü Halis. "Statistical methods used for intrusion detection/." [s.l.]: [s.n.], 2006. http://library.iyte.edu.tr/tezler/master/bilgisayaryazilimi/T000525.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Thesis (Master)--İzmir Institute of Technology, İzmir, 2006
Keywords: Computer networks, computer network security, intrusion detection system, statistical methods. Includes bibliographical references (leaves 58-64).
