Dissertations / Theses on the topic 'Mixture Markov Model'

Consult the top 50 dissertations / theses for your research on the topic 'Mixture Markov Model.'


1

Frühwirth-Schnatter, Sylvia. "Model Likelihoods and Bayes Factors for Switching and Mixture Models." SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business, 2002. http://epub.wu.ac.at/474/1/document.pdf.

Abstract:
In the present paper we discuss the problem of estimating model likelihoods from the MCMC output for a general mixture and switching model. Estimation is based on the method of bridge sampling (Meng and Wong, 1996), where the MCMC sample is combined with an iid sample from an importance density. The importance density is constructed in an unsupervised manner from the MCMC output using a mixture of complete data posteriors. Whereas the importance sampling estimator as well as the reciprocal importance sampling estimator are sensitive to the tail behaviour of the importance density, we demonstrate that the bridge sampling estimator is far more robust in this respect. Our case studies range from computing marginal likelihoods for a mixture of multivariate normal distributions and testing for the inhomogeneity of a discrete-time Poisson process to testing for the presence of Markov switching and order selection in the MSAR model. (author's abstract)
Series: Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science"
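For reference, the bridge sampling identity of Meng and Wong (1996) on which the estimator above is based can be written as follows (our notation; q is the importance density, p*(θ|y) = p(y|θ)p(θ) the unnormalized posterior, and α an arbitrary bridge function):

```latex
p(y) \;=\; \frac{\mathbb{E}_{q(\theta)}\!\left[\alpha(\theta)\, p^{*}(\theta \mid y)\right]}
                {\mathbb{E}_{p(\theta \mid y)}\!\left[\alpha(\theta)\, q(\theta)\right]},
\qquad p^{*}(\theta \mid y) = p(y \mid \theta)\, p(\theta).
```

The numerator expectation is estimated from the iid sample from q and the denominator from the MCMC sample; the importance sampling and reciprocal importance sampling estimators correspond to the special choices α = 1/q and α = 1/p*, which is why they inherit the tail behaviour of q, while intermediate bridge functions yield the robustness noted in the abstract.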
2

Wang, Xin. "Research of mixture of experts model for time series prediction." University of Otago, Department of Information Science, 2005. http://adt.otago.ac.nz./public/adt-NZDU20070312.144924.

Abstract:
For the prediction of chaotic time series, a dichotomy has arisen between local approaches and global approaches. Local approaches have a reputation for simplicity and feasibility, but they generally do not produce a compact description of the underlying system and are computationally intensive. Global approaches have the advantage of requiring less computation and are able to yield a global representation of the studied time series. However, due to the complexity of the time series process, it is often not easy to construct a global model that performs the prediction precisely. In addition to these approaches, a combination of the global and local techniques, called mixture of experts (ME), is also possible, where a smaller number of models work cooperatively to implement the prediction. This thesis reports on research into ME models for chaotic time series prediction. Based on a review of the techniques in time series prediction, an HMM-based ME model called "Time-line" Hidden Markov Experts (THME) is developed, where the trajectory of the time series is divided into regimes in the state space and regression models called local experts are applied to learn the mapping on the regimes separately. The dynamics of the expert combination is an HMM; however, the transition probabilities are designed to be time-varying and conditional on the "real time" information of the time series. For the learning of the "time-line" HMM, a modified Baum-Welch algorithm is developed and the convergence of the algorithm is proved. Different versions of the model, based on MLP, RBF and SVM experts, are constructed and applied to a number of chaotic time series for both one-step-ahead and multi-step-ahead predictions. Experiments show that in general THME achieves better generalization performance than the corresponding single models in one-step-ahead prediction, and performance comparable to some published benchmarks in multi-step-ahead prediction. Various properties of THME, such as the feature selection for trajectory dividing, the clustering techniques for regime extraction, the "time-line" HMM for expert combination and the performance of the model with different numbers of experts, are investigated. A number of interesting future directions for this work are suggested, including feature selection for regime extraction, model selection for transition probability modelling, the extension to distribution prediction and application to other time series.
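As a rough illustration of the gating idea described above, a generic HMM-gated mixture of experts combines the local experts' one-step-ahead predictions with the HMM's predicted regime probabilities (a sketch, not the exact THME formulation, which additionally makes the transition matrix time-varying):

```python
import numpy as np

def me_predict(alpha_t, A, expert_preds):
    """One-step-ahead prediction of an HMM-gated mixture of experts.

    alpha_t      : (K,) filtered regime probabilities P(S_t = k | y_1:t)
    A            : (K, K) transition matrix; in THME this would be a
                   time-varying A_t conditioned on "real time" information
    expert_preds : (K,) one-step-ahead predictions of the K local experts
    """
    state_pred = alpha_t @ A          # P(S_{t+1} = k | y_1:t)
    return state_pred @ expert_preds  # probability-weighted expert combination
```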
3

Heinz, Daniel. "Hyper Markov Non-Parametric Processes for Mixture Modeling and Model Selection." Research Showcase @ CMU, 2010. http://repository.cmu.edu/dissertations/11.

Abstract:
Markov distributions describe multivariate data with conditional independence structures. Dawid and Lauritzen (1993) extended this idea to hyper Markov laws for prior distributions. A hyper Markov law is a distribution over Markov distributions whose marginals satisfy the same conditional independence constraints. These laws have been used for Gaussian mixtures (Escobar, 1994; Escobar and West, 1995) and contingency tables (Liu and Massam, 2006; Dobra and Massam, 2009). In this paper, we develop a family of non-parametric hyper Markov laws that we call hyper Dirichlet processes, combining the ideas of hyper Markov laws and non-parametric processes. Hyper Dirichlet processes are joint laws with Dirichlet process laws for particular marginals. We also describe a more general class of Dirichlet processes that are not hyper Markov, but still contain useful properties for describing graphical data. The graphical Dirichlet processes are simple Dirichlet processes with a hyper Markov base measure. This class allows an extremely straightforward application of existing Dirichlet knowledge and technology to graphical settings. Given the widespread use of Dirichlet processes, there are many applications of this framework waiting to be explored. One broad class of applications, known as Dirichlet process mixtures, has been used for constructing mixture densities such that the underlying number of components may be determined by the data (Lo, 1984; Escobar, 1994; Escobar and West, 1995). I consider the use of the new graphical Dirichlet process in this setting, which imparts a conditional independence structure inside each component. In other words, given the component or cluster membership, the data exhibit the desired independence structure. We discuss two applications. Expanding on the work of Escobar and West (1995), we estimate a non-parametric mixture of Markov Gaussians using a Gibbs sampler. Secondly, we employ the Mode-Oriented Stochastic Search of Dobra and Massam (2009) to determine a suitable conditional independence model, focusing on contingency tables. In general, the mixing induced by a Dirichlet process does not drastically increase the complexity beyond that of a simpler Bayesian hierarchical model without mixture components. We provide a specific representation for decomposable graphs with useful algorithms for local updates.
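In the standard notation assumed here, the Dirichlet process mixture referred to above, together with its stick-breaking representation, is:

```latex
y_i \mid \theta_i \sim F(\theta_i), \qquad \theta_i \mid G \sim G, \qquad G \sim \mathrm{DP}(\alpha, G_0),
\qquad G = \sum_{k=1}^{\infty} \pi_k\, \delta_{\theta_k^{*}}, \quad \pi_k = v_k \prod_{j<k} (1 - v_j), \quad v_k \sim \mathrm{Beta}(1, \alpha), \quad \theta_k^{*} \sim G_0.
```

In the graphical Dirichlet process of the abstract, the base measure G_0 is itself a hyper Markov law, so each component parameter inherits the desired conditional independence structure.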
4

Loza Reyes, Elisa. "Classification of phylogenetic data via Bayesian mixture modelling." Thesis, University of Bath, 2010. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.519916.

Abstract:
Conventional probabilistic models for phylogenetic inference assume that an evolutionary tree, and a single set of branch lengths and stochastic process of DNA evolution, are sufficient to characterise the generating process across an entire DNA alignment. Unfortunately such a simplistic, homogeneous formulation may be a poor description of reality when the data arise from heterogeneous processes. A well-known example is when sites evolve at heterogeneous rates. This thesis is a contribution to the modelling and understanding of heterogeneity in phylogenetic data. We propose a method for the classification of DNA sites based on Bayesian mixture modelling. Our method not only accounts for heterogeneous data but also identifies the underlying classes and enables their interpretation. We also introduce novel MCMC methodology with the same, or greater, estimation performance than existing algorithms but with lower computational cost. We find that our mixture model can successfully detect evolutionary heterogeneity and demonstrate its direct relevance by applying it to real DNA data. One of these applications is the analysis of sixteen strains of one of the bacterial species that cause Lyme disease. Results from that analysis have helped in understanding the evolutionary paths of these bacterial strains and, therefore, the dynamics of the spread of Lyme disease. Our method is discussed in the context of DNA but it may be extended to other types of molecular data. Moreover, the classification scheme that we propose is evidence of the breadth of application of mixture modelling and a step forward in the search for more realistic models of the processes that underlie phylogenetic data.
5

Koh, Maria. "Socioeconomic patterning of self-rated health trajectories in Canada: A mixture latent Markov model." Thesis, McGill University, 2012. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=110661.

Abstract:
This thesis investigates the association between socioeconomic position and self-rated health trajectories among Canadians. Data come from the Survey of Labour and Income Dynamics (SLID), Panel 4 (2002 to 2008), conducted by Statistics Canada. These longitudinal data are analyzed using a mixture latent Markov model, which allows multiple health trajectories to be modeled. Goodness-of-fit tests showed that three trajectories (good health, poor health, and fluctuating health) provided the best fit to the data. The results show that more than three quarters of Canadians were in the constant good health trajectory, whereas 13.95% and 7.99% of Canadians were in the persistent ill health trajectory and the fluctuating health trajectory, respectively. The relative risk ratios indicate that increasing income and education are independently associated with a greater likelihood of belonging to the persistent good health trajectory rather than the persistent poor health trajectory. Both associations were adjusted for possible confounders, including gender, age, marital status, immigrant status, and visible minority status. These results suggest that a socioeconomic gradient exists in the likelihood of belonging to given health trajectories. In addition, the mixture latent Markov model is robust in accounting for several issues inherent to longitudinal analysis: the Markov chain models the dependency between repeated measurements within the same individual; the latent variables allow measurement error to be estimated; population heterogeneity is accounted for by finite mixture modeling; and missing data are handled using full information maximum likelihood.
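In generic notation (covariate effects omitted, an assumption of this sketch rather than the thesis's exact specification), a mixture latent Markov model with C trajectory classes has likelihood

```latex
P(y_{1:T}) \;=\; \sum_{c=1}^{C} \pi_c \sum_{s_1, \ldots, s_T} P(s_1 \mid c) \prod_{t=2}^{T} P(s_t \mid s_{t-1}, c) \prod_{t=1}^{T} P(y_t \mid s_t),
```

where the finite mixture over c captures population heterogeneity (here, the three health trajectories), the Markov chain on the latent states s_t models within-person dependence over time, and the emission distribution P(y_t | s_t) absorbs measurement error in the self-rated responses.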
6

Kullmann, Emelie. "Speech to Text for Swedish using KALDI." Thesis, KTH, Optimeringslära och systemteori, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189890.

Abstract:
The field of speech recognition has during the last decade left the research stage and found its way into the public market. Most computers and mobile phones sold today support dictation and transcription in a number of chosen languages; Swedish is often not one of them. In this thesis, which was carried out on behalf of Swedish Radio, an automatic speech recognition model for Swedish is trained and its performance evaluated. The model is built using the open-source toolkit Kaldi. Two approaches to training the acoustic part of the model are investigated: first, using hidden Markov models and Gaussian mixture models, and second, using hidden Markov models and deep neural networks. The latter approach, using deep neural networks, is found to achieve better performance in terms of word error rate.
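The word error rate used to compare the two acoustic models is the edit distance between the reference and hypothesis word sequences, normalized by the reference length; a minimal sketch (assumes a non-empty reference):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution or match
    return d[len(r)][len(h)] / len(r)
```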
7

Tüchler, Regina. "Bayesian Variable Selection for Logistic Models Using Auxiliary Mixture Sampling." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 2006. http://epub.wu.ac.at/984/1/document.pdf.

Abstract:
The paper presents a Markov chain Monte Carlo algorithm for both variable and covariance selection in the context of logistic mixed effects models. This algorithm allows us to sample solely from standard densities, with no additional tuning needed. We apply a stochastic search variable selection approach to select explanatory variables as well as to determine the structure of the random effects covariance matrix. For logistic mixed effects models, prior determination of explanatory variables and random effects is no longer a prerequisite, since the definite structure is chosen in a data-driven manner in the course of the modeling procedure. As an illustration, two real-data examples from finance and tourism studies are given. (author's abstract)
Series: Research Report Series / Department of Statistics and Mathematics
8

Manikas, Vasileios. "A Bayesian Finite Mixture Model for Network-Telecommunication Data." Thesis, Stockholms universitet, Statistiska institutionen, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-146039.

Abstract:
A data modeling procedure called the mixture model is introduced, suited to the characteristics of our data. Mixture models have proved flexible and easy to use, as the many papers and books published over the last twenty years confirm. The models are estimated using Bayesian inference through an efficient Markov chain Monte Carlo (MCMC) algorithm known as Gibbs sampling. The focus of the paper is on models for network-telecommunication lab data (not time-dependent data) and on the valid predictions we can accomplish. We categorize our variables (based on their distributions) into three cases: a mixture of normal distributions with known allocation, a mixture of negative binomial distributions with known allocation, and a mixture of normal distributions with unknown allocation.
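As a minimal sketch of the estimation machinery, here is a generic Gibbs sampler for a univariate Gaussian mixture with unknown allocation (the fixed component variance, prior values, and component count below are illustrative assumptions, simpler than the thesis's three model variants):

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_gauss_mix(y, K=2, iters=1000, sigma2=1.0, tau2=10.0):
    """Gibbs sampler for a K-component Gaussian mixture with known
    component variance sigma2, N(0, tau2) priors on the means, and a
    symmetric Dirichlet(1) prior on the weights. y: 1-D numpy array."""
    n = len(y)
    mu = rng.normal(0.0, 1.0, K)
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # 1. allocations z_i | mu, w (constants shared across components dropped)
        logp = np.log(w) - 0.5 * (y[:, None] - mu) ** 2 / sigma2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (p.cumsum(axis=1) > rng.random((n, 1))).argmax(axis=1)
        # 2. means mu_k | z, y (conjugate normal update)
        for k in range(K):
            yk = y[z == k]
            prec = len(yk) / sigma2 + 1.0 / tau2
            mu[k] = rng.normal((yk.sum() / sigma2) / prec, prec ** -0.5)
        # 3. weights w | z (conjugate Dirichlet update)
        w = rng.dirichlet(1.0 + np.bincount(z, minlength=K))
    return mu, w
```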
9

Prosdocimi, Cecilia. "Partial exchangeability and change detection for hidden Markov models." Doctoral thesis, Università degli studi di Padova, 2010. http://hdl.handle.net/11577/3423210.

Abstract:
The thesis focuses on hidden Markov models (HMMs). They are very popular models, because they have a more versatile structure than independent identically distributed sequences or Markov chains, while remaining tractable. It is thus of interest to look for properties of i.i.d. sequences that hold true also for HMMs, and this is the object of the thesis. In the first part we concentrate on a probabilistic problem. In particular we focus on exchangeable and partially exchangeable sequences, and we find conditions to realize them as HMMs. For a special class of binary exchangeable sequences we also give a realization algorithm. In the second part we consider the problem of detecting changes in the statistical pattern of a hidden Markov process. Adapting to HMMs the so-called cumulative sum (CUSUM) algorithm, first introduced for independent observations, we are led to the study of the CUSUM statistic with an L-mixing input sequence. We establish a loss-of-memory property of the CUSUM statistic when there is no change, first in the easier case of an i.i.d. input sequence (with negative expectation, and finite exponential moments of some positive order), and then, under some technical conditions, for a bounded and L-mixing input sequence.
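The CUSUM statistic referred to above admits the usual recursive form (for an input score sequence s_n, e.g. a log-likelihood ratio, and alarm threshold h):

```latex
g_0 = 0, \qquad g_n = \max\!\left(0,\; g_{n-1} + s_n\right), \qquad \tau = \inf\{\, n \ge 1 : g_n \ge h \,\},
```

and the loss-of-memory property concerns the behaviour of g_n under the no-change regime, where the input has negative expectation so that g_n keeps returning to zero.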
10

Zhao, David Yuheng. "Model Based Speech Enhancement and Coding." Doctoral thesis, Stockholm : Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4412.

11

White, Nicole. "Bayesian mixtures for modelling complex medical data : a case study in Parkinson’s disease." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/48202/1/Nicole_White_Thesis.pdf.

Abstract:
Mixture models are a flexible tool for unsupervised clustering that have found popularity in a vast array of research areas. In studies of medicine, the use of mixtures holds the potential to greatly enhance our understanding of patient responses through the identification of clinically meaningful clusters that, given the complexity of many data sources, may otherwise be intangible. Furthermore, when developed in the Bayesian framework, mixture models provide a natural means for capturing and propagating uncertainty in different aspects of a clustering solution, arguably resulting in richer analyses of the population under study. This thesis aims to investigate the use of Bayesian mixture models in analysing varied and detailed sources of patient information collected in the study of complex disease. The first aim of this thesis is to showcase the flexibility of mixture models in modelling markedly different types of data. In particular, we examine three common variants on the mixture model, namely, finite mixtures, Dirichlet process mixtures and hidden Markov models. Beyond the development and application of these models to different sources of data, this thesis also focuses on modelling different aspects relating to uncertainty in clustering. Examples of clustering uncertainty considered are uncertainty in a patient's true cluster membership and accounting for uncertainty in the true number of clusters present. Finally, this thesis aims to address and propose solutions to the task of comparing clustering solutions, whether this be comparing patients or observations assigned to different subgroups or comparing clustering solutions over multiple datasets. To address these aims, we consider a case study in Parkinson's disease (PD), a complex and commonly diagnosed neurodegenerative disorder. In particular, two commonly collected sources of patient information are considered. The first source of data concerns symptoms associated with PD, recorded using the Unified Parkinson's Disease Rating Scale (UPDRS), and constitutes the first half of this thesis. The second half of this thesis is dedicated to the analysis of microelectrode recordings collected during Deep Brain Stimulation (DBS), a popular palliative treatment for advanced PD. Analysis of this second source of data centres on the problems of unsupervised detection and sorting of action potentials or "spikes" in recordings of multiple cell activity, providing valuable information on real-time neural activity in the brain.
12

Frühwirth-Schnatter, Sylvia, and Rudolf Frühwirth. "Bayesian Inference in the Multinomial Logit Model." Austrian Statistical Society, 2012. http://epub.wu.ac.at/5629/1/186%2D751%2D1%2DSM.pdf.

Abstract:
The multinomial logit model (MNL) possesses a latent variable representation in terms of random variables following a multivariate logistic distribution. Based on multivariate finite mixture approximations of the multivariate logistic distribution, various data-augmented Metropolis-Hastings algorithms are developed for Bayesian inference in the MNL model.
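In the standard latent utility notation assumed here, the representation reads

```latex
u_{ki} = x_i^{\top}\beta_k + \varepsilon_{ki}, \qquad y_i = \arg\max_{k} u_{ki}, \qquad
\Pr(y_i = k \mid x_i) = \frac{\exp(x_i^{\top}\beta_k)}{\sum_{l}\exp(x_i^{\top}\beta_l)},
```

where the relevant error terms follow a multivariate logistic distribution; approximating that distribution by a finite mixture of normals reduces each conditional draw in the data-augmented sampler to a standard Gaussian update.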
13

Mangayyagari, Srikanth. "Voice recognition system based on intra-modal fusion and accent classification." [Tampa, Fla.] : University of South Florida, 2007. http://purl.fcla.edu/usf/dc/et/SFE0002229.

14

Arnaud, Alexis. "Analyse statistique d'IRM quantitatives par modèles de mélange : Application à la localisation et la caractérisation de tumeurs cérébrales." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM052/document.

Abstract:
We present in this thesis a generic and automatic method for the localization and characterization of brain lesions such as primary tumors using multi-contrast MRI. Thanks to a recent generalization of scale mixtures of Gaussians, we can model a large variety of interactions between the measured MRI parameters, with the aim of capturing the heterogeneity inside healthy and damaged brain tissues. Using these probability distributions, we propose an all-in-one protocol to analyze multi-contrast MRI: starting from quantitative MRI data, this protocol determines whether a lesion is present and, if so, its localization and type, based on probability models. We also develop two extensions of this protocol. The first concerns the automatic selection of the number of mixture components, carried out in a Bayesian framework. The second takes into account the spatial structure of MRI data by adding a latent Markov random field to our protocol.
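For reference, scale mixtures of Gaussians in their classical single-weight form (the generalization used in the thesis allows richer weighting across the measured MRI parameters) have density

```latex
f(y) = \int_{0}^{\infty} \mathcal{N}\!\left(y;\ \mu,\ \Sigma / w\right)\, dH(w),
```

which yields, for example, the multivariate Student-t distribution when the mixing distribution H is a Gamma distribution.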
15

Vernet, Elodie Edith. "Modèles de mélange et de Markov caché non-paramétriques : propriétés asymptotiques de la loi a posteriori et efficacité." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS418/document.

Abstract:
Latent models have been widely used in diverse fields such as speech recognition, genomics, and econometrics. Because parametric modeling of emission distributions, that is, the distributions of an observation given the latent state, may lead to poor results in practice, in particular for clustering purposes, recent interest in using non-parametric latent models has appeared in applications. Yet little thought has been given to the theory in this framework. During my PhD I have been interested in the asymptotic behaviour of estimators (in the frequentist case) and of the posterior distribution (in the Bayesian case) in two particular non-parametric latent models: hidden Markov models and mixture models. I have first studied the concentration of the posterior distribution in non-parametric hidden Markov models. More precisely, I have considered posterior consistency and posterior concentration rates. Finally, I have been interested in efficient estimation of the mixture parameter in semi-parametric mixture models.
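For reference, posterior concentration at rate ε_n with respect to a metric d means that, for a sufficiently large constant M,

```latex
\Pi\!\left( f : d(f, f_0) \le M\,\varepsilon_n \;\middle|\; Y_{1:n} \right) \longrightarrow 1 \quad \text{in } P_{f_0}\text{-probability},
```

where f_0 is the true data-generating parameter; posterior consistency is the weaker statement that the posterior concentrates on any fixed neighbourhood of f_0.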
16

Jose, Neenu. "SPEAKER AND GENDER IDENTIFICATION USING BIOACOUSTIC DATA SETS." UKnowledge, 2018. https://uknowledge.uky.edu/ece_etds/120.

Abstract:
Acoustic analysis of animal vocalizations has been widely used to identify the presence of individual species, classify vocalizations, identify individuals, and determine gender. In this work, automatic identification of the speaker and gender of mice from ultrasonic vocalizations, and speaker identification of meerkats from their close calls, is investigated. Feature extraction was implemented using Greenwood function cepstral coefficients (GFCC), designed exclusively for extracting features from animal vocalizations. Mice ultrasonic vocalizations were analyzed using Gaussian mixture models (GMM), which yielded an accuracy of 78.3% for speaker identification and 93.2% for gender identification. Meerkat speaker identification with close calls was implemented using Gaussian mixture models (GMM) and hidden Markov models (HMM), with accuracies of 90.8% and 94.4% respectively. The results obtained show that these methods can extract the gender and identity information present in vocalizations, and they support the possibility of robust gender and individual identification using bioacoustic data sets.
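A minimal sketch of the GMM identification step, assuming GFCC feature extraction has already produced one (frames x features) matrix per speaker; `features_by_speaker` and the component count are hypothetical names chosen for illustration:

```python
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=16):
    """Fit one GMM per speaker on its (n_frames, n_features) matrix."""
    return {spk: GaussianMixture(n_components, covariance_type="diag").fit(X)
            for spk, X in features_by_speaker.items()}

def identify(models, X):
    """Return the speaker whose GMM assigns the test utterance the
    highest average per-frame log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(X))
```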
17

Ullah, Ikram. "Probabilistic Models for Species Tree Inference and Orthology Analysis." Doctoral thesis, KTH, Beräkningsbiologi, CB, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-168146.

Abstract:
A phylogenetic tree is used to model gene evolution and species evolution using molecular sequence data. For artifactual and biological reasons, a gene tree may differ from a species tree, a phenomenon known as gene tree-species tree incongruence. Assuming the presence of one or more evolutionary events, e.g., gene duplication, gene loss, and lateral gene transfer (LGT), the incongruence may be explained using a reconciliation of a gene tree inside a species tree. Such information has biological utility, e.g., for inference of orthology relationships between genes. In this thesis, we present probabilistic models and methods for orthology analysis and species tree inference, while accounting for evolutionary factors such as gene duplication, gene loss, and sequence evolution. Furthermore, we use a probabilistic LGT-aware model for inferring gene trees having temporal information for duplication and LGT events. In the first project, we present a Bayesian method, called DLRSOrthology, for estimating orthology probabilities using the DLRS model: a probabilistic model integrating gene evolution, a relaxed molecular clock for substitution rates, and sequence evolution. We devise a dynamic programming algorithm for efficiently summing orthology probabilities over all reconciliations of a gene tree inside a species tree. Furthermore, we present heuristics based on the receiver operating characteristic (ROC) curve to estimate suitable thresholds for deciding orthology events. Our method, as demonstrated by synthetic and biological results, outperforms existing probabilistic approaches in accuracy and is robust to incomplete taxon sampling artifacts. In the second project, we present a probabilistic method, based on a mixture model, for species tree inference. The method employs a two-phase approach: in the first phase, a structural expectation maximization algorithm, based on a mixture model, is used to reconstruct a maximum likelihood set of candidate species trees; in the second phase, in order to select the best species tree, each of the candidate species trees is evaluated using PrIME-DLRS, a method based on the DLRS model. The method is accurate, efficient, and scalable when compared to a recent probabilistic species tree inference method called PHYLDOG. We observe that, in most cases, the first phase alone may also be used for selecting the target species tree, yielding a fast and accurate method for larger datasets. Finally, we devise a probabilistic method based on the DLTRS model, an extension of the DLRS model that includes LGT events, for sampling reconciliations of a gene tree inside a species tree. The method enables us to estimate gene trees having temporal information for duplication and LGT events. To the best of our knowledge, this is the first probabilistic method that takes gene sequence data directly into account for sampling reconciliations that contain information about LGT events. Based on the synthetic data analysis, we believe that the method has the potential to identify LGT highways.


18

Tang, Man. "Statistical methods for variant discovery and functional genomic analysis using next-generation sequencing data." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/104039.

Abstract:
The development of high-throughput next-generation sequencing (NGS) techniques produces massive amounts of data, allowing the identification of biomarkers for early disease diagnosis and driving the transformation of most disciplines in biology and medicine. Greater effort is needed in developing novel, powerful, and efficient tools for NGS data analysis. This dissertation focuses on modeling "omics" data in various NGS applications, with a primary goal of developing novel statistical methods to identify sequence variants, find transcription factor (TF) binding patterns, and decode the relationship between TFs and gene expression levels. Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in NGS applications. Existing methods for calling these variants often make the simplifying assumption of positional independence and fail to leverage the dependence of genotypes at nearby loci induced by linkage disequilibrium. We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. Simulation experiments show that, under various sequencing depths, vi-HMM outperforms existing methods in terms of sensitivity and F1 score. When applied to human whole-genome sequencing data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs. One important NGS application is chromatin immunoprecipitation followed by sequencing (ChIP-seq), which characterizes protein-DNA relations through genome-wide mapping of TF binding sites. Multiple TFs, binding to DNA sequences, often show complex binding patterns, which indicate how TFs with similar functionalities work together to regulate the expression of target genes. To help uncover the transcriptional regulation mechanism, we propose a novel nonparametric Bayesian method to detect the clustering pattern of multiple-TF bindings from ChIP-seq datasets. A simulation study demonstrates that our method performs best with regard to precision, recall, and F1 score, in comparison to traditional methods. We also apply the method to real data and observe several TF clusters that have been recognized previously in mouse embryonic stem cells. Recent advances in ChIP-seq and RNA sequencing (RNA-seq) technologies provide more reliable and accurate characterization of TF binding sites and gene expression measurements, which serves as a basis for studying the regulatory functions of TFs on gene expression. We propose a log-Gaussian Cox process with a wavelet-based functional model to quantify the relationship between TF binding site locations and gene expression levels. Through a simulation study, we demonstrate that our method performs well, especially with large sample sizes and small variance. It also shows a remarkable ability to distinguish real local features in the function estimates.
The development of high-throughput next-generation sequencing (NGS) techniques produces massive amounts of data and brings innovations to biology and medicine. Greater effort is needed in developing novel, powerful, and efficient tools for NGS data analysis. In this dissertation, we mainly focus on three problems closely related to NGS and its applications: (1) how to improve variant calling accuracy, (2) how to model transcription factor (TF) binding patterns, and (3) how to quantify the contribution of TF binding to gene expression. We develop novel statistical methods to identify sequence variants, find TF binding patterns, and explore the relationship between TF binding and gene expression. We expect our findings will be helpful in promoting a better understanding of disease causality and facilitating the design of personalized treatments.
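At the heart of HMM-based callers such as vi-HMM is decoding the most probable hidden state (genotype) path given per-position emission scores; a generic log-space Viterbi sketch (not the dissertation's exact implementation):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely hidden state path in an HMM (log-space Viterbi).

    log_pi : (K,) log initial state probabilities
    log_A  : (K, K) log transition matrix
    log_B  : (T, K) log emission probability of each observation per state
    """
    T, K = log_B.shape
    delta = log_pi + log_B[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # (prev state, current state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # backtrack through the pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```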
19

Kastner, Gregor, and Sylvia Frühwirth-Schnatter. "Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models." WU Vienna University of Economics and Business, 2013. http://epub.wu.ac.at/3771/1/paper.pdf.

Abstract:
Bayesian inference for stochastic volatility models using MCMC methods highly depends on actual parameter values in terms of sampling efficiency. While draws from the posterior utilizing the standard centered parameterization break down when the volatility of volatility parameter in the latent state equation is small, non-centered versions of the model show deficiencies for highly persistent latent variable series. The novel approach of ancillarity-sufficiency interweaving has recently been shown to aid in overcoming these issues for a broad class of multilevel models. In this paper, we demonstrate how such an interweaving strategy can be applied to stochastic volatility models in order to greatly improve sampling efficiency for all parameters and throughout the entire parameter range. Moreover, this method of "combining the best of different worlds" allows inference for parameter constellations that were previously infeasible to estimate, without the need to select a particular parameterization beforehand.
Series: Research Report Series / Department of Statistics and Mathematics
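Concretely, the two parameterizations of the vanilla stochastic volatility model that are interwoven are, in standard notation,

```latex
\text{(C)}\quad y_t \mid h_t \sim \mathcal{N}(0, e^{h_t}), \qquad h_t = \mu + \phi\,(h_{t-1} - \mu) + \sigma\,\eta_t, \qquad \eta_t \sim \mathcal{N}(0,1);
\qquad \text{(NC)}\quad h_t = \mu + \sigma\,\tilde{h}_t, \qquad \tilde{h}_t = \phi\,\tilde{h}_{t-1} + \eta_t.
```

One interweaving sweep draws the parameters given h under (C), deterministically transforms to the non-centered states via (h_t - mu)/sigma, and redraws the parameters given those states under (NC); alternating the two moves is what sustains efficiency across the whole parameter range.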
20

Maqsood, Rabia. "Analyzing and Modeling Students' Behavioral Dynamics in Confidence-Based Assessment." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/699383.

Abstract:
Confidence-based assessment is a two-dimensional assessment paradigm which considers the confidence or expectancy level a student has about an answer in order to ascertain his or her actual knowledge. Several researchers have discussed the usefulness of this model over the traditional one-dimensional assessment approach, which takes the number of correctly answered questions as the sole parameter in calculating a student's test score. Additionally, some educational psychologists and theorists have found that confidence-based assessment has a positive impact on students' academic performance, knowledge retention, and the metacognitive abilities of self-regulation and engagement displayed during a learning process. However, to the best of our knowledge, these findings have not been exploited by the educational data mining community, which aims to exploit students' (logged) data to investigate their performance and behavioral characteristics in order to enhance their performance outcomes and/or learning experiences. Engagement reflects a student's active participation in an ongoing task or process, which becomes even more important when students are interacting with a computer-based learning or assessment system. There is some evidence that students' online engagement (estimated through their behaviors while interacting with a learning/assessment environment) is also positively correlated with good performance scores. However, no data mining method to date has measured students' engagement behaviors during confidence-based assessment. This Ph.D. research work aimed to identify, analyze, model and predict students' dynamic behaviors triggered by their progression in a computer-based assessment system offering confidence-driven questions. The data were collected from two experimental studies conducted with undergraduate students who solved a number of problems during confidence-based assessment. In this thesis, we first address the challenge of identifying different parameters representing students' problem-solving behaviors that are positively correlated with confidence-based assessment. Next, we develop a novel scheme to classify students' problem-solving activities into engaged or disengaged behaviors using the three previously identified parameters, namely response correctness, confidence level, and feedback seeking/no-seeking behavior. Our next challenge was to exploit the students' interactions recorded at the micro level, i.e. event by event, by the computer-based assessment tools, to estimate their intended engagement behaviors during the assessment. We also observed that the traditional non-mixture, first-order Markov chain is inadequate to capture students' evolving behaviors as revealed by their interactions with a computer-based learning/assessment system. We therefore investigated mixture Markov models to map students' trails of performed activities. However, the quality of the resulting Markov chains is critically dependent on the initialization of the algorithm, which is usually performed randomly. We propose a new approach, which we call K-EM, for initializing the Expectation-Maximization algorithm for multivariate categorical data. Our method achieved better prediction accuracy and convergence rate than two pre-existing algorithms when applied to two real datasets. This doctoral research work contributes to advancing both educational research (the theoretical aspect) and the educational data mining area (the empirical aspect).
The outcomes of this work pave the way to a framework for an adaptive confidence-based assessment system, contributing to one of the central components of Adaptive Learning, that is, personalized student models. The adaptive system can exploit data generated in a confidence-based assessment system, to model students’ behavioral profiles and provide personalized feedback to improve students’ confidence accuracy and knowledge by considering their behavioral dynamics.
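For context, a plain EM algorithm for a mixture of first-order Markov chains, with the usual random initialization that K-EM is designed to replace, looks as follows (a generic sketch under illustrative assumptions, not the thesis's algorithm):

```python
import numpy as np

def mixture_markov_em(seqs, K, S, iters=50, seed=0):
    """EM for a mixture of K first-order Markov chains over S states.

    seqs : list of state sequences (lists of ints in 0..S-1, length >= 1)
    Returns mixture weights, initial distributions, transition matrices.
    """
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)
    pi = rng.dirichlet(np.ones(S), K)          # (K, S) initial distributions
    A = rng.dirichlet(np.ones(S), (K, S))      # (K, S, S) transition matrices
    for _ in range(iters):
        # E-step: responsibility of each component for each full sequence
        R = np.zeros((len(seqs), K))
        for n, s in enumerate(seqs):
            ll = np.log(w) + np.log(pi[:, s[0]])
            for a, b in zip(s[:-1], s[1:]):
                ll += np.log(A[:, a, b])
            R[n] = np.exp(ll - ll.max())
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted counts (floor avoids empty cells)
        w = R.mean(axis=0)
        pi = np.full((K, S), 1e-6)
        A_new = np.full((K, S, S), 1e-6)
        for n, s in enumerate(seqs):
            pi[:, s[0]] += R[n]
            for a, b in zip(s[:-1], s[1:]):
                A_new[:, a, b] += R[n]
        pi /= pi.sum(axis=1, keepdims=True)
        A = A_new / A_new.sum(axis=2, keepdims=True)
    return w, pi, A
```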
21

Berard, Caroline. "Modèles à variables latentes pour des données issues de tiling arrays : Applications aux expériences de ChIP-chip et de transcriptome." Thesis, Paris, AgroParisTech, 2011. http://www.theses.fr/2011AGPT0067.

Abstract:
Tiling arrays make possible a large-scale exploration of the genome at high resolution. The biological questions usually addressed are the expression of genes and the detection of transcribed regions, which can be investigated via transcriptomic experiments, as well as the regulation of gene expression, studied through ChIP-chip experiments. In order to analyse ChIP-chip and transcriptomic data, we propose latent variable models, especially hidden Markov models, which are classical methods for unsupervised classification. The biological features of the tiling array signal, such as the spatial dependence between observations along the genome and the structural annotation, are integrated into the model. Moreover, the models are adapted to the biological question at hand and a model is proposed for each type of experiment. We propose a mixture of regressions for the comparison of two samples when one sample can be considered as a reference sample (ChIP-chip), and a two-dimensional Gaussian model with constraints on the variance parameter when the two samples play symmetrical roles (transcriptome). Finally, a semi-parametric modeling is considered, allowing more flexible emission distributions. With the objective of classification, we propose a false-positive control in the case of a two-cluster classification and for independent observations. Then, we focus on the classification of a set of observations forming a region of interest, such as a gene. The different models are illustrated on real ChIP-chip and transcriptomic datasets from a NimbleGen tiling array covering the entire genome of Arabidopsis thaliana.
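In generic form (our notation, not the thesis's exact parameterization), the mixture of regressions for the two-sample comparison is

```latex
y_i \mid x_i \;\sim\; \sum_{k=1}^{K} \pi_k\, \mathcal{N}\!\left(a_k + b_k\, x_i,\ \sigma_k^2\right),
```

with x_i the reference sample, y_i the test sample, and each mixture component corresponding to a population of probes (e.g. enriched versus background in the ChIP-chip setting).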
22

Villaron, Emilie. "Modèles aléatoires harmoniques pour les signaux électroencéphalographiques." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4815.

Abstract:
This thesis addresses the problem of multichannel biomedical signal analysis using stochastic methods. EEG signals exhibit specific features that are both time and frequency localized, which motivates the use of time-frequency signal representations. In this document the (time-frequency labelled) coefficients are modelled as multivariate random variables. In the first part of this work, multichannel signals are expanded using a local cosine basis (called the MDCT basis). The approach we propose models the distribution of time-frequency coefficients (here MDCT coefficients) in terms of latent variables by the use of a hidden Markov model. In the framework of application to EEG signals, the latent variables describe some hidden mental state of the subject. The latter controls the covariance matrices of fixed-time Gaussian vectors of multichannel, multi-frequency MDCT coefficients. After presenting classical algorithms to estimate the parameters, we define a new model in which the (space-frequency) covariance matrices are expanded as tensor products (also named Kronecker products) of frequency and channel matrices. Inference for the proposed model is developed and yields estimates for the model parameters, together with maximum likelihood estimates for the sequences of latent variables. The model is applied to electroencephalogram data, and it is shown that variance-covariance matrices labelled by sensor and frequency indices can yield relevant information on the analyzed signals. This is illustrated with a case study, namely the detection of alpha waves in resting EEG for multiple sclerosis patients and control subjects.
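The Kronecker factorization in question writes each state-conditional covariance over p channels and q frequencies as

```latex
\Sigma \;=\; \Sigma_{\mathrm{chan}} \otimes \Sigma_{\mathrm{freq}},
```

reducing the pq(pq+1)/2 free parameters of an unstructured pq x pq matrix to p(p+1)/2 + q(q+1)/2 (up to a common scale), while determinants and inverses factorize as det(A (x) B) = det(A)^q det(B)^p and (A (x) B)^{-1} = A^{-1} (x) B^{-1}, which is the source of the numerical savings and stability mentioned above.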
23

Haas, Markus. "Dynamic mixture models for financial time series." Berlin: Pro Business, 2004. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=012999049&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

24

Frühwirth-Schnatter, Sylvia. "MCMC Estimation of Classical and Dynamic Switching and Mixture Models." Department of Statistics and Mathematics, WU Vienna University of Economics and Business, 1998. http://epub.wu.ac.at/698/1/document.pdf.

Abstract:
In the present paper we discuss Bayesian estimation of a very general model class where the distribution of the observations is assumed to depend on a latent mixture or switching variable taking values in a discrete state space. This model class covers e.g. finite mixture modelling, Markov switching autoregressive modelling and dynamic linear models with switching. Joint Bayesian estimation of all latent variables, model parameters and parameters determining the probability law of the switching variable is carried out by a new Markov chain Monte Carlo method called permutation sampling. Estimation of switching and mixture models is known to be faced with identifiability problems, as switching and mixture models are identifiable only up to permutations of the indices of the states. For a Bayesian analysis the posterior has to be constrained in such a way that the identifiability constraints are fulfilled. The permutation sampler is designed to sample efficiently from the constrained posterior, by first sampling from the unconstrained posterior - which often can be done in a convenient multimove manner - and then applying a suitable permutation if the identifiability constraint is violated. We present simple conditions on the prior which ensure that this method is a valid Markov chain Monte Carlo method (that is, invariance, irreducibility and aperiodicity hold). Three case studies are presented, including finite mixture modelling of fetal lamb data, Markov switching autoregressive modelling of the U.S. quarterly real GDP data, and modelling the U.S./U.K. real exchange rate by a dynamic linear model with Markov switching heteroscedasticity. (author's abstract)
Series: Forschungsberichte / Institut für Statistik
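For instance, when the identifiability constraint is an ordering of the component means, the permutation step applied after each unconstrained draw can be sketched as:

```python
import numpy as np

def enforce_identifiability(mu, w, z):
    """Permute component labels so that mu_1 < ... < mu_K holds,
    relabelling weights and allocations accordingly."""
    perm = np.argsort(mu)                 # permutation sorting the means
    inv = np.empty_like(perm)
    inv[perm] = np.arange(len(perm))      # inverse map for the allocations
    return mu[perm], w[perm], inv[z]
```

Here `mu` and `w` are the component means and weights of the current draw and `z` the allocation vector; applying this after every sweep yields draws from the posterior constrained by the ordering.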
25

Leroux, Brian. "Maximum likelihood estimation for mixture distributions and hidden Markov models." Thesis, University of British Columbia, 1989. http://hdl.handle.net/2429/29176.

Abstract:
This thesis deals with computational and theoretical aspects of maximum likelihood estimation for data from a mixture model and a hidden Markov model. A maximum penalized likelihood method is proposed for estimating the number of components in a mixture distribution. This method produces a consistent estimator of the unknown mixing distribution, in the sense of weak convergence of distribution functions. The proof of this result consists of establishing consistency results concerning maximum likelihood estimators (which have an unrestricted number of components) and constrained maximum likelihood estimators (which assume a fixed finite number of components). In particular, a new proof of the consistency of maximum likelihood estimators is given. Also, the large sample limits of a sequence of constrained maximum likelihood estimators are identified as those distributions minimizing Kullback-Leibler divergence from the true distribution. If the number of components of the true mixture distribution is not greater than the assumed number, the constrained maximum likelihood estimator is consistent in the sense of weak convergence. If the assumed number of components is exactly correct, the estimators of the parameters which define the mixing distribution are also consistent (in a certain sense). An algorithm for computation of maximum likelihood estimates (and the maximum penalized likelihood estimate) is given. The EM algorithm is used to locate local maxima of the likelihood function and a method of automatically generating "good" starting values for each possible number of components is incorporated. The estimation of a Poisson mixture distribution is illustrated using a distribution of traffic accidents in a population and a sequence of observations of fetal movements. One way of looking at the finite mixture model is as a random sample of "states" from a mixing distribution and a sequence of conditionally independent observed variables with distributions determined by the states. In the hidden Markov model considered here, the sequence of states is modelled by a Markov chain. The use of the EM algorithm for finding local maxima of the likelihood function for the hidden Markov model is described. Problems arising in the implementation of the algorithm are discussed, including the automatic generation of starting values and a necessary adjustment to the forward-backward equations. The algorithm is applied, with Poisson component distributions, to the sequence of observations of fetal movements. The consistency of the maximum likelihood estimator for the hidden Markov model is proved. The proof requires the consideration of identifiability, ergodicity, entropy, cross-entropy, and convergence of the log-likelihood function. For instance, the conclusion of the Shannon-McMillan-Breiman theorem on entropy convergence is established for hidden Markov models. A class of doubly stochastic Poisson processes which corresponds to a continuous time version of the hidden Markov model is also considered. We discuss some preliminary work on the extension of the EM algorithm to these processes, and also the possibility of applying our method of proof of consistency of maximum likelihood estimators.
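A compact version of the EM iteration for the Poisson mixture case illustrated in the thesis (a sketch; the quantile-based starting values below merely stand in for the thesis's automatic generation of "good" starting values):

```python
import numpy as np
from scipy.stats import poisson

def poisson_mixture_em(y, K, iters=200):
    """EM for a K-component Poisson mixture. y: 1-D numpy array of counts."""
    lam = np.quantile(y, (np.arange(K) + 0.5) / K) + 1e-3  # spread-out starts
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior component probabilities per observation
        R = w * poisson.pmf(y[:, None], lam)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted proportions and means
        w = R.mean(axis=0)
        lam = (R * y[:, None]).sum(axis=0) / R.sum(axis=0)
    return w, lam
```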
Science, Faculty of
Statistics, Department of
Graduate
APA, Harvard, Vancouver, ISO, and other styles
26

Fitzpatrick, Matthew Anthony. "Multi-regime models involving Markov chains." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/14530.

Full text
Abstract:
In this work, we explore the theory and applications of various multi-regime models involving Markov chains. Markov chains are an elegant way to model path-dependent data. We study a series of problems with non-homogeneous data and the various ways that Markov chains come into play. Non-homogeneous data can be modelled using multi-regime models, which apply a distinct set of parameters to distinct population sub-groups, referred to as regimes. Such models essentially allow a practitioner to understand the nature (and in some cases the existence) of particular regimes within the data without the need to split the population into assumed sub-groups; an example is the problem of modelling business outcomes in different economic states without explicitly using economic variables. Different regimes can apply to an entire population at different times, or they can apply to different subsections of the population over the whole observed time. Markov chains are involved via the estimation procedure or within models for the observed data. In our first two problems, we utilise the properties of Markov chains to discover and establish efficiencies in the estimation algorithms. In our third problem, we analyse mixtures of Markov chains; we prove that the log-likelihood ratio test statistic for the test between 1 and 2 mixture components diverges to infinity in probability. In our fourth problem, we look at a simple case, where each Markov chain component has two states, one of which is absorbing, and derive the exact limiting distribution of the log-likelihood ratio test statistic. Although this work is largely focussed on addressing the theoretical issues of each problem, the motivation behind each of the problems studied comes from real datasets, which possess levels of complexity that are insufficiently described by more standard procedures.
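As a concrete anchor for the likelihood-ratio test mentioned above, the one-component side of the comparison reduces to row-normalised transition counts. The sketch below uses hypothetical helper names, assumes every transition it evaluates has positive probability, and only indicates the two-component EM fit in a comment:

    import numpy as np

    def mle_transition_matrix(chains, n_states):
        # Row-normalised transition counts: the single-regime MLE.
        counts = np.zeros((n_states, n_states))
        for c in chains:
            for s, t in zip(c[:-1], c[1:]):
                counts[s, t] += 1.0
        return counts / counts.sum(axis=1, keepdims=True)

    def markov_loglik(chains, P):
        # Log-likelihood of all observed transitions under matrix P.
        return sum(np.log(P[s, t]) for c in chains for s, t in zip(c[:-1], c[1:]))

    # For the 1 vs 2 component test: fit the 2-component mixture by EM
    # (omitted here), then the statistic is
    #   2 * (loglik_2comp - markov_loglik(chains, mle_transition_matrix(chains, K)))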
APA, Harvard, Vancouver, ISO, and other styles
27

Al, Hakmani Rahab. "Bayesian Estimation of Mixture IRT Models using NUTS." OpenSIUC, 2018. https://opensiuc.lib.siu.edu/dissertations/1641.

Full text
Abstract:
The No-U-Turn Sampler (NUTS) is a relatively new Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior that common MCMC algorithms such as Gibbs sampling or Metropolis-Hastings usually exhibit. Because NUTS can efficiently explore the entire space of the target distribution, the sampler converges to high-dimensional target distributions more quickly than other MCMC algorithms and is hence less computationally expensive. The focus of this study is on applying NUTS to one of the complex IRT models, specifically the two-parameter mixture IRT (Mix2PL) model, and on examining its performance in estimating model parameters when sample size, test length, and number of latent classes are manipulated. The results indicate that, overall, NUTS performs well in recovering model parameters. However, the recovery of the class membership of individual persons is not satisfactory for the three-class conditions. Also, the results indicate that WAIC performs better than LOO in recovering the number of latent classes, in terms of the proportion of the time the correct model was selected as the best fitting model. However, when the effective number of parameters was also considered in selecting the best fitting model, both fully Bayesian fit indices performed equally well. In addition, the results suggest that when multiple latent classes exist, using either of the fully Bayesian fit indices (WAIC or LOO) would not select the conventional IRT model. On the other hand, when all examinees came from a single unified population, fitting MixIRT models using NUTS causes convergence problems.
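A minimal PyMC sketch of the non-mixture ingredient, the plain two-parameter (2PL) IRT model, is shown below; NUTS is PyMC's default sampler. The mixture extension studied in the dissertation would add a latent class indicator and class-specific item parameters, which are omitted here, and all data and dimensions are illustrative assumptions:

    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(1)
    resp = rng.integers(0, 2, size=(200, 20))   # stand-in 0/1 response matrix

    with pm.Model():
        theta = pm.Normal("theta", 0.0, 1.0, shape=200)    # person abilities
        a = pm.LogNormal("a", 0.0, 0.5, shape=20)          # item discriminations
        b = pm.Normal("b", 0.0, 1.0, shape=20)             # item difficulties
        p = pm.math.sigmoid(a * (theta[:, None] - b))      # 2PL response curve
        pm.Bernoulli("obs", p=p, observed=resp)
        idata = pm.sample(1000, tune=1000)   # NUTS is PyMC's default sampler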
APA, Harvard, Vancouver, ISO, and other styles
28

De, Santis Giulia. "Modeling and Recognizing Network Scanning Activities with Finite Mixture Models and Hidden Markov Models." Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0201/document.

Full text
Abstract:
The work accomplished in this PhD consisted in building stochastic models of ZMap and Shodan, two Internet-wide scanners. In more detail, packets originated by each of the two scanners were collected by the High Security Lab (LHS) hosted at Inria Nancy Grand Est, and were used to learn Hidden Markov Models (HMMs). The first part of the work consisted in modelling the intensity of the two scanners. We investigated whether the intensity of ZMap varies with respect to the targeted service, and whether the intensities of the two scanners are comparable. Results showed that the answer to the first question is positive (i.e., the intensity of ZMap varies with respect to the targeted ports), whereas the answer to the second question is negative. In other words, we obtained a model for each set of logs. The following part of the work investigated two other features of the same scanners: their spatial and temporal movements. We created datasets each containing the logs of one single execution of ZMap or Shodan. Then we computed the differences between IP addresses consecutively targeted by the same scanner (i.e., within each sample), and between the corresponding timestamps. The former were used to model spatial movements, the latter temporal ones. Once the Hidden Markov Models were available, they were applied to identify the scanners in other sets of logs. In both cases, our models are not able to detect the targeted service, but they correctly detect the scanner that originated new logs, with an accuracy of 95% when exploiting spatial movements and of 98% when using temporal movements
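The modelling pipeline described above can be approximated with off-the-shelf tools. The sketch below uses simulated stand-in data and an assumed number of hidden states (it is not the thesis code): it fits a Gaussian HMM to the IP-address differences of one scanner run and scores a new run against it; identification then amounts to picking the per-scanner model with the highest log-likelihood:

    import numpy as np
    from hmmlearn import hmm

    rng = np.random.default_rng(0)
    # Stand-in for the preprocessed feature: differences between
    # consecutively targeted IP addresses within one scanner run
    deltas = rng.normal(0.0, 1.0, size=(1000, 1))

    # One HMM per scanner, learned from its spatial movements
    model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=100)
    model.fit(deltas)

    # Identification: score a new run under each scanner's model and
    # attribute it to the highest log-likelihood model
    new_run = rng.normal(0.0, 1.0, size=(200, 1))
    print(model.score(new_run))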
APA, Harvard, Vancouver, ISO, and other styles
29

Baker, Peter John. "Applied Bayesian modelling in genetics." Thesis, Queensland University of Technology, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
30

Falk, Matthew Gregory. "Incorporating uncertainty in environmental models informed by imagery." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/33235/1/Matthew_Falk_Thesis.pdf.

Full text
Abstract:
In this thesis, the issue of incorporating uncertainty for environmental modelling informed by imagery is explored by considering uncertainty in deterministic modelling, measurement uncertainty and uncertainty in image composition. Incorporating uncertainty in deterministic modelling is extended for use with imagery using the Bayesian melding approach. In the application presented, slope steepness is shown to be the main contributor to total uncertainty in the Revised Universal Soil Loss Equation. A spatial sampling procedure is also proposed to assist in implementing Bayesian melding given the increased data size with models informed by imagery. Measurement error models are another approach to incorporating uncertainty when data is informed by imagery. These models for measurement uncertainty, considered in a Bayesian conditional independence framework, are applied to ecological data generated from imagery. The models are shown to be appropriate and useful in certain situations. Measurement uncertainty is also considered in the context of change detection when two images are not co-registered. An approach for detecting change in two successive images is proposed that is not affected by registration. The procedure uses the Kolmogorov-Smirnov test on homogeneous segments of an image to detect change, with the homogeneous segments determined using a Bayesian mixture model of pixel values. Using the mixture model to segment an image also allows for uncertainty in the composition of an image. This thesis concludes by comparing several different Bayesian image segmentation approaches that allow for uncertainty regarding the allocation of pixels to different ground components. Each segmentation approach is applied to a data set of chlorophyll values and shown to have different benefits and drawbacks depending on the aims of the analysis.
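The registration-free change test lends itself to a short sketch. Assuming the homogeneous segments have already been extracted by the Bayesian mixture model, the per-segment test is a two-sample Kolmogorov-Smirnov comparison (function name hypothetical):

    import numpy as np
    from scipy.stats import ks_2samp

    def segment_changed(vals_img1, vals_img2, alpha=0.05):
        # vals_img1 / vals_img2: pixel values of the *same* homogeneous
        # segment in two successive images; no co-registration is needed,
        # because only the value distributions are compared.
        stat, p = ks_2samp(vals_img1, vals_img2)
        return p < alpha    # True if the segment's distribution has changed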
APA, Harvard, Vancouver, ISO, and other styles
31

Hillman, Robert J. T. "Econometric modelling of nonlinearity and nonstationarity in the foreign exchange market." Thesis, University of Southampton, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264846.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Tan, Jen Ning. "Mixtures of exponential and geometric distributions, clumped Markov models with applications to biomedical research." Thesis, Swansea University, 2010. https://cronfa.swan.ac.uk/Record/cronfa43057.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Nahimana, Donnay Fleury. "Impact des multitrajets sur les performances des systèmes de navigation par satellite : contribution à l'amélioration de la précision de localisation par modélisation bayésienne." PhD thesis, Ecole Centrale de Lille, 2009. http://tel.archives-ouvertes.fr/tel-00446552.

Full text
Abstract:
Many solutions have been developed to reduce the influence of multipath on the accuracy and availability of GNSS systems. Integrating additional sensors into the localisation system is one way to compensate, in particular, for the absence of satellite data. Such a system offers good accuracy, but its complexity and cost limit widespread use. This thesis proposes an algorithmic approach intended to improve the accuracy of GNSS systems in urban environments. The study relies on GNSS signals only, together with knowledge of the receiver's immediate environment obtained from a 3D model of the navigation area. The proposed method operates at the filtering stage of the signal received by the GNSS receiver. It exploits sequential Monte Carlo statistical filtering techniques known as particle filters. The position error in urban areas is linked to the reception state of the satellite signals (blocked, direct or reflected), which is why information about the receiver's environment must be taken into account. The thesis also proposes a new pseudorange error model that accounts for the signal reception conditions in the position computation. Initially, the reception state of each received satellite is assumed known in the particle filter. A Markov chain, valid for a known trajectory of the mobile receiver, is defined beforehand to infer the successive reception states of the satellites. Subsequently, a Dirichlet distribution is used to estimate the reception states of the satellites
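The Dirichlet step can be illustrated in a few lines. The sketch below (illustrative counts and a uniform prior, not the thesis implementation) draws a probability vector over the three reception states, as a particle filter iteration might:

    import numpy as np

    rng = np.random.default_rng(42)

    # Stand-in counts of recent classifications of one satellite's signal
    # into the three reception states: blocked, direct (LOS), reflected (NLOS)
    state_counts = np.array([2.0, 9.0, 4.0])

    # Dirichlet posterior over the reception-state probabilities (uniform
    # prior); each filter iteration can draw a weight vector like this
    state_probs = rng.dirichlet(1.0 + state_counts)
    print(state_probs)      # probabilities for (blocked, direct, reflected)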
APA, Harvard, Vancouver, ISO, and other styles
34

Idvall, Patrik, and Conny Jonsson. "Algorithmic Trading : Hidden Markov Models on Foreign Exchange Data." Thesis, Linköping University, Department of Mathematics, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-10719.

Full text
Abstract:

In this master's thesis, hidden Markov models (HMMs) are evaluated as a tool for forecasting movements in a currency cross. With an ever-increasing electronic market making way for more automated trading, so-called algorithmic trading, there is a constant need for new trading strategies that try to find alpha, the excess return, in the market.

HMMs are based on the well-known theory of Markov chains, but with the states assumed hidden, governing some observable output. HMMs have mainly been used for speech recognition and communication systems, but have lately also been applied to financial time series with encouraging results. Both discrete and continuous versions of the model will be tested, as well as single- and multivariate input data.

In addition to the basic framework, two extensions are implemented in the belief that they will further improve the prediction capabilities of the HMM. The first is a Gaussian mixture model (GMM), where each state is assigned a set of single Gaussians that are weighted together to replicate the density function of the stochastic process. This opens up the modeling of non-normal distributions, which foreign exchange data are often assumed to exhibit. The second is an exponentially weighted expectation maximization (EWEM) algorithm, which takes time attenuation into consideration when re-estimating the parameters of the model. This allows old trends to be kept in mind while more recent patterns are given more attention.

Empirical results show that the HMM using continuous emission probabilities can, for some model settings, generate acceptable returns with Sharpe ratios well over one, whilst the discrete version in general performs poorly. The GMM therefore seems to be a much-needed complement to the HMM. The EWEM, however, does not improve results as one might have expected. Our general impression is that the predictor using HMMs that we have developed and tested is too unstable to be adopted as a trading tool on foreign exchange data, with too many factors influencing the results. More research and development is called for.
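As a rough illustration of the GMM-emission idea, the sketch below fits an HMM whose states emit from three-component Gaussian mixtures, using hmmlearn with simulated stand-in returns. The exponentially weighted EM variant proposed in the thesis is not part of the library and would require custom re-estimation:

    import numpy as np
    from hmmlearn import hmm

    rng = np.random.default_rng(7)
    returns = rng.normal(0.0, 1e-3, size=(500, 1))   # stand-in FX log-returns

    # HMM with GMM emissions: each hidden state mixes 3 Gaussians, which can
    # capture the fat tails often observed in foreign exchange returns
    model = hmm.GMMHMM(n_components=2, n_mix=3, covariance_type="diag", n_iter=50)
    model.fit(returns)

    # State probabilities at the last time step, usable as a trading signal
    state_probs = model.predict_proba(returns)[-1]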

APA, Harvard, Vancouver, ISO, and other styles
35

Van, Eeden Willem Daniel. "Human and animal classification using Doppler radar." Diss., University of Pretoria, 2005. http://hdl.handle.net/2263/66252.

Full text
Abstract:
South Africa is currently struggling to deal with a significant poaching and livestock theft problem. This work is concerned with the detection and classification of ground-based targets using radar micro-Doppler signatures to aid in the monitoring of borders, nature reserves and farmlands. The research starts off by investigating the state of the art of ground target classification. Different radar systems are investigated with respect to their ability to classify targets at different operating frequencies. Finally, a Gaussian Mixture Model/Hidden Markov Model (GMM-HMM) classification approach is presented and tested in an operational environment. The GMM-HMM method is compared to methods in the literature and is shown to achieve reasonable (up to 95%) classification accuracy, marginally outperforming existing ground target classification methods.
Dissertation (MEng)--University of Pretoria, 2017.
Electrical, Electronic and Computer Engineering
MEng
Unrestricted
APA, Harvard, Vancouver, ISO, and other styles
36

Guha, Subharup. "Benchmark estimation for Markov Chain Monte Carlo samplers." The Ohio State University, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=osu1085594208.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Ho, Kwok Wah. "RJMCMC algorithm for multivariate Gaussian mixtures with applications in linear mixed-effects models /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?ISMT%202005%20HO.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Katkuri, Jaipal. "Application of Dirichlet Distribution for Polytopic Model Estimation." ScholarWorks@UNO, 2010. http://scholarworks.uno.edu/td/1210.

Full text
Abstract:
The polytopic model (PM) structure is often used in the areas of automatic control and fault detection and isolation (FDI). It is an alternative to the multiple model approach which explicitly allows for interpolation among local models. This thesis proposes a novel approach to PM estimation by modeling the set of PM weights as a random vector with a Dirichlet distribution (DD). A new approximate (adaptive) PM estimator, referred to as a Quasi-Bayesian Adaptive Kalman Filter (QBAKF), is derived and implemented. The model weights and state estimation in the QBAKF are performed adaptively by a simple QB weights estimator and a single KF on the PM with the estimated weights. Since the PM estimation problem is nonlinear and non-Gaussian, a DD marginalized particle filter (DDMPF) is also developed and implemented, similar to the MPF. The simulation results show that the newly proposed algorithms have better estimation accuracy, design simplicity, and lower computational requirements for PM estimation.
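The Dirichlet view of the PM weights is easy to illustrate. In the sketch below (hypothetical local models and dimensions), a draw from a Dirichlet distribution supplies non-negative weights summing to one, which interpolate the local dynamics used by the single Kalman filter:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in local models x' = A_i x at three vertices of the polytope
    A = [np.array([[0.9, 0.1], [0.0, 0.8]]),
         np.array([[1.0, 0.2], [0.0, 0.9]]),
         np.array([[0.7, 0.0], [0.1, 0.7]])]

    # PM weights as a Dirichlet draw: non-negative and summing to one,
    # exactly the interpolation constraint of the polytopic model
    w = rng.dirichlet(np.ones(3))

    # Interpolated dynamics used by the single Kalman filter
    A_pm = sum(wi * Ai for wi, Ai in zip(w, A))
    x_next = A_pm @ np.array([1.0, 0.5])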
APA, Harvard, Vancouver, ISO, and other styles
39

Madsen, Christopher. "Clustering of the Stockholm County housing market." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252301.

Full text
Abstract:
In this thesis a clustering of the Stockholm County housing market has been performed using different clustering methods. Data have been derived and different geographical constraints have been used. DeSO areas (Demographic Statistical Areas), developed by SCB, have been used to divide the housing market into smaller regions for which the derived variables have been calculated. Hierarchical clustering methods, SKATER and Gaussian mixture models have been applied. Methods using different kinds of geographical constraints have also been applied in an attempt to create more geographically contiguous clusters. The different methods are then compared with respect to performance and stability. The best performing method is the Gaussian mixture model EII, also known as the K-means algorithm. The most stable method when applied to bootstrapped samples is the ClustGeo method.
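The EII/K-means correspondence can be checked directly with scikit-learn. A small sketch with simulated stand-in data follows; note that scikit-learn's "spherical" covariance lets each component keep its own variance, so the match to the equal-volume EII model is approximate:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    # Rows = DeSO areas, columns = derived housing-market variables (stand-in)
    X = np.random.default_rng(3).normal(size=(100, 5))

    # "EII" in the mclust nomenclature means spherical, equal-volume
    # components, which makes the Gaussian mixture behave like K-means
    gmm = GaussianMixture(n_components=6, covariance_type="spherical").fit(X)
    km = KMeans(n_clusters=6, n_init=10).fit(X)
    gmm_labels, km_labels = gmm.predict(X), km.labels_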
APA, Harvard, Vancouver, ISO, and other styles
40

Schaeffer, Marie-Caroline. "Traitement du signal ECoG pour Interface Cerveau Machine à grand nombre de degrés de liberté pour application clinique." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAS026/document.

Full text
Abstract:
Brain-Computer Interfaces (BCI) are systems that allow severely motor-impaired patients to use their brain activity to control external devices, for example upper-limb prostheses in the case of motor BCIs. The user's intentions are estimated by applying a decoder to neural features extracted from the user's brain activity. Signal processing challenges specific to the clinical deployment of motor BCI systems are addressed in the present doctoral thesis, namely asynchronous mono-limb or sequential multi-limb decoding, and accurate decoding during active control states. A switching decoder, namely a Markov Switching Linear Model (MSLM), has been developed to limit spurious system activations, to prevent parallel limb movements and to accurately decode complex movements. The MSLM associates linear models with different possible control states, e.g. activation of a specific limb or specific movement phases. Dynamic state detection is performed by the MSLM, and the probability of each state is used to weight the linear models. The performance of the MSLM decoder was assessed for asynchronous wrist and multi-finger trajectory reconstruction from electrocorticographic signals. It was found to outperform previously reported decoders in limiting spurious activations during no-control periods, and to improve decoding accuracy during active periods
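The state-weighted prediction at the heart of a switching linear model is a one-liner once the state probabilities are available. The following sketch (hypothetical shapes and names, not the thesis decoder) mixes per-state linear models by the detector's state probabilities:

    import numpy as np

    def mslm_predict(features, state_probs, weights, biases):
        # features    : (d,) neural feature vector at one time step
        # state_probs : (K,) state probabilities from the dynamic detector
        # weights     : (K, m, d) one linear model per control state
        # biases      : (K, m)
        # The decoded movement is the state-probability-weighted mixture
        # of the per-state linear predictions.
        per_state = np.einsum("kmd,d->km", weights, features) + biases
        return state_probs @ per_state      # (m,) weighted prediction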
APA, Harvard, Vancouver, ISO, and other styles
41

Assis, Raul Caram de. "Inferência em modelos de mistura via algoritmo EM estocástico modificado." Universidade Federal de São Carlos, 2017. https://repositorio.ufscar.br/handle/ufscar/9047.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
We present the topic and theory of mixture models, reviewing theoretical aspects and interpretations of such mixtures, and develop the theory in both the maximum likelihood and the Bayesian inference contexts. Existing clustering methods are covered in both contexts, with emphasis on two of them: the stochastic EM algorithm in the maximum likelihood context and the Dirichlet Process Mixture Model in the Bayesian context. We propose a new method, a modified stochastic EM algorithm, which can be used to estimate the parameters of a mixture model while allowing solutions with a distinct number of components.
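One iteration of the stochastic EM algorithm for a univariate Gaussian mixture can be sketched as follows (illustrative code, not the modified algorithm proposed in the dissertation; empty components are simply left unchanged here):

    import numpy as np
    from scipy.stats import norm

    def stochastic_em_step(x, w, mu, sigma, rng):
        # S-step: draw a hard label for every observation from its posterior
        post = w * norm.pdf(x[:, None], mu, sigma)        # (n, k) posteriors
        post /= post.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(w.size, p=p) for p in post])
        # M-step: per-group maximum likelihood updates
        for k in range(w.size):
            members = x[z == k]
            if members.size > 1:
                w[k] = members.size / x.size
                mu[k] = members.mean()
                sigma[k] = members.std() + 1e-9
        return w / w.sum(), mu, sigma, z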
APA, Harvard, Vancouver, ISO, and other styles
42

O'Leary, Rebecca A. "Informed statistical modelling of habitat suitability for rare and threatened species." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17779/1/Rebecca_O%27Leary_Thesis.pdf.

Full text
Abstract:
In this thesis a number of statistical methods have been developed and applied to habitat suitability modelling for rare and threatened species. Data available on these species are typically limited. Therefore, developing these models from these data can be problematic and may produce prediction biases. To address these problems there are three aims of this thesis. The first aim is to develop and implement frequentist and Bayesian statistical modelling approaches for these types of data. The second aim is to develop and implement expert elicitation methods. The third aim is to apply these novel approaches to Australian rare and threatened species case studies with the intention of habitat suitability modelling. The first aim is fulfilled by investigating two innovative approaches for habitat suitability modelling, and a sensitivity analysis of the second approach to its priors. The first approach is a new multilevel framework developed to model the species distribution at multiple scales and identify excess zeros (absences outside the species range). Applying a statistical modelling approach to the identification of excess zeros has not previously been conducted. The second approach is an extension and application of Bayesian classification trees to modelling the habitat suitability of a threatened species. This is the first 'real' application of this approach in ecology. Lastly, sensitivity analyses of the priors in Bayesian classification trees are examined for a real case study. Previously, sensitivity analysis of this approach to priors has not been examined. To address the second aim, expert elicitation methods are developed, extended and compared in this thesis. In particular, one elicitation approach is extended from previous research, three elicitation methods are compared, and one new elicitation approach is proposed. These approaches are illustrated for habitat suitability modelling of a rare species, and the opinions of one or two experts are elicited. The first approach utilises a simple questionnaire, in which expert opinion is elicited on whether increasing values of a covariate either increases, decreases or does not substantively impact on a response. This approach is extended to express this information as a mixture of three normally distributed prior distributions, which are then combined with available presence/absence data in a logistic regression. This is one of the first elicitation approaches within the habitat suitability modelling literature that is appropriate for experts with limited statistical knowledge and can be used to elicit information from single or multiple experts. Three relatively new approaches to eliciting expert knowledge in a form suitable for Bayesian logistic regression are compared, one of which is the questionnaire approach. Included in this comparison of three elicitation methods are a summary of the advantages and disadvantages of these three methods, the results from the elicitations, and a comparison of the prior and posterior distributions. An expert elicitation approach is developed for classification trees, in which the size and structure of the tree is elicited. There have been numerous elicitation approaches proposed for logistic regression; however, no approaches have been suggested for classification trees. The last aim of this thesis is addressed in all chapters, since the statistical approaches proposed and extended in this thesis have been applied to real case studies. Two case studies have been examined in this thesis.
The first is the rare native Australian thistle (Stemmacantha australis), in which the dataset contains a large number of absences distributed over the majority of Queensland, and a small number of presence sites that are only within South-East Queensland. This case study motivated the multilevel modelling framework. The second case study is the threatened Australian brush-tailed rock-wallaby (Petrogale penicillata). The application and sensitivity analysis of Bayesian classification trees, and all expert elicitation approaches investigated in this thesis, are applied to this case study. This work has several implications for the conservation and management of rare and threatened species. Novel statistical approaches addressing the first aim provide extensions to currently existing methods, or propose a new approach, for identification of current and potential habitat. We demonstrate that better model predictions can be achieved using each method, compared to standard techniques. Elicitation approaches addressing the second aim ensure expert knowledge in various forms can be harnessed for habitat modelling, a particular benefit for rare and threatened species which typically have limited data. Throughout, innovations in statistical methodology are both motivated and illustrated via habitat modelling for two rare and threatened species: the native thistle Stemmacantha australis and the brush-tailed rock wallaby Petrogale penicillata.
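The questionnaire-to-prior mapping described above can be made concrete with a small sketch. All numeric settings below are illustrative assumptions, not the thesis's elicited values: expert probabilities for "increases", "decreases" and "no substantive effect" weight three normal densities for a logistic-regression coefficient:

    import numpy as np
    from scipy.stats import norm

    def elicited_prior_logpdf(beta, p_inc, p_dec, p_none, scale=1.0):
        # p_inc + p_dec + p_none = 1: expert probabilities that the covariate
        # increases / decreases / does not substantively affect the response.
        dens = (p_inc * norm.pdf(beta, loc=+scale, scale=scale)
                + p_dec * norm.pdf(beta, loc=-scale, scale=scale)
                + p_none * norm.pdf(beta, loc=0.0, scale=0.1 * scale))
        return np.log(dens)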
APA, Harvard, Vancouver, ISO, and other styles
43

O'Leary, Rebecca A. "Informed statistical modelling of habitat suitability for rare and threatened species." Queensland University of Technology, 2008. http://eprints.qut.edu.au/17779/.

Full text
Abstract:
In this thesis a number of statistical methods have been developed and applied to habitat suitability modelling for rare and threatened species. Data available on these species are typically limited. Therefore, developing these models from these data can be problematic and may produce prediction biases. To address these problems there are three aims of this thesis. The first aim is to develop and implement frequentist and Bayesian statistical modelling approaches for these types of data. The second aim is to develop and implement expert elicitation methods. The third aim is to apply these novel approaches to Australian rare and threatened species case studies with the intention of habitat suitability modelling. The first aim is fulfilled by investigating two innovative approaches for habitat suitability modelling, and a sensitivity analysis of the second approach to its priors. The first approach is a new multilevel framework developed to model the species distribution at multiple scales and identify excess zeros (absences outside the species range). Applying a statistical modelling approach to the identification of excess zeros has not previously been conducted. The second approach is an extension and application of Bayesian classification trees to modelling the habitat suitability of a threatened species. This is the first 'real' application of this approach in ecology. Lastly, sensitivity analyses of the priors in Bayesian classification trees are examined for a real case study. Previously, sensitivity analysis of this approach to priors has not been examined. To address the second aim, expert elicitation methods are developed, extended and compared in this thesis. In particular, one elicitation approach is extended from previous research, three elicitation methods are compared, and one new elicitation approach is proposed. These approaches are illustrated for habitat suitability modelling of a rare species, and the opinions of one or two experts are elicited. The first approach utilises a simple questionnaire, in which expert opinion is elicited on whether increasing values of a covariate either increases, decreases or does not substantively impact on a response. This approach is extended to express this information as a mixture of three normally distributed prior distributions, which are then combined with available presence/absence data in a logistic regression. This is one of the first elicitation approaches within the habitat suitability modelling literature that is appropriate for experts with limited statistical knowledge and can be used to elicit information from single or multiple experts. Three relatively new approaches to eliciting expert knowledge in a form suitable for Bayesian logistic regression are compared, one of which is the questionnaire approach. Included in this comparison of three elicitation methods are a summary of the advantages and disadvantages of these three methods, the results from the elicitations, and a comparison of the prior and posterior distributions. An expert elicitation approach is developed for classification trees, in which the size and structure of the tree is elicited. There have been numerous elicitation approaches proposed for logistic regression; however, no approaches have been suggested for classification trees. The last aim of this thesis is addressed in all chapters, since the statistical approaches proposed and extended in this thesis have been applied to real case studies. Two case studies have been examined in this thesis.
The first is the rare native Australian thistle (Stemmacantha australis), in which the dataset contains a large number of absences distributed over the majority of Queensland, and a small number of presence sites that are only within South-East Queensland. This case study motivated the multilevel modelling framework. The second case study is the threatened Australian brush-tailed rock-wallaby (Petrogale penicillata). The application and sensitivity analysis of Bayesian classification trees, and all expert elicitation approaches investigated in this thesis, are applied to this case study. This work has several implications for the conservation and management of rare and threatened species. Novel statistical approaches addressing the first aim provide extensions to currently existing methods, or propose a new approach, for identification of current and potential habitat. We demonstrate that better model predictions can be achieved using each method, compared to standard techniques. Elicitation approaches addressing the second aim ensure expert knowledge in various forms can be harnessed for habitat modelling, a particular benefit for rare and threatened species which typically have limited data. Throughout, innovations in statistical methodology are both motivated and illustrated via habitat modelling for two rare and threatened species: the native thistle Stemmacantha australis and the brush-tailed rock wallaby Petrogale penicillata.
APA, Harvard, Vancouver, ISO, and other styles
44

Theeranaew, Wanchat. "STUDY ON INFORMATION THEORY: CONNECTION TO CONTROL THEORY, APPROACH AND ANALYSIS FOR COMPUTATION." Case Western Reserve University School of Graduate Studies / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=case1416847576.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Lundberg, Magdalena. "Observing the unobservable? : Segmentation of tourism expenditure in Venice using unobservable heterogeneity to find latent classes." Thesis, Högskolan Dalarna, Nationalekonomi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:du-28060.

Full text
Abstract:
Consumer segmentation based on expenditure is usually done using observed characteristics, such as age and income. This thesis highlights the problem of negative externalities from which Venice suffers due to mass tourism. The thesis aims to assess whether unobservable heterogeneity can be used to detect latent classes within tourism expenditure. Segmenting the tourism market using this approach is valuable for policy making. Segmenting is also useful for the actors in the market to identify and attract high spenders. In that way, a destination may uphold a sustainable level of tourism instead of increasing tourist numbers. The method used for this approach is finite mixture modelling (FMM), which is not much used within consumer markets, and therefore this thesis also contributes to tourism expenditure methodology. The thesis adds to the literature by increasing the knowledge about the importance of unobserved factors when segmenting visitors. The results show that four latent classes are found in tourism expenditure. Some of the variables which are significant in determining tourism expenditure are shown to affect expenditure differently in different classes, while some are shown not to be significant. The conclusions are that segmenting tourism expenditure using unobserved heterogeneity is significant, and that variables which are barely significant in determining the expenditure of the population can be strongly significant in determining the expenditure for a certain class.
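As a rough sketch of the FMM workflow, the number of latent classes can be selected by an information criterion. The example below uses scikit-learn Gaussian mixtures on simulated stand-in data; the thesis's actual model is a finite mixture for expenditure with covariates, so this only illustrates the selection step:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Stand-in matrix: tourists' log-expenditure plus covariates
    X = np.random.default_rng(5).normal(size=(400, 4))

    # Select the number of latent classes by BIC, as is common in FMM work
    models = {k: GaussianMixture(n_components=k, n_init=5).fit(X)
              for k in range(1, 7)}
    best_k = min(models, key=lambda k: models[k].bic(X))
    classes = models[best_k].predict(X)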
APA, Harvard, Vancouver, ISO, and other styles
46

Pradella, Lorenzo. "A data-driven prognostic approach based on AR identification and hidden Markov models." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
In this work a data-driven prognostic approach based on AutoRegressive (AR) estimation and hidden Markov models (HMMs) is addressed. In particular, the approach is capable of achieving Prognostics and Health Management (PHM) tasks such as real-time detection and Remaining Useful Life (RUL) estimation. The approach can be seen as composed of a training part (offline) and an exploitation part (online). The offline part relies upon a scalar health indicator coming from the system identification field: the Itakura-Saito (IS) spectral distance. In particular, raw acceleration data, gathered from the machine in an unsupervised framework, are modeled by AR processes and then transformed into IS distances. HMMs are then used to map such IS signals into a finite number of parameters. Moreover, in the training procedure of the HMMs, a left-to-right clustering of unsupervised data, based on a Mixture of Gaussians (MOG) distribution, is proposed. During the online exploitation, a running signal is tested against the trained ones in order to carry out PHM tasks in real time. Simulations have been performed using a public benchmark available in the "NASA Prognostics Data Repository". It contains run-to-failure tests on bearings, from which acceleration signals are gathered. In particular, the experiment simulates an industrial application under constant operating conditions. Results of simulations, performed on real-time data, validate the proposed prognostic approach and make the combined use of IS and HMMs a reliable way of achieving PHM goals.
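Under one common convention, the Itakura-Saito health indicator compares the AR spectrum of the current signal with a healthy reference. A minimal sketch with assumed inputs (not the thesis code):

    import numpy as np

    def itakura_saito(psd_ref, psd_cur):
        # psd_ref, psd_cur: positive power spectra on the same frequency grid,
        # e.g. AR-model spectra of a healthy reference and the current signal.
        # D_IS = mean(P/Q - log(P/Q) - 1); zero iff the two spectra coincide.
        r = psd_cur / psd_ref
        return float(np.mean(r - np.log(r) - 1.0))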
APA, Harvard, Vancouver, ISO, and other styles
47

Ozturk, Mahir. "Markov Random Field Based Road Network Extraction From High Resolution Satellite Images." Master's thesis, METU, 2013. http://etd.lib.metu.edu.tr/upload/12615499/index.pdf.

Full text
Abstract:
Road networks play an important role in various applications such as urban and rural planning, infrastructure planning, transportation management and vehicle navigation. Extraction of roads from remotely sensed satellite images for updating the road database in geographical information systems (GIS) is generally done manually by a human operator. However, manual extraction of roads is a time-consuming and labor-intensive process. In the existing literature, a great number of studies have been published on automating the road extraction process. However, automated processes still yield somewhat erroneous and incomplete results, and human intervention is still required. The aim of this research is to propose a framework for road network extraction from high-spatial-resolution multi-spectral imagery (MSI) to improve the accuracy of road extraction systems. The proposed framework begins with spectral classification using one-class Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers. Spectral classification exploits the spectral signature of road surfaces to classify road pixels. Then, an iterative template matching filter is proposed to refine the spectral classification results. A K-medians clustering algorithm is employed to detect candidate road centerline points. Final road network formation is achieved by Markov Random Fields. The extracted road network is evaluated against a reference dataset using a set of quality metrics.
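The first stage of the framework, learning the road spectral signature from positive samples only, can be sketched with scikit-learn's one-class SVM (simulated data and parameter values are assumptions, not those of the thesis):

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(9)
    road_samples = rng.normal(size=(300, 4))   # spectra of known road pixels
    pixels = rng.normal(size=(5000, 4))        # all image pixels to classify

    # The one-class SVM learns the road spectral signature from positive
    # samples only; predict() returns +1 for road-like pixels, -1 otherwise
    clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(road_samples)
    road_mask = clf.predict(pixels) == 1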
APA, Harvard, Vancouver, ISO, and other styles
48

Van, Heerden Charl Johannes. "Phoneme duration modelling for speaker verification." Diss., Pretoria : [s.n.], 2009. http://upetd.up.ac.za/thesis/available/etd-06262009-150945/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Ben, Youssef Atef. "Contrôle de têtes parlantes par inversion acoustico-articulatoire pour l’apprentissage et la réhabilitation du langage." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENT088/document.

Full text
Abstract:
Speech sounds may be complemented by displaying the speech articulators' shapes on a computer screen, hence producing augmented speech, a signal that is potentially useful in all instances where the sound itself might be difficult to understand, for physical or perceptual reasons. In this thesis, we introduce a system called visual articulatory feedback, in which the visible and hidden articulators of a talking head are controlled from the speaker's speech sound. The motivation of this research was to develop such a system that could be applied to Computer Aided Pronunciation Training (CAPT) for the learning of foreign languages, or in the domain of speech therapy. We based our approach to this mapping problem on statistical models built from acoustic and articulatory data. We developed and evaluated two statistical learning methods trained on parallel synchronous acoustic and articulatory data recorded on a French speaker by means of an electromagnetic articulograph (EMA). Our hidden Markov model (HMM) approach combines HMM-based acoustic recognition and HMM-based articulatory synthesis techniques to estimate the articulatory trajectories from the acoustic signal. Gaussian mixture models (GMMs) estimate articulatory features directly from the acoustic ones. We based our evaluation of these models on several criteria: the root mean square error between the original and recovered EMA coordinates, the Pearson product-moment correlation coefficient, displays of the articulatory spaces and articulatory trajectories, as well as acoustic and articulatory recognition rates. Experiments indicate that the use of state tying and multiple Gaussians per state in the acoustic HMM improves the recognition stage, and that updating the articulatory HMM parameters by minimum generation error (MGE) training results in a more accurate inversion than conventional maximum likelihood estimation (MLE) training. In addition, the GMM mapping using the MLE criterion is more efficient than using the minimum mean square error (MMSE) criterion. In conclusion, we found that the HMM inversion system is more accurate than the GMM one. Besides, experiments using the same statistical methods and data showed that the face-to-tongue inversion problem, i.e. predicting tongue shapes from face and lip shapes, cannot be solved in a general way, and is impossible for some phonetic classes. In order to extend our single-speaker system to a multi-speaker speech inversion system, we implemented a speaker adaptation method based on maximum likelihood linear regression (MLLR). In MLLR, a linear regression-based transform that adapts the original acoustic HMMs to those of the new speaker is calculated so as to maximise the likelihood of the adaptation data. This speaker adaptation stage was evaluated using an articulatory phonetic recognition system, as no original articulatory data are available for the new speakers. Finally, using this adaptation procedure, we developed a complete visual articulatory feedback demonstrator, which can work for any speaker. This system should be assessed by perceptual tests in realistic conditions.
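The MLLR mean update has a closed least-squares form when covariances are ignored. The sketch below (hypothetical statistic names, identity covariances assumed for brevity; real MLLR weights the normal equations by the inverse covariances) solves for a global transform from occupation statistics gathered on the adaptation data:

    import numpy as np

    def mllr_mean_transform(means, occ, obs_sums):
        # means    : (G, d) original Gaussian means
        # occ      : (G,) occupation counts on the adaptation data
        # obs_sums : (G, d) occupancy-weighted sums of adaptation observations
        # Solves the least-squares problem for W = [A; b] so that the adapted
        # means A @ mu + b best explain the adaptation data.
        G, d = means.shape
        ext = np.hstack([means, np.ones((G, 1))])     # extended means [mu, 1]
        lhs = (occ[:, None] * ext).T @ ext            # (d+1, d+1)
        rhs = ext.T @ obs_sums                        # (d+1, d)
        W = np.linalg.solve(lhs, rhs)
        return ext @ W                                # adapted means, (G, d)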
APA, Harvard, Vancouver, ISO, and other styles
50

Gurrapu, Chaitanya. "Human Action Recognition In Video Data For Surveillance Applications." Thesis, Queensland University of Technology, 2004. https://eprints.qut.edu.au/15878/1/Chaitanya_Gurrapu_Thesis.pdf.

Full text
Abstract:
Detecting human actions using a camera has many possible applications in the security industry. When a human performs an action, his/her body goes through a signature sequence of poses. To detect these pose changes, and hence the activities performed, a pattern recogniser needs to be built into the video system. Due to the temporal nature of the patterns, Hidden Markov Models (HMMs), used extensively in speech recognition, were investigated. Initially, a gesture recognition system was built using novel features. These features were obtained by approximating the contour of the foreground object with a polygon and extracting the polygon's vertices. A Gaussian Mixture Model (GMM) was fitted to the vertices obtained from a few frames, and the parameters of the GMM itself were used as features for the HMM. A more practical activity detection system was then built, using a more sophisticated foreground segmentation algorithm immune to varying lighting conditions and permanent changes to the foreground. The foreground segmentation algorithm models each of the pixel values using clusters and continually uses incoming pixels to update the cluster parameters. Cast shadows were identified and removed by assuming that shadow regions were less likely to produce strong edges in the image than real objects, and that this likelihood further decreases after colour segmentation. Colour segmentation itself was performed by clustering together pixel values in the feature space using a gradient ascent algorithm called mean shift. More robust features in the form of mesh features were also obtained by dividing the bounding box of the binarised object into grid elements and calculating the ratio of foreground to background pixels in each of the grid elements. These features were vector quantized to reduce their dimensionality, and the resulting symbols were presented as features to the HMM to achieve a recognition rate of 62% for an event involving a person writing on a whiteboard. The recognition rate increased to 80% for the "seen" person sequences, i.e. the sequences of the person used to train the models. With a fixed lighting position, the lack of a shadow removal subsystem improved the detection rate. This is because of the consistent profile of the shadows in both the training and testing sequences due to the fixed lighting positions. Even with a lower recognition rate, the shadow removal subsystem was considered an indispensable part of a practical, generic surveillance system.
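The mesh-feature computation is simple to sketch. The version below returns the fraction of foreground pixels per grid cell, a common variant of the foreground-to-background ratio described above; the function name and grid size are illustrative, and the mask is assumed to span at least rows x cols pixels:

    import numpy as np

    def mesh_features(mask, rows=4, cols=4):
        # Divide the bounding box of the binarised object into a rows x cols
        # grid and return the fraction of foreground pixels in each cell.
        ys, xs = np.nonzero(mask)
        box = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)
        r = np.linspace(0, box.shape[0], rows + 1).astype(int)
        c = np.linspace(0, box.shape[1], cols + 1).astype(int)
        return np.array([box[r[i]:r[i + 1], c[j]:c[j + 1]].mean()
                         for i in range(rows) for j in range(cols)])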
APA, Harvard, Vancouver, ISO, and other styles