Dissertations / Theses on the topic 'Dirichlet modeling'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 49 dissertations / theses for your research on the topic 'Dirichlet modeling.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.
Heaton, Matthew J. "Temporally Correlated Dirichlet Processes in Pollution Receptor Modeling." Diss., CLICK HERE for online access, 2007. http://contentdm.lib.byu.edu/ETD/image/etd1861.pdf.
Hu, Zhen. "Modeling photonic crystal devices by Dirichlet-to-Neumann maps /." access full-text access abstract and table of contents, 2009. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?phd-ma-b30082559f.pdf.
Full text"Submitted to Department of Mathematics in partial fulfillment of the requirements for the degree of Doctor of Philosophy." Includes bibliographical references (leaves [85]-91)
Gao, Wenyu. "Advanced Nonparametric Bayesian Functional Modeling." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99913.
Doctor of Philosophy
As access to massive data sets becomes easier, functional analyses have gained interest for analyzing data that provide information about curves, surfaces, or other quantities varying over a continuum. However, such data sets often contain large heterogeneities and noise. When generalizing analyses from vectors to functions, classical methods might not work directly. This dissertation considers noisy information reduction in functional analyses from two perspectives: functional variable selection to reduce dimensionality, and functional clustering to group similar observations and thus reduce the sample size. The complicated data structures and relations can be modeled flexibly by a Bayesian hierarchical model. Hence, this dissertation focuses on the development of nonparametric Bayesian approaches for functional analyses. The proposed methods apply to various problems: epidemiological studies of aseptic meningitis with clustered binary data, genetic diabetes data, and breast cancer racial disparities.
Monson, Rebecca Lee. "Modeling Transition Probabilities for Loan States Using a Bayesian Hierarchical Model." Diss., CLICK HERE for online access, 2007. http://contentdm.lib.byu.edu/ETD/image/etd2179.pdf.
Lim, Woobeen. "Bayesian Semiparametric Joint Modeling of Longitudinal Predictors and Discrete Outcomes." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1618955725276958.
Full textDomingues, Rémi. "Probabilistic Modeling for Novelty Detection with Applications to Fraud Identification." Electronic Thesis or Diss., Sorbonne université, 2019. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2019SORUS473.pdf.
Novelty detection is the unsupervised problem of identifying anomalies in test data which differ significantly from the training set. While numerous novelty detection methods have been designed to model continuous numerical data, tackling datasets composed of mixed-type features, such as numerical and categorical data, or temporal datasets describing discrete event sequences is a challenging task. In addition to the supported data types, the key criteria for efficient novelty detection methods are the ability to accurately dissociate novelties from nominal samples, interpretability, scalability, and robustness to anomalies located in the training data. In this thesis, we investigate novel ways to tackle these issues. In particular, we propose (i) a survey of state-of-the-art novelty detection methods applied to mixed-type data, including extensive scalability, memory consumption and robustness tests; (ii) a survey of state-of-the-art novelty detection methods suitable for sequence data; (iii) a probabilistic nonparametric novelty detection method for mixed-type data based on Dirichlet process mixtures and exponential-family distributions; and (iv) an autoencoder-based novelty detection model with encoder/decoder modelled as deep Gaussian processes. The learning of this last model is made tractable and scalable through the use of random feature approximations and stochastic variational inference. The method is suitable for large-scale novelty detection problems and data with mixed-type features. The experiments indicate that the proposed model achieves competitive results with state-of-the-art novelty detection methods.
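As a toy illustration of the Dirichlet process mixture idea behind contribution (iii), here is a minimal sketch using scikit-learn's truncated variational Dirichlet process Gaussian mixture for novelty scoring. It is limited to purely numerical features and invented data; the thesis's actual model handles mixed-type data via exponential-family component distributions.

```python
# Sketch: novelty scoring with a (truncated) Dirichlet process Gaussian
# mixture. Data and thresholds are invented for illustration.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 2))            # nominal samples
test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),   # nominal-looking
                  [[8.0, 8.0]]])                       # an obvious novelty

dpgmm = BayesianGaussianMixture(
    n_components=10,                                   # truncation level
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(train)

# Samples with unusually low log-density under the mixture are flagged.
threshold = np.quantile(dpgmm.score_samples(train), 0.01)
print(dpgmm.score_samples(test) < threshold)
```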
Race, Jonathan Andrew. "Semi-parametric Survival Analysis via Dirichlet Process Mixtures of the First Hitting Time Model." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu157357742741077.
Huo, Shuning. "Bayesian Modeling of Complex High-Dimensional Data." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/101037.
Doctor of Philosophy
With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional data in different forms, such as engineering signals, medical images, and genomics measurements. However, acquisition of such data does not automatically lead to efficient knowledge discovery. The main objective of this dissertation is to develop novel Bayesian methods to extract useful knowledge from complex high-dimensional data. It has two parts: the development of an ultra-fast functional mixed model and the modeling of data heterogeneity via Dirichlet diffusion trees. The first part focuses on developing approximate Bayesian methods in functional mixed models to estimate parameters and detect significant regions. Two datasets demonstrate the effectiveness of the proposed method: a mass spectrometry dataset from a cancer study and a neuroimaging dataset from an Alzheimer's disease study. The second part focuses on modeling data heterogeneity via Dirichlet diffusion trees. The method helps uncover underlying hierarchical tree structures and estimate systematic differences between groups of samples. We demonstrate the effectiveness of the method on brain tumor imaging data.
Liu, Jia. "Heterogeneous Sensor Data based Online Quality Assurance for Advanced Manufacturing using Spatiotemporal Modeling." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/78722.
Ph. D.
Bui, Quang Vu. "Pretopology and Topic Modeling for Complex Systems Analysis : Application on Document Classification and Complex Network Analysis." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEP034/document.
The work of this thesis presents the development of algorithms for document classification on the one hand, and complex network analysis on the other, based on pretopology, a theory that models the concept of proximity. The first work develops a framework for document clustering by combining topic modeling and pretopology. Our contribution proposes using topic distributions extracted from topic modeling as input for classification methods. In this approach, we investigated two aspects: determining an appropriate distance between documents by studying the relevance of probabilistic-based and vector-based measures, and performing groupings according to several criteria using a pseudo-distance defined from pretopology. The second work introduces a general framework for modeling complex networks by developing a reformulation of stochastic pretopology and proposes the Pretopology Cascade Model as a general model for information diffusion. In addition, we propose an agent-based model, Textual-ABM, to analyze complex dynamic networks associated with textual information using the author-topic model, and introduce Textual-Homo-IC, an independent cascade model of resemblance, in which homophily is measured based on textual content obtained by utilizing topic modeling.
Schulte, Lukas. "Investigating topic modeling techniques for historical feature location." Thesis, Karlstads universitet, Institutionen för matematik och datavetenskap (from 2013), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kau:diva-85379.
Hu, Xuequn. "Modeling Endogenous Treatment Effects with Heterogeneity: A Bayesian Nonparametric Approach." Scholar Commons, 2011. http://scholarcommons.usf.edu/etd/3159.
Full textZerkoune, Abbas. "Modélisation de l'incertitude géologique par simulation stochastique de cubes de proportions de faciès : application aux réservoirs pétroliers de type carbonaté ou silico-clastique." Phd thesis, Grenoble 1, 2009. http://www.theses.fr/2009GRE10104.
After discovery of a potential oil field, development decisions are based on uncertain representations of the reservoir, since its characterisation uses numerical, spatial models. Even when these models are representative of subsoil heterogeneities, the uncertainty linked to subsoil complexity remains. Usually, uncertainty is assessed using many equiprobable models that represent the heterogeneities expected in the reservoir. Nevertheless, those alternative images of the underground correspond to multiple realizations of one single stochastic model, so such methods ignore the uncertainty related to the choice of the underlying probabilistic model. This work aims at improving that kind of uncertainty assessment in petroleum reservoir modelling: it transfers the doubt about our understanding of subsoil properties onto the probabilistic models themselves, and proposes to integrate it into them. This thesis first defines uncertainty in the context of oil-industry modelling, particularly for 3D geological models comprising several litho-types or facies. To build them, we need, before any simulation, to estimate for every point in space the probability of occurrence of each facies: this is the proportion cube. Even though those probabilities are often poorly known, they are frozen in current methods of uncertainty assessment, so the impact of an uncertain geological scenario on the definition of a proportion cube is ignored. Two methods based on stochastic simulation of alternative, equiprobable proportion cubes have been developed to sample the complete geological uncertainty space. The first one is closely linked to geology: it directly integrates the uncertainty related to the parameters composing the geological scenario. Based on a multi-realisation approach, it describes its implementation for every parameter of the geological scenario, from information at wells to maps or global hypotheses at reservoir scale. A Monte Carlo approach samples the components of the sedimentary scheme; each drawing enables building a proportion cube using modelling tools that integrate, more or less explicitly, the parameters of the geological scenario. That methodology is illustrated and applied to a modelling process used to model marine carbonate deposits. The second method is more geostatistical, focusing on the proportion cubes themselves, and rather aims at reconciling distinct possible sedimentary models. In the meshed model representing the reservoir, it assesses the probability law of the facies proportions in each cell, which are assumed to follow a Dirichlet distribution. That assessment is done from a few cubes inferred from different geological scenarios. Facies proportions are then simulated sequentially, cell after cell, introducing a spatial correlation model (variogram), which can be either deterministic or probabilistic. Various practical cases, comprising synthetic reservoirs and a real field, illustrate the different steps of the proposed method.
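To make the second method's core idea concrete, here is a small sketch (invented shapes and values, with the variogram-based spatial correlation omitted) of treating each cell's facies proportions as a Dirichlet draw whose parameters are fitted to a few scenario-based proportion cubes:

```python
# Sketch: simulate an alternative facies-proportion cube by drawing each
# cell's proportions from a Dirichlet law fitted to scenario-based cubes.
# Values are illustrative; the spatial correlation model is omitted.
import numpy as np

rng = np.random.default_rng(42)
n_cells, n_facies, n_scenarios = 1000, 3, 4

# Proportion cubes from distinct geological scenarios (cells x facies).
scenarios = rng.dirichlet(alpha=[2.0, 3.0, 1.0], size=(n_scenarios, n_cells))

# Crude moment matching: scale the per-cell mean proportions by a common
# concentration to obtain per-cell Dirichlet parameters.
concentration = 10.0
alpha = concentration * scenarios.mean(axis=0)        # (n_cells, n_facies)

# One equiprobable realization of the proportion cube.
realization = np.vstack([rng.dirichlet(a) for a in alpha])
print(realization.shape, realization.sum(axis=1)[:3])  # rows sum to 1
```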
Zerkoune, Abbas. "Modélisation de l'incertitude géologique par simulation stochastique de cubes de proportions de faciès - Application aux réservoirs pétroliers de type carbonaté ou silico-clastique." Phd thesis, Université Joseph Fourier (Grenoble), 2009. http://tel.archives-ouvertes.fr/tel-00410136.
Harrysson, Mattias. "Neural probabilistic topic modeling of short and messy text." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189532.
Full textAtt utforska enorma mängder användargenererad data med ämnen postulerar ett nytt sätt att hitta användbar information. Ämnena antas vara “gömda” och måste “avtäckas” med statistiska metoder såsom ämnesmodellering. Dock är användargenererad data generellt sätt kort och stökig t.ex. informella chattkonversationer, mycket slangord och “brus” som kan vara URL:er eller andra former av pseudo-text. Denna typ av data är svår att bearbeta för de flesta algoritmer i naturligt språk, inklusive ämnesmodellering. Det här arbetet har försökt hitta den metod som objektivt ger dem bättre ämnena ur kort och stökig text i en jämförande studie. De metoder som jämfördes var latent Dirichlet allocation (LDA), Re-organized LDA (RO-LDA), Gaussian Mixture Model (GMM) with distributed representation of words samt en egen metod med namnet Neural Probabilistic Topic Modeling (NPTM) baserat på tidigare arbeten. Den slutsats som kan dras är att NPTM har en tendens att ge bättre ämnen på kort och stökig text jämfört med LDA och RO-LDA. GMM lyckades inte ge några meningsfulla resultat alls. Resultaten är mindre bevisande eftersom NPTM har problem med långa körtider vilket innebär att tillräckligt många stickprov inte kunde erhållas för ett statistiskt test.
Simonnet, Titouan. "Apprentissage et réseaux de neurones en tomographie par diffraction de rayons X. Application à l'identification minéralogique." Electronic Thesis or Diss., Orléans, 2024. http://www.theses.fr/2024ORLE1033.
Understanding the chemical and mechanical behavior of compacted materials (e.g. soil, subsoil, engineered materials) requires a quantitative description of the material's structure, and in particular of the nature of the various mineralogical phases and their spatial relationships. Natural materials, however, are composed of numerous small-sized minerals, frequently mixed on a small scale. Recent advances in synchrotron-based X-ray diffraction tomography (to be distinguished from phase-contrast tomography) now make it possible to obtain tomographic volumes with nanometer-sized voxels, with an XRD pattern for each of these voxels (where phase contrast only gives a gray level). On the other hand, the sheer volume of data (typically on the order of 100,000 XRD patterns per sample slice), combined with the large number of phases present, makes quantitative processing virtually impossible without appropriate numerical codes. This thesis aims to fill this gap, using neural network approaches to identify and quantify minerals in a material. Training such models requires the construction of large-scale learning bases, which cannot be made up of experimental data alone. Algorithms capable of synthesizing XRD patterns to generate these bases have therefore been developed. The originality of this work also concerns the inference of proportions using neural networks. To meet this new and complex task, adapted loss functions were designed. The potential of neural networks was tested on data of increasing complexity: (i) XRD patterns calculated from crystallographic information, (ii) experimental powder XRD patterns measured in the laboratory, and (iii) data obtained by X-ray tomography. Different neural network architectures were also tested. While a convolutional neural network seemed to provide interesting results, the particular structure of the diffraction signal (which is not translation invariant) led to the use of models such as Transformers. The approach adopted in this thesis has demonstrated its ability to quantify mineral phases in a solid. For more complex data, such as tomography, improvements have been proposed.
Johansson, Richard, and Heino Otto Engström. "Topic propagation over time in internet security conferences : Topic modeling as a tool to investigate trends for future research." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177748.
Malsiner-Walli, Gertraud, Sylvia Frühwirth-Schnatter, and Bettina Grün. "Model-based clustering based on sparse finite Gaussian mixtures." Springer, 2016. http://dx.doi.org/10.1007/s11222-014-9500-2.
Full textLindgren, Jennifer. "Evaluating Hierarchical LDA Topic Models for Article Categorization." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-167080.
Le, Hai-Son Phuoc. "Probabilistic Models for Collecting, Analyzing, and Modeling Expression Data." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/245.
Full textApelthun, Catharina. "Topic modeling on a classical Swedish text corpus of prose fiction : Hyperparameters’ effect on theme composition and identification of writing style." Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-441653.
Khan, Mohammed Salman. "A Topic Modeling approach for Code Clone Detection." UNF Digital Commons, 2019. https://digitalcommons.unf.edu/etd/874.
Full textPark, Kyoung Jin. "Generating Thematic Maps from Hyperspectral Imagery Using a Bag-of-Materials Model." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1366296426.
SUI, ZHENHUAN. "Hierarchical Text Topic Modeling with Applications in Social Media-Enabled Cyber Maintenance Decision Analysis and Quality Hypothesis Generation." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1499446404436637.
Full textCedervall, Andreas, and Daniel Jansson. "Topic classification of Monetary Policy Minutes from the Swedish Central Bank." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240403.
In recent years, artificial intelligence and machine learning have received much attention and grown tremendously. Previously manual work is now being automated, and much suggests that this development will continue at a high pace. This work builds on previous work on topic modeling and applies it in a previously unexplored area: central bank minutes. Latent Dirichlet Allocation and a neural network are used to investigate whether the distribution of discussion topics changes over time. Finally, a theoretical discussion of the potential business value of implementing a similar method is presented. The results of the two models differ considerably over time: while Latent Dirichlet Allocation does not find any major trends in the topics, the neural network shows larger changes over time. The latter also agree well with other observations, such as the start of bond purchases. The results therefore indicate that the neural network is the more suitable method for analyzing the Riksbank's meeting minutes.
Schneider, Bruno. "Visualização em multirresolução do fluxo de tópicos em coleções de texto." reponame:Repositório Institucional do FGV, 2014. http://hdl.handle.net/10438/11745.
Full textApproved for entry into archive by Janete de Oliveira Feitosa (janete.feitosa@fgv.br) on 2014-05-13T12:56:21Z (GMT) No. of bitstreams: 1 dissertacao_bruno_schneider.pdf.pdf: 8019497 bytes, checksum: 70ff1fddb844b630666397e95c188672 (MD5)
The combined use of algorithms for topic discovery in document collections with topic flow visualization techniques allows the exploration of thematic patterns in long corpora, where those patterns can be revealed through compact visual representations. This research investigated the requirements for viewing data about the thematic composition of documents obtained through topic modeling - where datasets are sparse and multi-attribute - at different levels of detail, through the development of our own technique and the comparative use of an open-source data visualization library. Concerning the studied problem of topic flow visualization, we observed the presence of conflicting requirements for displaying data at different resolutions, which led to a detailed investigation of ways of manipulating and displaying these data. In this study, the hypothesis put forward was that the integrated use of more than one visualization technique, according to the resolution of the data, expands the possibilities for exploration of the object under study relative to what would be obtained using only one method. The exhibition of the limits on the use of these techniques according to the resolution of data exploration is the main contribution of this work, with the aim of providing a basis for the development of new applications.
Moon, Gordon Euhyun. "Parallel Algorithms for Machine Learning." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1561980674706558.
Chiron, Guillaume. "Système complet d’acquisition vidéo, de suivi de trajectoires et de modélisation comportementale pour des environnements 3D naturellement encombrés : application à la surveillance apicole." Thesis, La Rochelle, 2014. http://www.theses.fr/2014LAROS030/document.
This manuscript provides the basis for a complete chain of video surveillance for naturally cluttered environments. We identify and solve the wide spectrum of methodological and technological barriers inherent to: 1) the acquisition of video sequences in natural conditions, 2) the image processing problems, 3) the multi-target tracking ambiguities, 4) the discovery and modeling of recurring behavioral patterns, and 5) the data fusion. The application context of our work is the monitoring of honeybees, and in particular the study of the trajectories of bees in flight in front of their hive. In fact, this thesis is part of a feasibility and prototyping study carried out by the two interdisciplinary projects EPERAS and RISQAPI (undertaken in collaboration with the INRA institute and the French National Museum of Natural History). It is, for us computer scientists and for the biologists who accompanied us, a completely new area of investigation for which the scientific knowledge, usually essential for such applications, is still in its infancy. Unlike existing approaches for monitoring insects, we propose to tackle the problem in three-dimensional space through the use of a high-frequency stereo camera. In this context, we detail our new target detection method, which we call HIDS segmentation. Concerning the computation of trajectories, we explored several tracking approaches, relying on more or less a priori knowledge, that are able to deal with the extreme conditions of the application (e.g. many targets, small in size, following chaotic movements). Once the trajectories are collected, we organize them according to a given hierarchical data structure and apply a Bayesian nonparametric approach for discovering emergent behaviors within the colony of insects. The exploratory analysis of the trajectories generated by the crowded scene is performed following an unsupervised classification method simultaneously over different levels of semantics, where the number of clusters for each level is not defined a priori but rather estimated from the data only. This approach has been validated thanks to a ground truth generated by a multi-agent system. We then tested it in the context of real data.
Ladouceur, Martin. "Modelling continuous diagnostic test data using Dirichlet process prior distributions." Thesis, McGill University, 2009. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=95623.
Diagnostic tests are widely used in medicine and epidemiology. Most of them do not perfectly distinguish the subjects who have the condition of interest from those who do not, and those that provide results on a continuous scale often have overlapping densities of the results of diseased and non-diseased subjects. For these continuous tests, most statistical techniques developed to date assume a parametric family of distributions for the results in the two groups, a convenient but often unverifiable hypothesis. Moreover, the evaluation of their properties typically requires that a gold-standard test be available. [...]
Jaradat, Shatha. "OLLDA: Dynamic and Scalable Topic Modelling for Twitter : AN ONLINE SUPERVISED LATENT DIRICHLET ALLOCATION ALGORITHM." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177535.
Providing high-quality topic inference in today's large and dynamic corpora, such as Twitter, is a challenging task. This is especially challenging considering that the content in this environment consists of short texts and many abbreviations. The project proposes an improvement of a popular online topic modeling algorithm for Latent Dirichlet Allocation (LDA) by incorporating supervision to make it suitable for the Twitter context. This improvement is motivated by the need for a single algorithm that achieves both goals: analyzing large amounts of documents, including new documents arriving in a stream, while at the same time achieving high quality of the topics discovered in special-case environments such as Twitter. The proposed algorithm is a combination of an online algorithm for LDA and a supervised variant of LDA, Labeled LDA. The performance and quality of the proposed algorithm are compared with those two algorithms. The results show that the proposed algorithm performs better and with higher quality than the supervised variant of LDA, and it achieved better results in terms of quality compared with the online algorithm. These improvements make our algorithm an attractive alternative when applied to dynamic environments such as Twitter. An environment for analyzing and labeling data was designed to prepare the dataset before performing the experiments. Possible applications of the proposed algorithm are tweet recommendation and trend detection.
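To give a flavour of the online half of such an algorithm (the supervision added by Labeled LDA is specific to the thesis and not shown), here is a minimal streaming-update sketch with gensim; the documents and settings are invented placeholders.

```python
# Sketch: online LDA with gensim -- the model is refined as new mini-batches
# arrive, without retraining from scratch. Toy documents; no supervision.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

batch1 = [["rate", "inflation", "policy"],
          ["bank", "policy", "rate", "inflation"]]
dictionary = Dictionary(batch1)
corpus1 = [dictionary.doc2bow(doc) for doc in batch1]

lda = LdaModel(corpus1, id2word=dictionary, num_topics=2, passes=5)

# A new mini-batch streams in (same vocabulary here, for simplicity).
batch2 = [["inflation", "rate", "bank"]]
corpus2 = [dictionary.doc2bow(doc) for doc in batch2]
lda.update(corpus2)                     # incremental (online) update

print(lda.print_topics())
```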
Habli, Nada. "Nonparametric Bayesian Modelling in Machine Learning." Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34267.
Halmann, Marju. "Email Mining Classifier : The empirical study on combining the topic modelling with Random Forest classification." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-14710.
Déhaye, Vincent. "Characterisation of a developer’s experience fields using topic modelling." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-171946.
Chazel, Florent. "Influence de la topographie sur les ondes de surface." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2007. http://tel.archives-ouvertes.fr/tel-00200419.
Bakharia, Aneesha. "Interactive content analysis : evaluating interactive variants of non-negative Matrix Factorisation and Latent Dirichlet Allocation as qualitative content analysis aids." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/76535/1/Aneesha_Bakharia_Thesis.pdf.
Aspilaire, Roseman. "Économie informelle en Haïti, marché du travail et pauvreté : analyses quantitatives." Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC0122/document.
The predominance of the informal sector in the economy of Haiti, where more than 80% of the population lives below the poverty threshold and more than 35% are unemployed, suggests links between the informal economy, poverty and the labour market. Highlighting these interrelationships requires an assessment of the informal economy, which is the subject of the four chapters of this thesis, dealing successively with the evolution of the macroeconomic situation, human capital, the informal earnings of workers, and the segmentation of the labour market. The first chapter offers a diagnosis of the phenomenon in light of existing theories and the evolution of the macroeconomic framework of Haiti from 1980 to 2010, and then proposes a macroeconomic assessment of the informal sector as a percentage of GDP based on a PLS (Partial Least Squares) approach. Chapter two sets out the relationship between the evolution of the informal economy, deregulation and neo-liberal policies through a LISREL (Linear Structural Relations) model. We look at the impact of the budgetary, fiscal and monetary policies of the past 30 years on the informal economy, and also reassess the causes of the evolution of the informal economy generally evoked by empirical studies (taxes, social security). In chapter three, we analyse the micro-real dimension of the informal economy through a Mincer earnings model estimated by logit equations from the data of a national survey on employment and the informal economy (EEEI) of 2007. We analyse the determinants of informal earnings in terms of workers' position in the market (employees, entrepreneurs and self-employed), of income (formal and informal), and of the socio-economic characteristics of the working poor and non-poor relative to the poverty line. In chapter four, we first test the competitiveness and segmentation of the labour market by making use of the Roy model and the expanded Roy model, estimated as a Tobit model. We then use a Dirichlet process model: first to analyse the segmentation and the possible determinants of informal work and of market competitiveness, according to the EEEI 2007 data; and then to distinguish the fundamental characteristics of the involuntary informal workers (excluded from the formal labour market) from those of the voluntary informal workers, who derive comparative advantages from it.
White, Nicole. "Bayesian mixtures for modelling complex medical data : a case study in Parkinson’s disease." Thesis, Queensland University of Technology, 2011. https://eprints.qut.edu.au/48202/1/Nicole_White_Thesis.pdf.
Ficapal Vila, Joan. "Anemone: a Visual Semantic Graph." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252810.
Semantic graphs have been used to optimize various natural language processing tasks and to improve search and information retrieval tasks. In most cases, such semantic graphs have been constructed through supervised machine learning methods that presuppose manually curated ontologies such as Wikipedia or similar. In this thesis, which consists of two parts, we investigate in the first part the possibility of automatically generating a semantic graph from an ad hoc dataset of 50,000 newspaper articles in a completely unsupervised manner. The usefulness of the visual representation of the resulting graph is tested on 14 subjects performing basic information retrieval tasks on a subset of the articles. Our study shows that our approach is viable for finding related documents, and that the visual map produced by our artifact is visually useful. In the second part, we explore the possibility of identifying entity relations in an unsupervised manner by using abstractive deep learning methods for sentence reformulation. The reformulated sentences are evaluated qualitatively with respect to grammatical correctness and meaningfulness as perceived by 14 test subjects. We assess the results of this second part negatively, since they have not been good enough to draw any definitive conclusion, but they have instead opened new doors to explore.
GIOVANNINI, STEFANO. "Verso un indice "convergente" dell'impatto dei prodotti culturali italiani in Cina." Doctoral thesis, Università Cattolica del Sacro Cuore, 2022. http://hdl.handle.net/10280/122043.
The thesis's hypothesis is that a model can be created to predict the success of any Italian media product in the Chinese market, where "success" means a cultural-economic impact equal to that of some benchmark products of recent years (the novel L'amica geniale, the film Perfetti sconosciuti, the TV series My Brilliant Friend). Cultural impact is measured as the generation of online discourse, while economic impact is measured by traditional and digital indicators, for physical and online distribution respectively. The three cases used as benchmarks served to provide online discourse material from which to extract predictive variables through Python 3-supported LDA topic modelling and sentiment analysis. In chapter 1, Digital Humanities (DH) are explored and defined as a field of study. Chapter 2 summarises the main theories about cultural production. Chapter 3 details the methodology and its empirical application, also containing a section devoted to testing, which in turn confirmed the model's reliability. Results show that DH tools were a proper choice, while last century's theories of cultural production may benefit from updates. Three sets of variables to predict the success of Italian media products in China were identified.
Mercado, Salazar Jorge Anibal, and S. M. Masud Rana. "A Confirmatory Analysis for Automating the Evaluation of Motivation Letters to Emulate Human Judgment." Thesis, Högskolan Dalarna, Institutionen för information och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:du-37469.
Patel, Virashree Hrushikesh. "Topic modeling using latent dirichlet allocation on disaster tweets." 2018. http://hdl.handle.net/2097/39337.
Full textDepartment of Computer Science
Cornelia Caragea
Doina Caragea
Social media has changed the way people communicate information. It has been noted that social media platforms like Twitter are increasingly being used by people and authorities in the wake of natural disasters. The year 2017 was a historic year for the USA in terms of natural calamities and associated costs. According to NOAA (National Oceanic and Atmospheric Administration), during 2017 the USA experienced 16 separate billion-dollar disaster events, including three tropical cyclones, eight severe storms, two inland floods, a crop freeze, drought, and wildfire. During natural disasters, due to the collapse of infrastructure and telecommunication, it is often hard to reach out to people in need or to determine what areas are affected. In such situations, Twitter can be a lifesaving tool for local government and search-and-rescue agencies. Using the Twitter streaming API service, disaster-related tweets can be collected and analyzed in real time. Although tweets received from Twitter can be sparse, noisy and ambiguous, some may contain useful information with respect to situational awareness. For example, some tweets express emotions, such as grief, anguish, or calls for help; other tweets provide information specific to a region, place or person; while others simply help spread information from news or environmental agencies. To extract information useful for disaster response teams from tweets, disaster tweets need to be cleaned and classified into various categories. Topic modeling can help identify topics from a collection of such disaster tweets; subsequently, a topic (or a set of topics) will be associated with each tweet. Thus, in this report, we use Latent Dirichlet Allocation (LDA) to accomplish topic modeling for a disaster tweets dataset.
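A minimal sketch of this kind of LDA pipeline, using scikit-learn; the example tweets, vocabulary settings and topic count are invented placeholders rather than the report's actual data or configuration.

```python
# Sketch: LDA topic modeling of (toy) disaster tweets with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "flood waters rising near the river please evacuate now",
    "red cross opens shelter for hurricane evacuees tonight",
    "power lines down after the storm stay safe everyone",
]

# LDA operates on raw term counts, so use a bag-of-words representation.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)     # per-tweet topic proportions

# Inspect the top words of each discovered topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[::-1][:5]
    print(f"topic {k}:", [terms[i] for i in top])
```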
Lin, Chieh-Hung, and 林桀宏. "Survey Topic Modeling and Expert Finding Based on Latent Dirichlet Allocation." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/7f8879.
Full text國立臺灣科技大學
資訊工程系
99
For a researcher getting into a new research topic, it is a shortcut to study survey articles, which introduce and summarize significant approaches from the important articles of a certain research topic. Readers use them to understand the corresponding domain easily and to find related papers quickly. However, it is not easy to find survey articles in every research domain, and there may be no recent survey article in a specific domain. To deal with this, traditional approaches use citing texts to generate surveys; nevertheless, citation-based approaches might limit performance. In this thesis, we propose an approach, namely the Survey Topic Model (STM), which applies the Latent Dirichlet Allocation (LDA) model to facilitate the process of building a topic model and a survey structure. The proposed STM provides two functions for readers, given certain academic keywords: (1) collecting important research articles from online digital libraries; (2) categorizing the collected papers in a structured manner. In the proposed methodology, feature selection for important articles and LDA-based clustering for survey articles are proposed. We evaluate the proposed mechanism on a dataset of survey articles collected from CSUR (ACM Computing Surveys). The experimental results show that LDA-based clustering leads to significant improvement. We also address the expert finding problem based on LDA, and our system provides fair and relevant expert lists for each proposal.
Crea, Catherine. "On the Robustness of Dirichlet-multinomial Regression in the Context of Modeling Pollination Networks." Thesis, 2011. http://hdl.handle.net/10214/3222.
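The entry above names the Dirichlet-multinomial regression model; as orientation only, here is a small sketch of the log-likelihood of the Dirichlet-multinomial distribution (a multinomial compounded with a Dirichlet, which is what introduces the overdispersion the model exploits). The counts are invented.

```python
# Sketch: log-pmf of the Dirichlet-multinomial distribution, the building
# block of Dirichlet-multinomial regression. Example counts are invented.
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logpmf(x, alpha):
    """x: integer counts per category; alpha: positive concentrations."""
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    n, a0 = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()     # multinomial coefficient
            + gammaln(a0) - gammaln(n + a0)           # normalising constants
            + gammaln(x + alpha).sum() - gammaln(alpha).sum())

# E.g., visit counts of three pollinator species at one plant.
print(dirichlet_multinomial_logpmf([5, 2, 1], [1.0, 1.0, 1.0]))
```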
Full textWei, Hongchuan. "Sensor Planning for Bayesian Nonparametric Target Modeling." Diss., 2016. http://hdl.handle.net/10161/12863.
Bayesian nonparametric models, such as the Gaussian process and the Dirichlet process, have been extensively applied for target kinematics modeling in various applications including environmental monitoring, traffic planning, endangered species tracking, dynamic scene analysis, autonomous robot navigation, and human motion modeling. As shown by these successful applications, Bayesian nonparametric models are able to adjust their complexities adaptively from data as necessary, and are resistant to overfitting or underfitting. However, most existing works assume that the sensor measurements used to learn the Bayesian nonparametric target kinematics models are obtained a priori or that the target kinematics can be measured by the sensor at any given time throughout the task. Little work has been done on controlling a sensor with bounded field of view to obtain measurements of mobile targets that are most informative for reducing the uncertainty of the Bayesian nonparametric models. To present the systematic sensor planning approach to learning Bayesian nonparametric models, the Gaussian process target kinematics model is introduced first, which is capable of describing time-invariant spatial phenomena, such as ocean currents, temperature distributions and wind velocity fields. The Dirichlet process-Gaussian process target kinematics model is subsequently discussed for modeling mixtures of mobile targets, such as pedestrian motion patterns.
Novel information theoretic functions are developed for these introduced Bayesian nonparametric target kinematics models to represent the expected utility of measurements as a function of sensor control inputs and random environmental variables. A Gaussian process expected Kullback-Leibler divergence is developed as the expectation of the KL divergence between the current (prior) and posterior Gaussian process target kinematics models with respect to the future measurements. Then, this approach is extended to develop a new information value function that can be used to estimate target kinematics described by a Dirichlet process-Gaussian process mixture model. A theorem is proposed that shows the novel information theoretic functions are bounded. Based on this theorem, efficient estimators of the new information theoretic functions are designed, which are proved to be unbiased, with the variance of the resultant approximation error decreasing linearly as the number of samples increases. Computational complexities for optimizing the novel information theoretic functions under sensor dynamics constraints are studied, and are proved to be NP-hard. A cumulative lower bound is then proposed to reduce the computational complexity to polynomial time.
Three sensor planning algorithms are developed according to the assumptions on the target kinematics and the sensor dynamics. For problems where the control space of the sensor is discrete, a greedy algorithm is proposed. The efficiency of the greedy algorithm is demonstrated by a numerical experiment with data of ocean currents obtained by moored buoys. A sweep line algorithm is developed for applications where the sensor control space is continuous and unconstrained. Synthetic simulations as well as physical experiments with ground robots and a surveillance camera are conducted to evaluate the performance of the sweep line algorithm. Moreover, a lexicographic algorithm is designed based on the cumulative lower bound of the novel information theoretic functions, for the scenario where the sensor dynamics are constrained. Numerical experiments with real data collected from indoor pedestrians by a commercial pan-tilt camera are performed to examine the lexicographic algorithm. Results from both the numerical simulations and the physical experiments show that the three sensor planning algorithms proposed in this dissertation based on the novel information theoretic functions are superior at learning the target kinematics with little or no prior knowledge.
Dissertation
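A compact sketch of the closed-form quantity at the heart of the Kullback-Leibler-based information functions described above: the KL divergence between two multivariate Gaussians, such as a GP prior and posterior evaluated at finitely many points. The means and covariances below are invented stand-ins, not the dissertation's models.

```python
# Sketch: KL( N(mu0, cov0) || N(mu1, cov1) ) in closed form, the quantity
# underlying GP expected-KL information measures. Inputs are invented.
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - k
                  + np.linalg.slogdet(cov1)[1] - np.linalg.slogdet(cov0)[1])

mu0, cov0 = np.zeros(2), np.eye(2)                    # "prior" marginals
mu1, cov1 = np.array([0.5, 0.0]), 0.5 * np.eye(2)     # "posterior" marginals
print(gaussian_kl(mu0, cov0, mu1, cov1))
```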
Hines, Keegan. "Bayesian approaches for modeling protein biophysics." Thesis, 2014. http://hdl.handle.net/2152/26016.
"Bayesian Nonparametric Modeling and Inference for Multiple Object Tracking." Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.54996.
Dissertation/Thesis
Doctoral Dissertation Electrical Engineering 2019
Karlsson, Kalle. "News media attention in Climate Action: Latent topics and open access." Thesis, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-23413.
Full textEly, Nicole. "Rekonstrukce identit ve fake news: Srovnání dvou webových stránek s obsahem fake news." Master's thesis, 2020. http://www.nusl.cz/ntk/nusl-415291.
Full textJi, Chunlin. "Advances in Bayesian Modelling and Computation: Spatio-Temporal Processes, Model Assessment and Adaptive MCMC." Diss., 2009. http://hdl.handle.net/10161/1609.
The modelling and analysis of complex stochastic systems with increasingly large data sets, state-spaces and parameters provides major stimulus to research in Bayesian nonparametric methods and Bayesian computation. This dissertation presents advances in both nonparametric modelling and statistical computation stimulated by challenging problems of analysis in complex spatio-temporal systems and core computational issues in model fitting and model assessment. The first part of the thesis, represented by chapters 2 to 4, concerns novel, nonparametric Bayesian mixture models for spatial point processes, with advances in modelling, computation and applications in biological contexts. Chapter 2 describes and develops models for spatial point processes in which the point outcomes are latent, where indirect observations related to the point outcomes are available, and in which the underlying spatial intensity functions are typically highly heterogeneous. Spatial intensities of inhomogeneous Poisson processes are represented via flexible nonparametric Bayesian mixture models. Computational approaches are presented for this new class of spatial point process mixtures and extended to the context of unobserved point process outcomes. Two examples drawn from a central, motivating context, that of immunofluorescence histology analysis in biological studies generating high-resolution imaging data, demonstrate the modelling approach and computational methodology. Chapters 3 and 4 extend this framework to define a class of flexible Bayesian nonparametric models for inhomogeneous spatio-temporal point processes, adding dynamic models for underlying intensity patterns. Dependent Dirichlet process mixture models are introduced as core components of this new time-varying spatial model. Utilizing such nonparametric mixture models for the spatial process intensity functions allows the introduction of time variation via dynamic, state-space models for parameters characterizing the intensities. Bayesian inference and model-fitting is addressed via novel particle filtering ideas and methods. Illustrative simulation examples include studies in problems of extended target tracking and substantive data analysis in cell fluorescent microscopic imaging tracking problems.
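For orientation, a brief sketch of the truncated stick-breaking construction underlying Dirichlet process mixture models like those described above; the concentration, truncation level and base measure are illustrative choices only.

```python
# Sketch: truncated stick-breaking draw from a Dirichlet process,
# the construction behind DP mixture models of spatial intensities.
import numpy as np

rng = np.random.default_rng(7)
alpha, truncation = 1.0, 20            # concentration, truncation level

# Stick-breaking: w_k = v_k * prod_{j<k}(1 - v_j), with v_k ~ Beta(1, alpha).
v = rng.beta(1.0, alpha, size=truncation)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

# Atoms: 2-D component centres drawn from a base measure on the unit square.
atoms = rng.uniform(0.0, 1.0, size=(truncation, 2))

print(w.sum())            # close to 1 for a large enough truncation
print(atoms[w.argmax()])  # centre of the heaviest mixture component
```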
The second part of the thesis, consisting of chapters 5 and 6, concerns advances in computational methods for some core and generic Bayesian inferential problems. Chapter 5 develops a novel approach to estimation of upper and lower bounds for marginal likelihoods in Bayesian modelling using refinements of existing variational methods. Traditional variational approaches only provide lower bound estimation; this new lower/upper bound analysis is able to provide accurate and tight bounds in many problems, so facilitates more reliable computation for Bayesian model comparison while also providing a way to assess the adequacy of variational densities as approximations to exact, intractable posteriors. The advances also include demonstration of the significant improvements that may be achieved in marginal likelihood estimation by marginalizing some parameters in the model. A distinct contribution to Bayesian computation is covered in Chapter 6. This concerns a generic framework for designing adaptive MCMC algorithms, emphasizing the adaptive Metropolized independence sampler and an effective adaptation strategy using a family of mixture distribution proposals. This work is coupled with development of a novel adaptive approach to computation in nonparametric modelling with large data sets; here a sequential learning approach is defined that iteratively utilizes smaller data subsets. Under the general framework of importance sampling based marginal likelihood computation, the proposed adaptive Monte Carlo method and sequential learning approach can facilitate improved accuracy in marginal likelihood computation. The approaches are exemplified both in studies of synthetic data examples and in a real data analysis arising in astro-statistics.
Finally, chapter 7 summarizes the dissertation and discusses possible extensions of the specific modelling and computational innovations, as well as potential future work.
Dissertation