A selection of scholarly literature on the topic "Dirichlet allocation"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Consult the lists of current articles, books, theses, reports, and other scholarly sources on the topic "Dirichlet allocation."

Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and your bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication in PDF format and read an online abstract of the work, provided the relevant parameters are present in its metadata.

Journal articles on the topic "Dirichlet allocation"

1

Du, Lan, Wray Buntine, Huidong Jin, and Changyou Chen. "Sequential latent Dirichlet allocation." Knowledge and Information Systems 31, no. 3 (June 10, 2011): 475–503. http://dx.doi.org/10.1007/s10115-011-0425-1.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
2

Schwarz, Carlo. "Ldagibbs: A Command for Topic Modeling in Stata Using Latent Dirichlet Allocation." Stata Journal: Promoting communications on statistics and Stata 18, no. 1 (March 2018): 101–17. http://dx.doi.org/10.1177/1536867x1801800107.

Full text of the source
Abstract:
In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
APA, Harvard, Vancouver, ISO, and other citation styles
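The representation described in this abstract — each document as a probability distribution over topics, each topic as a probability distribution over words — can be sketched with scikit-learn (the toy corpus and topic count below are illustrative assumptions, not the Stata implementation from the article):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical four-document corpus with two obvious themes
docs = [
    "stock market trading prices",
    "market prices rise on trading volume",
    "genome dna sequencing biology",
    "dna biology protein genome",
]

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topics = lda.transform(counts)          # shape: (n_docs, n_topics)
topic_words = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

# Each document is a probability distribution over topics,
# and each (normalized) topic is a probability distribution over words.
assert np.allclose(doc_topics.sum(axis=1), 1.0)
assert np.allclose(topic_words.sum(axis=1), 1.0)
```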
3

Yoshida, Takahiro, Ryohei Hisano, and Takaaki Ohnishi. "Gaussian hierarchical latent Dirichlet allocation: Bringing polysemy back." PLOS ONE 18, no. 7 (July 12, 2023): e0288274. http://dx.doi.org/10.1371/journal.pone.0288274.

Full text of the source
Abstract:
Topic models are widely used to discover the latent representation of a set of documents. The two canonical models are latent Dirichlet allocation and Gaussian latent Dirichlet allocation: the former uses multinomial distributions over words, while the latter uses multivariate Gaussian distributions over pre-trained word embedding vectors as the latent topic representations. Compared with latent Dirichlet allocation, Gaussian latent Dirichlet allocation is limited in the sense that it does not capture the polysemy of a word such as "bank." In this paper, we show that Gaussian latent Dirichlet allocation can recover the ability to capture polysemy by introducing a hierarchical structure in the set of topics that the model can use to represent a given document. Our Gaussian hierarchical latent Dirichlet allocation significantly improves polysemy detection compared with Gaussian-based models and provides more parsimonious topic representations compared with hierarchical latent Dirichlet allocation. Our extensive quantitative experiments show that our model also achieves better topic coherence and held-out document predictive accuracy over a wide range of corpora and word embedding vectors, significantly improving the capture of polysemy compared with GLDA and CGTM. Our model learns the underlying topic distribution and the hierarchical structure among topics simultaneously, which can further be used to understand the correlations among topics. Moreover, the added flexibility of our model does not necessarily increase the time complexity compared with GLDA and CGTM, which makes our model a good competitor to GLDA.
APA, Harvard, Vancouver, ISO, and other citation styles
4

Archambeau, Cedric, Balaji Lakshminarayanan, and Guillaume Bouchard. "Latent IBP Compound Dirichlet Allocation." IEEE Transactions on Pattern Analysis and Machine Intelligence 37, no. 2 (February 2015): 321–33. http://dx.doi.org/10.1109/tpami.2014.2313122.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
5

Pion-Tonachini, Luca, Scott Makeig, and Ken Kreutz-Delgado. "Crowd labeling latent Dirichlet allocation." Knowledge and Information Systems 53, no. 3 (April 19, 2017): 749–65. http://dx.doi.org/10.1007/s10115-017-1053-1.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
6

S.S., Ramyadharshni, and Pabitha Dr.P. "Topic Categorization on Social Network Using Latent Dirichlet Allocation." Bonfring International Journal of Software Engineering and Soft Computing 8, no. 2 (April 30, 2018): 16–20. http://dx.doi.org/10.9756/bijsesc.8390.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
7

Syed, Shaheen, and Marco Spruit. "Exploring Symmetrical and Asymmetrical Dirichlet Priors for Latent Dirichlet Allocation." International Journal of Semantic Computing 12, no. 03 (September 2018): 399–423. http://dx.doi.org/10.1142/s1793351x18400184.

Full text of the source
Abstract:
Latent Dirichlet Allocation (LDA) has gained much attention from researchers and is increasingly being applied to uncover underlying semantic structures from a variety of corpora. However, nearly all researchers use symmetrical Dirichlet priors, often unaware of the underlying practical implications that they bear. This research is the first to explore symmetrical and asymmetrical Dirichlet priors on topic coherence and human topic ranking when uncovering latent semantic structures from scientific research articles. More specifically, we examine the practical effects of several classes of Dirichlet priors on 2000 LDA models created from abstract and full-text research articles. Our results show that symmetrical or asymmetrical priors on the document–topic distribution or the topic–word distribution for full-text data have little effect on topic coherence scores and human topic ranking. In contrast, asymmetrical priors on the document–topic distribution for abstract data show a significant increase in topic coherence scores and improved human topic ranking compared to a symmetrical prior. Symmetrical or asymmetrical priors on the topic–word distribution show no real benefits for both abstract and full-text data.
APA, Harvard, Vancouver, ISO, and other citation styles
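A minimal numerical illustration of the symmetric versus asymmetric prior distinction examined above, sampling document-topic proportions from a Dirichlet with NumPy (the concentration parameters are invented for illustration, not those used in the study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric prior: every topic has the same concentration, so no topic
# is favored a priori. Asymmetric prior: topic 0 is favored a priori.
symmetric = rng.dirichlet([1.0, 1.0, 1.0], size=10000)
asymmetric = rng.dirichlet([5.0, 1.0, 1.0], size=10000)

# Mean topic proportions: roughly uniform (~1/3 each) under the symmetric
# prior, skewed toward topic 0 (~5/7) under the asymmetric one.
print(symmetric.mean(axis=0))
print(asymmetric.mean(axis=0))
```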
8

Li, Gen, and Hazri Jamil. "Teacher professional learning community and interdisciplinary collaborative teaching path under the informationization basic education model." Yugoslav Journal of Operations Research, no. 00 (2024): 29. http://dx.doi.org/10.2298/yjor2403029l.

Full text of the source
Abstract:
The construction of a learning community cannot be separated from the participation of information technology. Current teacher learning communities suffer from low interaction efficiency and insufficient enthusiasm for group cooperative teaching. This study adopts the latent Dirichlet allocation method to process the text data generated by teacher interaction, tracing the evolution of knowledge topics in the learning community's network space. At the same time, interaction data from the network community learning space is used to extract the interaction characteristics between teachers, and collaborative teaching groups are formed using the K-means clustering algorithm. The study verifies the management effect of latent Dirichlet allocation and K-means in the learning community space through experiments. The experiments showed that the latent Dirichlet allocation algorithm reached its highest F1 value, 0.88, at a K value of 12, outperforming the collaborative filtering algorithm on overall F1. At the same time, latent Dirichlet allocation misjudged a total of 4 samples, for an accuracy of 86.7%, higher than the other algorithm models. The results indicate that the proposed combination of latent Dirichlet allocation and K-means performs well in managing teacher professional learning communities and can effectively improve the service level of teachers' work.
APA, Harvard, Vancouver, ISO, and other citation styles
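The pipeline described above — LDA topic features extracted from interaction text, then K-means grouping — can be sketched with scikit-learn (the interaction snippets, topic count, and cluster count below are hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

# Hypothetical teacher-interaction snippets
texts = [
    "shared lesson plan discussion",
    "lesson plan feedback thread",
    "grading rubric question",
    "rubric and grading policy",
]

# Step 1: represent each snippet by its LDA topic proportions
counts = CountVectorizer().fit_transform(texts)
theta = LatentDirichletAllocation(n_components=3, random_state=0).fit_transform(counts)

# Step 2: group the topic-proportion vectors into collaborative teaching groups
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print(labels)
```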
9

Garg, Mohit, and Priya Rangra. "Bibliometric Analysis of Latent Dirichlet Allocation." DESIDOC Journal of Library & Information Technology 42, no. 2 (February 28, 2022): 105–13. http://dx.doi.org/10.14429/djlit.42.2.17307.

Full text of the source
Abstract:
Latent Dirichlet Allocation (LDA) has emerged as an important algorithm in big data analysis that finds groups of topics in text data. It posits that each text document consists of a group of topics, and each topic is a mixture of words related to it. With the emergence of a plethora of text data, LDA has become a popular algorithm for topic modeling among researchers from different domains. Therefore, it is essential to understand the trends of LDA research. Bibliometric techniques are established methods for studying the research progress of a topic. In this study, bibliographic data of 18715 publications that have cited LDA were extracted from the Scopus database. The software R and VOSviewer were used to carry out the analysis. The analysis revealed that research interest in LDA has grown exponentially. The results showed that most authors preferred "Book Series," followed by "Conference Proceedings," as the publication venue. The majority of the institutions and authors were from the USA, followed by China. The co-occurrence analysis of keywords indicated that text mining and machine learning are dominant topics in LDA research, with significant interest in social media. This study attempts to provide a comprehensive analysis and intellectual structure of LDA compared to previous studies.
APA, Harvard, Vancouver, ISO, and other citation styles
10

Chauhan, Uttam, and Apurva Shah. "Topic Modeling Using Latent Dirichlet allocation." ACM Computing Surveys 54, no. 7 (September 30, 2022): 1–35. http://dx.doi.org/10.1145/3462478.

Full text of the source
Abstract:
We cannot deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is sorely needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. Research on topic modeling in distributed environments and topic visualization approaches is also explored. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.
APA, Harvard, Vancouver, ISO, and other citation styles

Theses on the topic "Dirichlet allocation"

1

Ponweiser, Martin. "Latent Dirichlet Allocation in R." WU Vienna University of Economics and Business, 2012. http://epub.wu.ac.at/3558/1/main.pdf.

Full text of the source
Abstract:
Topic models are a young research field within the computer-science areas of information retrieval and text mining. They are generative probabilistic models of text corpora inferred by machine learning, and they can be used for retrieval and text mining tasks. The most prominent topic model is latent Dirichlet allocation (LDA), which was introduced in 2003 by Blei et al. and has since sparked the development of other topic models for domain-specific purposes. This thesis focuses on LDA's practical application. Its main goal is the replication of the data analyses from the 2004 LDA paper "Finding scientific topics" by Thomas Griffiths and Mark Steyvers within the framework of the R statistical programming language and the R package topicmodels by Bettina Grün and Kurt Hornik. The complete process, including extraction of a text corpus from the PNAS journal's website, data preprocessing, transformation into a document-term matrix, model selection, model estimation, and presentation of the results, is fully documented and commented. The outcome closely matches the analyses of the original paper; the research by Griffiths and Steyvers can therefore be reproduced. Furthermore, this thesis demonstrates the suitability of the R environment for text mining with LDA. (author's abstract)
Series: Theses / Institute for Statistics and Mathematics
APA, Harvard, Vancouver, ISO, and other citation styles
2

Arnekvist, Isac, and Ludvig Ericson. "Finding competitors using Latent Dirichlet Allocation." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-186386.

Full text of the source
Abstract:
Identifying business competitors is of interest to many, but is becoming increasingly hard in an expanding global market. The aim of this report is to investigate whether Latent Dirichlet Allocation (LDA) can be used to identify and rank competitors based on distances between LDA representations of company descriptions. The performance of the LDA model was compared to that of bag-of-words and random ordering by evaluating then comparing them on a handful of common information retrieval metrics. Several different distance metrics were evaluated to determine which metric had best correspondence between representation distance and companies being competitors. Cosine similarity was found to outperform the other distance metrics. While both LDA and bag-of-words representations were found to be significantly better than random ordering, LDA was found to perform worse than bag-of-words. However, computation of distance metrics was considerably faster for LDA representations. The LDA representations capture features that are not helpful for identifying competitors, and it is suggested that LDA representations could be used together with some other data source or heuristic.
APA, Harvard, Vancouver, ISO, and other citation styles
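Ranking candidates by cosine similarity between LDA representations, as the thesis does for company descriptions, reduces to a few lines of NumPy (the topic-proportion vectors and company names below are invented for illustration):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical LDA topic proportions for a query company and two candidates
query = np.array([0.7, 0.2, 0.1])
candidates = {
    "company_a": np.array([0.6, 0.3, 0.1]),
    "company_b": np.array([0.1, 0.1, 0.8]),
}

# Most similar candidate first: company_a shares the query's topic profile
ranked = sorted(candidates, key=lambda k: cosine(query, candidates[k]), reverse=True)
print(ranked)
```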
3

Choubey, Rahul. "Tag recommendation using Latent Dirichlet Allocation." Thesis, Kansas State University, 2011. http://hdl.handle.net/2097/9785.

Full text of the source
Abstract:
Master of Science
Department of Computing and Information Sciences
Doina Caragea
The vast amount of data present on the internet calls for ways to label and organize this data according to specific categories, in order to facilitate search and browsing activities. This can be easily accomplished by making use of folksonomies and user provided tags. However, it can be difficult for users to provide meaningful tags. Tag recommendation systems can guide the users towards informative tags for online resources such as websites, pictures, etc. The aim of this thesis is to build a system for recommending tags to URLs available through a bookmark sharing service, called BibSonomy. We assume that the URLs for which we recommend tags do not have any prior tags assigned to them. Two approaches are proposed to address the tagging problem, both of them based on Latent Dirichlet Allocation (LDA) Blei et al. [2003]. LDA is a generative and probabilistic topic model which aims to infer the hidden topical structure in a collection of documents. According to LDA, documents can be seen as mixtures of topics, while topics can be seen as mixtures of words (in our case, tags). The first approach that we propose, called the topic words based approach, recommends the top words in the top topics representing a resource as tags for that particular resource. The second approach, called the topic distance based approach, uses the tags of the most similar training resources (identified using the KL-divergence, Kullback and Leibler [1951]) to recommend tags for a test untagged resource. The dataset used in this work was made available through the ECML/PKDD Discovery Challenge 2009. We construct the documents that are provided as input to LDA in two ways, thus producing two different datasets. In the first dataset, we use only the description and the tags (when available) corresponding to a URL. In the second dataset, we crawl the URL content and use it to construct the document.
Experimental results show that the LDA approach is not very effective at recommending tags for new untagged resources. However, using the resource content gives better results than using the description only. Furthermore, the topic distance based approach is better than the topic words based approach, when only the descriptions are used to construct documents, while the topic words based approach works better when the contents are used to construct documents.
APA, Harvard, Vancouver, ISO, and other citation styles
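The topic distance based approach can be sketched as a nearest-neighbour lookup under KL-divergence (the topic distributions and tag sets below are hypothetical, not taken from the BibSonomy dataset):

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D(p || q) for discrete distributions
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical LDA topic distribution of an untagged test resource
test_doc = [0.1, 0.8, 0.1]

# Hypothetical training resources, keyed by their existing tags
training = {
    "python,programming": [0.2, 0.7, 0.1],
    "cooking,recipes": [0.7, 0.2, 0.1],
}

# Recommend the tags of the training resource closest in KL-divergence
nearest = min(training, key=lambda tags: kl(test_doc, training[tags]))
print(nearest)
```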
4

Risch, Johan. "Detecting Twitter topics using Latent Dirichlet Allocation." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-277260.

Full text of the source
Abstract:
Latent Dirichlet Allocation is evaluated for its suitability for detecting topics in a stream of short messages limited to 140 characters. This is done by assessing its ability to model the incoming messages and to classify previously unseen messages with known topics. The evaluation shows that the model can be suitable for certain topic-detection applications when the stream size is small enough. Furthermore, suggestions on how to handle larger streams are outlined.
APA, Harvard, Vancouver, ISO, and other citation styles
5

Liu, Zelong. "High performance latent dirichlet allocation for text mining." Thesis, Brunel University, 2013. http://bura.brunel.ac.uk/handle/2438/7726.

Full text of the source
Abstract:
Latent Dirichlet Allocation (LDA), a total-probability generative model, is a three-tier Bayesian model. LDA computes the latent topic structure of the data and obtains the significant information of documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly for classification because it is an unsupervised learning model; it needs to be embedded into appropriate classification algorithms. As a generative model, LDA normally generates latent topics in categories to which the target documents do not belong, producing deviations in computation and reducing classification accuracy. The number of topics in LDA greatly influences the learning process of the model parameters. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method which combines the LDA model and the Support Vector Machine (SVM) classification algorithm for improved classification accuracy while reducing the dimension of datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise-data reduction scheme; even when the noise ratio in the training data set is large, the scheme consistently produces a high level of classification accuracy. Finally, the thesis parallelizes LDA using the MapReduce model, the de facto computing standard for supporting data-intensive applications.
A genetic algorithm based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster where the computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.
APA, Harvard, Vancouver, ISO, and other citation styles
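A minimal sketch of the LDA-plus-SVM combination described above, using scikit-learn (the toy corpus and labels are invented; the thesis's DBSCAN-based topic selection and MapReduce parallelization are not shown):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labeled documents
texts = [
    "stock market prices",
    "trading market volume",
    "dna genome biology",
    "protein dna sequencing",
]
labels = ["finance", "finance", "bio", "bio"]

# LDA reduces each document to topic proportions; the SVM classifies them
clf = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),
    LinearSVC(),
)
clf.fit(texts, labels)
print(clf.predict(["genome sequencing"]))
```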
6

Kulhanek, Raymond Daniel. "A Latent Dirichlet Allocation/N-gram Composite Language Model." Wright State University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=wright1379520876.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
7

Anaya, Leticia H. "Comparing Latent Dirichlet Allocation and Latent Semantic Analysis as Classifiers." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc103284/.

Full text of the source
Abstract:
In the Information Age, a proliferation of unstructured electronic text documents exists. Processing these documents by humans is a daunting task, as humans have limited cognitive abilities for processing large volumes of documents that can often be extremely lengthy. To address this problem, text data computer algorithms are being developed. Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are two text data computer algorithms that have received much attention individually in the text data literature for topic extraction studies, but not for document classification or for comparison studies. Since classification is considered an important human function and has been studied in the areas of cognitive science and information science, in this dissertation a research study was performed to compare LDA, LSA, and humans as document classifiers. The research questions posed in this study are: R1: How accurate are LDA and LSA in classifying documents in a corpus of textual data over a known set of topics? R2: How accurate are humans in performing the same classification task? R3: How does LDA classification performance compare to LSA classification performance? To address these questions, a classification study involving human subjects was designed in which humans were asked to generate and classify documents (customer comments) at two levels of abstraction for a quality assurance setting. Then two computer algorithms, LSA and LDA, were used to classify these documents. The results indicate that humans outperformed all computer algorithms, with an accuracy rate of 94% at the higher level of abstraction and 76% at the lower level. At the high level of abstraction, the accuracy rates were 84% for both LSA and LDA; at the lower level, the accuracy rates were 67% for LSA and 64% for LDA. The findings of this research have many strong implications for the improvement of information systems that process unstructured text.
Document classifiers have many potential applications in many fields (e.g., fraud detection, information retrieval, national security, and customer management). Development and refinement of algorithms that classify text is a fruitful area of ongoing research and this dissertation contributes to this area.
APA, Harvard, Vancouver, ISO, and other citation styles
8

Jaradat, Shatha. "OLLDA: Dynamic and Scalable Topic Modelling for Twitter: An Online Supervised Latent Dirichlet Allocation Algorithm." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-177535.

Full text of the source
Abstract:
Providing high-quality topic inference in today's large and dynamic corpora, such as Twitter, is a challenging task. This is especially challenging considering that the content in this environment consists of short texts with many abbreviations. This project proposes an improvement of a popular online topic modelling algorithm for Latent Dirichlet Allocation (LDA), incorporating supervision to make it suitable for the Twitter context. This improvement is motivated by the need for a single algorithm that achieves both objectives: analyzing huge amounts of documents, including new documents arriving in a stream, while at the same time achieving high-quality topic detection in special-case environments such as Twitter. The proposed algorithm is a combination of an online algorithm for LDA and a supervised variant of LDA, labeled LDA. The performance and quality of the proposed algorithm are compared with these two algorithms. The results demonstrate that the proposed algorithm shows better performance and quality than the supervised variant of LDA, and better quality than the online algorithm. These improvements make our algorithm an attractive option when applied to dynamic environments like Twitter. An environment for analyzing and labelling data was designed to prepare the dataset before executing the experiments. Possible application areas for the proposed algorithm are tweet recommendation and trend detection.
APA, Harvard, Vancouver, ISO, and other citation styles
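Online topic modelling over a document stream, as discussed above, can be sketched with scikit-learn's mini-batch updates (the vocabulary and stream are invented; this is plain online LDA, not the supervised OLLDA variant the thesis proposes):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Online updates need a fixed vocabulary, so it is declared up front
vocab = ["sports", "game", "match", "score", "phone", "battery", "review", "release"]
vectorizer = CountVectorizer(vocabulary=vocab)

# Hypothetical mini-batches of short messages arriving in a stream
stream = [
    ["sports game score", "match score game"],
    ["phone battery review", "phone release review"],
]

lda = LatentDirichletAllocation(n_components=2, random_state=0)
for minibatch in stream:
    # Incrementally update the model on each arriving mini-batch
    lda.partial_fit(vectorizer.transform(minibatch))

# Infer topic proportions for a previously unseen message
theta = lda.transform(vectorizer.transform(["phone battery review"]))
print(theta)
```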
9

Yalamanchili, Hima Bindu. "A Novel Approach For Cancer Characterization Using Latent Dirichlet Allocation and Disease-Specific Genomic Analysis." Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1527600876174758.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
10

Sheikha, Hassan. "Text mining Twitter social media for Covid-19: Comparing latent semantic analysis and latent Dirichlet allocation." Thesis, Högskolan i Gävle, Avdelningen för datavetenskap och samhällsbyggnad, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-32567.

Full text of the source
Abstract:
In this thesis, Twitter social media is mined for information about the Covid-19 outbreak during the month of March, from the 3rd to the 31st. 100,000 tweets were collected from Harvard's open-source data and recreated using Hydrate. The data is analyzed using different natural language processing (NLP) methodologies, such as term frequency-inverse document frequency (TF-IDF), lemmatization, tokenization, latent semantic analysis (LSA), and latent Dirichlet allocation (LDA). The LSA and LDA algorithms produce dimensionally reduced data, which is then clustered using the HDBSCAN and K-means clustering algorithms for later comparison. Different methodologies are used to determine the optimal parameters for the algorithms. All of this is done in the Python programming language, which offers libraries supporting this research, the most important being scikit-learn. The frequent words of each cluster are then displayed and compared with factual data regarding the outbreak to discover any correlations. The factual data is collected by the World Health Organization (WHO) and visualized in graphs on ourworldindata.org. Correlations with the results are also sought in news articles, to find significant moments that may have affected the top words in the clustered data. The news articles with good timelines used for correlating incidents are those of NBC News and the New York Times. The results show no direct correlations with the data reported by WHO; however, looking into the timelines reported by news sources, some correlation can be seen with the clustered data. Also, the combination of LDA and HDBSCAN yielded the most desirable results compared with the other combinations of dimension reduction and clustering. This was largely due to the use of GridSearchCV on LDA to determine the ideal parameters for the LDA models on each dataset, as well as how well HDBSCAN clusters its data in comparison to K-means.
APA, Harvard, Vancouver, ISO, and other citation styles
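The two dimension-reduction branches compared in the thesis can be sketched with scikit-learn (the toy tweets are invented; only K-means clustering is shown here, since HDBSCAN requires an extra package):

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.cluster import KMeans

# Hypothetical short messages
tweets = [
    "covid cases rising fast",
    "new covid lockdown rules",
    "vaccine trial results",
    "vaccine dose schedule",
]

# LSA branch: TF-IDF weighting followed by truncated SVD
lsa_vecs = TruncatedSVD(n_components=2, random_state=0).fit_transform(
    TfidfVectorizer().fit_transform(tweets)
)

# LDA branch: raw term counts reduced to topic proportions
lda_vecs = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(
    CountVectorizer().fit_transform(tweets)
)

# Cluster each reduced representation for comparison
lsa_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lsa_vecs)
lda_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lda_vecs)
```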

Books on the topic "Dirichlet allocation"

1

Shi, Feng. Learn About Latent Dirichlet Allocation in R With Data From the News Articles Dataset (2016). 1 Oliver's Yard, 55 City Road, London EC1Y 1SP United Kingdom: SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526495693.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
2

Shi, Feng. Learn About Latent Dirichlet Allocation in Python With Data From the News Articles Dataset (2016). 1 Oliver's Yard, 55 City Road, London EC1Y 1SP United Kingdom: SAGE Publications, Ltd., 2019. http://dx.doi.org/10.4135/9781526497727.

Full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
3

Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies. CreateSpace Independent Publishing Platform, 2014.

Find the full text of the source
APA, Harvard, Vancouver, ISO, and other citation styles
4

Jockers, Matthew L. Theme. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252037528.003.0008.

Full text of the source
Abstract:
This chapter demonstrates how big data and computation can be used to identify and track recurrent themes as the products of external influence. It first considers the limitations of the Google Ngram Viewer as a tool for tracing thematic trends over time before turning to Douglas Biber's Corpus Linguistics: Investigating Language Structure and Use, a primer on various factors complicating word-focused text analysis and the subsequent conclusions one might draw regarding word meanings. It then discusses the results of the author's application of latent Dirichlet allocation (LDA) to a corpus of 3,346 nineteenth-century novels using the open-source MALLET (MAchine Learning for LanguagE Toolkit), a software package for topic modeling. It also explains the different types of analyses performed by the author, including text segmentation, word chunking, and author nationality, gender and time-themes relationship analyses. The thematic data from the LDA model reveal the degree to which author nationality, author gender, and date of publication could be predicted by the thematic signals expressed in the nineteenth-century novels corpus.
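The MALLET toolkit mentioned in the abstract above fits LDA by collapsed Gibbs sampling. As a hedged illustration of that technique only, not of MALLET's implementation, here is a minimal collapsed Gibbs sampler in pure Python; the toy corpus, hyperparameters, and iteration count in the usage below are invented for the sketch:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic_counts, topic_word_counts)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assign = []                                               # z_{d,i}
    for d, doc in enumerate(docs):                            # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(n_iter):                                   # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]               # remove the token's current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # full conditional p(z = t | everything else), up to a constant
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k               # record the new draw
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word
```

On a toy corpus such as `lda_gibbs([["ball", "goal", "ball"], ["vote", "law", "vote"]], 2)`, the returned per-document topic counts always sum to the document lengths; with enough sweeps, co-occurring words tend to concentrate in the same topic.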

Book chapters on the topic "Dirichlet allocation"

1

Li, Hang. "Latent Dirichlet Allocation." In Machine Learning Methods, 439–71. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-3917-6_20.

2

Tang, Yi-Kun, Xian-Ling Mao, and Heyan Huang. "Labeled Phrase Latent Dirichlet Allocation." In Web Information Systems Engineering – WISE 2016, 525–36. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-48740-3_39.

3

Moon, Gordon E., Israt Nisa, Aravind Sukumaran-Rajam, Bortik Bandyopadhyay, Srinivasan Parthasarathy, and P. Sadayappan. "Parallel Latent Dirichlet Allocation on GPUs." In Lecture Notes in Computer Science, 259–72. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-93701-4_20.

4

Calvo, Hiram, Ángel Hernández-Castañeda, and Jorge García-Flores. "Author Identification Using Latent Dirichlet Allocation." In Computational Linguistics and Intelligent Text Processing, 303–12. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-77116-8_22.

5

Hao, Jing, and Hongxi Wei. "Latent Dirichlet Allocation Based Image Retrieval." In Lecture Notes in Computer Science, 211–21. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68699-8_17.

6

Maanicshah, Kamal, Manar Amayri, and Nizar Bouguila. "Interactive Generalized Dirichlet Mixture Allocation Model." In Lecture Notes in Computer Science, 33–42. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-23028-8_4.

7

Wheeler, Jordan M., Shiyu Wang, and Allan S. Cohen. "Latent Dirichlet Allocation of Constructed Responses." In The Routledge International Handbook of Automated Essay Evaluation, 535–55. New York: Routledge, 2024. http://dx.doi.org/10.4324/9781003397618-31.

8

Rus, Vasile, Nobal Niraula, and Rajendra Banjade. "Similarity Measures Based on Latent Dirichlet Allocation." In Computational Linguistics and Intelligent Text Processing, 459–70. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37247-6_37.

9

Bíró, István, and Jácint Szabó. "Latent Dirichlet Allocation for Automatic Document Categorization." In Machine Learning and Knowledge Discovery in Databases, 430–41. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-04174-7_28.

10

Lovato, Pietro, Manuele Bicego, Vittorio Murino, and Alessandro Perina. "Robust Initialization for Learning Latent Dirichlet Allocation." In Similarity-Based Pattern Recognition, 117–32. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24261-3_10.


Conference papers on the topic "Dirichlet allocation"

1

Tahsin, Faiza, Hafsa Ennajari, and Nizar Bouguila. "Author Dirichlet Multinomial Allocation Model with Generalized Distribution (ADMAGD)." In 2024 International Symposium on Networks, Computers and Communications (ISNCC), 1–7. IEEE, 2024. http://dx.doi.org/10.1109/isncc62547.2024.10758998.

2

Koltcov, Sergei, Olessia Koltsova, and Sergey Nikolenko. "Latent dirichlet allocation." In the 2014 ACM conference. New York, New York, USA: ACM Press, 2014. http://dx.doi.org/10.1145/2615569.2615680.

3

Chien, Jen-Tzung, Chao-Hsi Lee, and Zheng-Hua Tan. "Dirichlet mixture allocation." In 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2016. http://dx.doi.org/10.1109/mlsp.2016.7738866.

4

Shen, Zhi-Yong, Jun Sun, and Yi-Dong Shen. "Collective Latent Dirichlet Allocation." In 2008 Eighth IEEE International Conference on Data Mining (ICDM). IEEE, 2008. http://dx.doi.org/10.1109/icdm.2008.75.

5

Li, Shuangyin, Guan Huang, Ruiyang Tan, and Rong Pan. "Tag-Weighted Dirichlet Allocation." In 2013 IEEE International Conference on Data Mining (ICDM). IEEE, 2013. http://dx.doi.org/10.1109/icdm.2013.11.

6

Hsin, Wei-Cheng, and Jen-Wei Huang. "Multi-dependent Latent Dirichlet Allocation." In 2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE, 2017. http://dx.doi.org/10.1109/taai.2017.51.

7

Krestel, Ralf, Peter Fankhauser, and Wolfgang Nejdl. "Latent dirichlet allocation for tag recommendation." In the third ACM conference. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1639714.1639726.

8

Tan, Yimin, and Zhijian Ou. "Topic-weak-correlated Latent Dirichlet allocation." In 2010 7th International Symposium on Chinese Spoken Language Processing (ISCSLP). IEEE, 2010. http://dx.doi.org/10.1109/iscslp.2010.5684906.

9

Xiang, Yingzhuo, Dongmei Yang, and Jikun Yan. "The Auto Annotation Latent Dirichlet Allocation." In First International Conference on Information Sciences, Machinery, Materials and Energy. Paris, France: Atlantis Press, 2015. http://dx.doi.org/10.2991/icismme-15.2015.387.

10

Bhutada, Sunil, V. V. S. S. S. Balaram, and Vishnu Vardhan Bulusu. "Latent Dirichlet Allocation based multilevel classification." In 2014 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT). IEEE, 2014. http://dx.doi.org/10.1109/iccicct.2014.6993109.


Organization reports on the topic "Dirichlet allocation"

1

Teh, Yee W., David Newman, and Max Welling. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation. Fort Belvoir, VA: Defense Technical Information Center, September 2007. http://dx.doi.org/10.21236/ada629956.

2

Antón Sarabia, Arturo, Santiago Bazdresch, and Alejandra Lelo-de-Larrea. The Influence of Central Bank's Projections and Economic Narrative on Professional Forecasters' Expectations: Evidence from Mexico. Banco de México, December 2023. http://dx.doi.org/10.36095/banxico/di.2023.21.

Abstract:
This paper evaluates the influence of central bank's projections and narrative signals provided in the summaries of its Inflation Report on the expectations of professional forecasters for inflation and GDP growth in the case of Mexico. We use the Latent Dirichlet Allocation model, a text-mining technique, to identify narrative signals. We show that both quantitative and qualitative information have an influence on inflation and GDP growth expectations. We also find that narrative signals related to monetary policy, observed inflation, aggregate demand, and inflation and employment projections stand out as the most relevant in accounting for changes in analysts' expectations. If the period of the COVID-19 pandemic is excluded, we still find that forecasters consider both types of information for their inflation expectations.
3

Moreno Pérez, Carlos, and Marco Minozzo. "Making Text Talk": The Minutes of the Central Bank of Brazil and the Real Economy. Madrid: Banco de España, November 2022. http://dx.doi.org/10.53479/23646.

Abstract:
This paper investigates the relationship between the views expressed in the minutes of the meetings of the Central Bank of Brazil’s Monetary Policy Committee (COPOM) and the real economy. It applies various computational linguistic machine learning algorithms to construct measures of the minutes of the COPOM. First, we create measures of the content of the paragraphs of the minutes using Latent Dirichlet Allocation (LDA). Second, we build an uncertainty index for the minutes using Word Embedding and K-Means. Then, we combine these indices to create two topic-uncertainty indices. The first one is constructed from paragraphs with a higher probability of topics related to “general economic conditions”. The second topic-uncertainty index is constructed from paragraphs that have a higher probability of topics related to “inflation” and the “monetary policy discussion”. Finally, we employ a structural VAR model to explore the lasting effects of these uncertainty indices on certain Brazilian macroeconomic variables. Our results show that greater uncertainty leads to a decline in inflation, the exchange rate, industrial production and retail trade in the period from January 2000 to July 2019.
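The minutes pipeline described above pairs word embeddings with K-Means. The clustering step can be sketched in miniature; this is an illustrative pure-Python implementation of Lloyd's algorithm, assuming paragraphs have already been mapped to vectors (the 2-D toy points in the test below stand in for real embeddings):

```python
import math
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: assign each point to its nearest center,
    then move each center to the mean of its cluster; repeat."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # initialize from the data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: nearest center by Euclidean distance
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # update step: recompute each center as its cluster's mean
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centers, labels
```

On well-separated inputs such as `[(0, 0), (0.2, 0), (5, 5), (5, 5.2)]` with `k=2`, the returned labels split the two groups regardless of which data points seed the centers.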
4

Alonso-Robisco, Andrés, and José Manuel Carbó. Machine Learning methods in climate finance: a systematic review. Madrid: Banco de España, February 2023. http://dx.doi.org/10.53479/29594.

Abstract:
Preventing the materialization of climate change is one of the main challenges of our time. The involvement of the financial sector is a fundamental pillar in this task, which has led to the emergence of a new field in the literature, climate finance. In turn, the use of Machine Learning (ML) as a tool to analyze climate finance is on the rise, due to the need to use big data to collect new climate-related information and model complex non-linear relationships. Considering the proliferation of articles in this field, and the potential for the use of ML, we propose a review of the academic literature to assess how ML is enabling climate finance to scale up. The main contribution of this paper is to provide a structure of application domains in a highly fragmented research field, aiming to spur further innovative work from ML experts. To pursue this objective, first we perform a systematic search of three scientific databases to assemble a corpus of relevant studies. Using topic modeling (Latent Dirichlet Allocation) we uncover representative thematic clusters. This allows us to statistically identify seven granular areas where ML is playing a significant role in climate finance literature: natural hazards, biodiversity, agricultural risk, carbon markets, energy economics, ESG factors & investing, and climate data. Second, we perform an analysis highlighting publication trends; and third, we show a breakdown of ML methods applied by research area.
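Naming the thematic clusters that topic modeling uncovers, as the review above does for its seven areas, usually comes down to inspecting each topic's top-ranked words. A small illustrative helper (the counts below are invented, standing in for a fitted model's topic-word table):

```python
def top_words(topic_word_counts, n=3):
    """Rank each topic's words by count, ties broken alphabetically,
    and return the n highest-ranked words per topic."""
    return [
        [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for counts in topic_word_counts
    ]

# Invented counts standing in for a fitted model's topic-word table.
counts = [
    {"carbon": 9, "market": 7, "price": 4, "policy": 2},
    {"flood": 8, "risk": 6, "hazard": 5, "rain": 3},
]
print(top_words(counts))  # [['carbon', 'market', 'price'], ['flood', 'risk', 'hazard']]
```

In practice one would rank by the smoothed topic-word probabilities rather than raw counts, but the ordering is the same when the smoothing is uniform.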