Journal articles on the topic 'Topic model methods'

Consult the top 50 journal articles for your research on the topic 'Topic model methods.'

1

BOYKO, N., and O. PETROVSKYI. "METHODS OF CLASSIFICATION OF MACHINE LEARNING FOR CONSTRUCTION OF MATHEMATICAL MODELS ON MULTIMODAL DATA." Herald of Khmelnytskyi National University. Technical sciences 307, no. 2 (May 2, 2022): 25–32. http://dx.doi.org/10.31891/2307-5732-2022-307-2-25-32.

Abstract:
This article is dedicated to topic modeling as an unsupervised machine learning technique. It analyzes how topic modeling methods can be used to determine the topics of documents so that the documents can then be categorized. Such methods as latent semantic analysis, probabilistic latent semantic analysis and latent Dirichlet allocation are considered. An approach that allows the construction of effective topic models of text document collections in Ukrainian and other synthetic languages, based on the peculiarities of this language type, is proposed, and its main stages are described. The proposed approach consists of a custom input data preprocessing pipeline, which covers file loading, text extraction, removal of improper symbols, tokenization, removal of stop-words, stemming of each token and a newly introduced model pruning stage, which makes any of the modern topic modeling methods applicable for synthetic language topic modeling. The approach was implemented in the Python programming language and used to obtain the topic model of a collection of Ukrainian-language scientific publications on civic identity and related topics. An expert in political psychology, who studies the phenomenon of civic identity, was involved in the research for the topic model quality evaluation. As a result of expert evaluation of the topics singled out during the modeling, it was proposed to clarify the formulation of cluster names based on the semantics of the sets of words that form them. In general, according to the expert, the topics singled out represent the concept of the civic identity of an individual and will allow researchers to simplify the work with literature sources on this issue when used to categorize documents. This demonstrates the efficiency of the proposed approach.
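The middle stages of the preprocessing pipeline described in this abstract (removal of improper symbols, tokenization, stop-word removal, stemming) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the suffix list, and the names `crude_stem` and `preprocess` are hypothetical, and a real pipeline for Ukrainian would use a language-specific stemmer. File loading, text extraction, and the model pruning stage are not shown because the abstract does not specify them.

```python
import re

# Hypothetical stop-word and suffix lists, for illustration only.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token: str) -> str:
    """Strip the first matching suffix; a toy stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Clean, tokenize, drop stop words, and stem one document."""
    # Replace anything that is not a word character or whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    # Simple whitespace tokenization.
    tokens = cleaned.split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The models, trained on topics!"))
```

Because `\w` is Unicode-aware for Python strings, the cleaning step keeps Cyrillic letters, which matters for the Ukrainian-language setting the abstract describes.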
2

Jankowski, Maciej. "Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation." Computer Science and Mathematical Modelling, no. 8/2018 (March 25, 2019): 17–28. http://dx.doi.org/10.5604/01.3001.0013.1458.

Abstract:
Topic models are very popular methods of text analysis. The most popular algorithm for topic modelling is LDA (Latent Dirichlet Allocation). Recently, many new methods have been proposed that enable the use of this model in large-scale processing. One of the problems is that a data scientist has to choose the number of topics manually. This step requires some prior analysis. A few methods have been proposed to automate this step, but none of them works very well if LDA is used as preprocessing for further classification. In this paper, we propose an ensemble approach which allows us to use more than one model in the prediction phase while reducing the need to find a single best number of topics. We have also analyzed a few methods of estimating the number of topics.
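One way to picture the ensemble idea in this abstract: several classifiers, each trained on features from an LDA model with a different number of topics, each predict class probabilities, and the predictions are combined. The abstract does not say how the models are combined, so the simple probability averaging below is an assumption, and the function names are hypothetical.

```python
def average_probabilities(per_model_probs: list[list[float]]) -> list[float]:
    """Average class-probability vectors predicted by several classifiers."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def ensemble_predict(per_model_probs: list[list[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = average_probabilities(per_model_probs)
    return max(range(len(averaged)), key=averaged.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers built on LDA models
    # with different topic numbers, for one document and two classes.
    probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    print(ensemble_predict(probs))
```

With this kind of combination no single topic number has to be chosen up front, which is the benefit the abstract points to.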
3

Xie, Qianqian, Yutao Zhu, Jimin Huang, Pan Du, and Jian-Yun Nie. "Graph Neural Collaborative Topic Model for Citation Recommendation." ACM Transactions on Information Systems 40, no. 3 (July 31, 2022): 1–30. http://dx.doi.org/10.1145/3473973.

Abstract:
Due to the overload of published scientific articles, citation recommendation has long been a critical research problem for automatically recommending the most relevant citations of given articles. Relational topic models (RTMs) have shown promise on citation prediction via joint modeling of document contents and citations. However, existing RTMs can only capture pairwise or direct (first-order) citation relationships among documents. The indirect (high-order) citation links have been explored in graph neural network–based methods, but these methods suffer from the well-known explainability problem. In this article, we propose a model called Graph Neural Collaborative Topic Model that takes advantage of both relational topic models and graph neural networks to capture high-order citation relationships and to have higher explainability due to the latent topic semantic structure. Experiments on three real-world citation datasets show that our model outperforms several competitive baseline methods on citation recommendation. In addition, we show that our approach can learn better topics than the existing approaches. The recommendation results can be well explained by the underlying topics.
4

Gou, Zhinan, Yan Li, and Zheng Huo. "A Method for Constructing Supervised Time Topic Model Based on Variational Autoencoder." Scientific Programming 2021 (February 8, 2021): 1–11. http://dx.doi.org/10.1155/2021/6623689.

Abstract:
A topic model is a probabilistic generative model used to find the representative topics of a document, and topic modeling has been successfully applied to various document-related tasks in recent years. Many methods have achieved some success, especially with the supervised topic model and the time topic model. The supervised topic model can learn topics from documents annotated with multiple labels, and the time topic model can learn topics that evolve over time in a sequentially organized corpus. However, some real-world documents are both annotated with multiple labels and time-stamped, and handling them in document-related tasks requires a supervised time topic model. There are few research papers on the supervised time topic model. To solve this problem, we propose a method for constructing a supervised time topic model. By analysing the generative process of the supervised topic model and time topic model, respectively, we introduce the construction process of the supervised time topic model based on variational autoencoder in detail and conduct preliminary experiments. Experimental results demonstrate that the supervised time topic model outperforms several state-of-the-art topic models.
5

Zhu, Lixing, Yulan He, and Deyu Zhou. "A Neural Generative Model for Joint Learning Topics and Topic-Specific Word Embeddings." Transactions of the Association for Computational Linguistics 8 (August 2020): 471–85. http://dx.doi.org/10.1162/tacl_a_00326.

Abstract:
We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings. In particular, we assume that global latent topics are shared across documents, a word is generated by a hidden semantic vector encoding its contextual semantic meaning, and its context words are generated conditional on both the hidden semantic vector and global latent topics. Topics are trained jointly with the word embeddings. The trained model maps words to topic-dependent embeddings, which naturally addresses the issue of word polysemy. Experimental results show that the proposed model outperforms the word-level embedding methods in both word similarity evaluation and word sense disambiguation. Furthermore, the model also extracts more coherent topics compared with existing neural topic models or other models for joint learning of topics and word embeddings. Finally, the model can be easily integrated with existing deep contextualized word embedding learning methods to further improve the performance of downstream tasks such as sentiment classification.
6

Mylläri, Sanna, Suoma Eeva Saarni, Ville Ritola, Grigori Joffe, Jan-Henry Stenberg, Ole André Solbakken, Nikolai Olavi Czajkowski, and Tom Rosenström. "Text Topics and Treatment Response in Internet-Delivered Cognitive Behavioral Therapy for Generalized Anxiety Disorder: Text Mining Study." Journal of Medical Internet Research 24, no. 11 (November 9, 2022): e38911. http://dx.doi.org/10.2196/38911.

Abstract:
Background: Text mining methods such as topic modeling can offer valuable information on how and to whom internet-delivered cognitive behavioral therapies (iCBT) work. Although iCBT treatments provide convenient data for topic modeling, it has rarely been used in this context.
Objective: Our aims were to apply topic modeling to written assignment texts from iCBT for generalized anxiety disorder and explore the resulting topics’ associations with treatment response. As predetermining the number of topics presents a considerable challenge in topic modeling, we also aimed to explore a novel method for topic number selection.
Methods: We defined 2 latent Dirichlet allocation (LDA) topic models using a novel data-driven approach and a more commonly used interpretability-based approach to topic number selection. We used multilevel models to associate the topics with continuous-valued treatment response, defined as the rate of per-session change in GAD-7 sum scores throughout the treatment.
Results: Our analyses included 1686 patients. We observed 2 topics that were associated with better than average treatment response: “well-being of family, pets, and loved ones” from the data-driven LDA model (B=–0.10 SD/session/∆topic; 95% CI –0.16 to –0.03) and “children, family issues” from the interpretability-based model (B=–0.18 SD/session/∆topic; 95% CI –0.31 to –0.05). Two topics were associated with worse treatment response: “monitoring of thoughts and worries” from the data-driven model (B=0.06 SD/session/∆topic; 95% CI 0.01 to 0.11) and “internet therapy” from the interpretability-based model (B=0.27 SD/session/∆topic; 95% CI 0.07 to 0.46).
Conclusions: The 2 LDA models were different in terms of their interpretability and broadness of topics but both contained topics that were associated with treatment response in an interpretable manner. Our work demonstrates that topic modeling is well suited for iCBT research and has potential to expose clinically relevant information in vast text data.
7

Shi, Lei, Junping Du, and Feifei Kou. "A Sparse Topic Model for Bursty Topic Discovery in Social Networks." International Arab Journal of Information Technology 17, no. 5 (September 1, 2020): 816–24. http://dx.doi.org/10.34028/iajit/17/5/15.

Abstract:
Bursty topic discovery aims to automatically identify bursty events and continuously keep track of known events. The existing methods focus on the topic model. However, the sparsity of short text poses a challenge to traditional topic models because the words are too few to learn from the original corpus. To tackle this problem, we propose a Sparse Topic Model (STM) for bursty topic discovery. First, we distinguish the modeling between the bursty topic and the common topic to detect the change of the words in time and discover the bursty words. Second, we introduce a “Spike and Slab” prior to decouple the sparsity and smoothness of a distribution. The bursty words are leveraged to achieve automatic discovery of the bursty topics. Finally, to evaluate the effectiveness of our proposed algorithm, we collect a Sina Weibo dataset to conduct various experiments. Both qualitative and quantitative evaluations demonstrate that the proposed STM algorithm outperforms several state-of-the-art methods.
8

Meaney, Christopher, Michael Escobar, Therese A. Stukel, Peter C. Austin, and Liisa Jaakkimainen. "Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study." JMIR Medical Informatics 10, no. 12 (December 19, 2022): e40102. http://dx.doi.org/10.2196/40102.

Abstract:
Background: Health care organizations are collecting increasing volumes of clinical text data. Topic models are a class of unsupervised machine learning algorithms for discovering latent thematic patterns in these large unstructured document collections.
Objective: We aimed to comparatively evaluate several methods for estimating temporal topic models using clinical notes obtained from primary care electronic medical records from Ontario, Canada.
Methods: We used a retrospective closed cohort design. The study spanned from January 01, 2011, through December 31, 2015, discretized into 20 quarterly periods. Patients were included in the study if they generated at least 1 primary care clinical note in each of the 20 quarterly periods. These patients represented a unique cohort of individuals engaging in high-frequency use of the primary care system. The following temporal topic modeling algorithms were fitted to the clinical note corpus: nonnegative matrix factorization, latent Dirichlet allocation, the structural topic model, and the BERTopic model.
Results: Temporal topic models consistently identified latent topical patterns in the clinical note corpus. The learned topical bases identified meaningful activities conducted by the primary health care system. Latent topics displaying near-constant temporal dynamics were consistently estimated across models (eg, pain, hypertension, diabetes, sleep, mood, anxiety, and depression). Several topics displayed predictable seasonal patterns over the study period (eg, respiratory disease and influenza immunization programs).
Conclusions: Nonnegative matrix factorization, latent Dirichlet allocation, structural topic model, and BERTopic are based on different underlying statistical frameworks (eg, linear algebra and optimization, Bayesian graphical models, and neural embeddings), require tuning unique hyperparameters (optimizers, priors, etc), and have distinct computational requirements (data structures, computational hardware, etc). Despite the heterogeneity in statistical methodology, the learned latent topical summarizations and their temporal evolution over the study period were consistently estimated. Temporal topic models represent an interesting class of models for characterizing and monitoring the primary health care system.
9

Hong-Hui, LI, Zhao Ai-Hua, and Zhang Jun-Wen. "Research of Software Reliability Test Based on Test Model." International Journal of Open Source Software and Processes 8, no. 3 (July 2017): 49–64. http://dx.doi.org/10.4018/ijossp.2017070103.

Abstract:
This article describes how, in recent years, model-based software reliability test methods have been a hot topic. In order to summarize the research results on reliability test models created in recent years, and to find new research hot topics on this basis, two kinds of test models, the operational profile and the usage model, are introduced and compared in this article. In addition, the methods of constructing a usage model are also discussed in detail. Finally, the topic of building a usage model is presented.
10

Park, Sang-Min, Sung Joon Lee, and Byung-Won On. "Topic Word Embedding-Based Methods for Automatically Extracting Main Aspects from Product Reviews." Applied Sciences 10, no. 11 (May 31, 2020): 3831. http://dx.doi.org/10.3390/app10113831.

Abstract:
Detecting the main aspects of a particular product from a collection of review documents is challenging in real applications. To address this problem, we focus on utilizing existing topic models that can briefly summarize large text documents. Unlike existing approaches, which are limited because they modify a topic model or use seed opinion words as prior knowledge, we propose a novel approach of (1) identifying starting points for learning, (2) cleaning dirty topic results through word embedding and unsupervised clustering, and (3) automatically generating the right aspects using topic and head word embedding. Experimental results show that the proposed methods create cleaner topics, improving Rouge–1 by about 25% compared to the baseline method. In addition, through the proposed three methods, the main aspects suitable for given data are detected automatically.
11

Huang, Guimin, and Xiaowei Zhang. "An Analysis Model of Potential Topics in English Essays Based on Semantic Space." 電腦學刊 (Journal of Computers) 33, no. 1 (February 2022): 151–64. http://dx.doi.org/10.53106/199115992022023301014.

Abstract:
With the reform of English examination in China in recent years, the automatic evaluation of English essays as subjective questions has always been the focus and difficulty of research. The existing automatic essay evaluation system (AEE) has obtained good feedback on the vocabulary, syntactic and other features of English essays, but there is still a problem of low accuracy in the analysis of potential topics in English essays. In order to solve this problem, this paper takes relational triples as the carrier to analyze the potential topics of English essays. It constructs hybrid semantic spaces of hierarchical topic trees to carry out topic clustering, distributed representation of topic relational triples, and topic set extension in English essays. Then, based on the improved on-topic analysis algorithm in this paper, the paper analyzes the topic of an English essay in multiple dimensions to obtain more abundant potential on-topic semantic information. The experimental results show that the proposed model can effectively reduce the noise caused by non-topic words, improve the fine-grained topic semantic space in English essays, and perform better than the current methods of on-topic analysis in English essays.
12

Shi, Lei, Gang Cheng, Shang-ru Xie, and Gang Xie. "A word embedding topic model for topic detection and summary in social networks." Measurement and Control 52, no. 9-10 (August 27, 2019): 1289–98. http://dx.doi.org/10.1177/0020294019865750.

Abstract:
The aim of topic detection is to automatically identify the events and hot topics in social networks and continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity problems of social networks. There is also the problem of unclear topics caused by the sparse distribution of topics. To solve the above challenges, we propose a novel word embedding topic model, named Cbow Topic Model (CTM), that combines the topic model with the continuous bag-of-words (Cbow) word embedding method, for topic detection and summary in social networks. We conduct similar word clustering of the target social network text dataset by introducing the classic Cbow word vectorization method, which can effectively learn the internal relationship between words and reduce the dimensionality of the input texts. We employ the topic model to model short text, effectively weakening the sparsity problem of social network texts. To detect and summarize the topic, we propose a topic detection method by leveraging similarity computing for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that the CTM method is superior to the existing topic model method.
13

Wang, Kai, and Fuzhi Wang. "Topic-Feature Lattices Construction and Visualization for Dynamic Topic Number." Journal of Systems Science and Information 9, no. 5 (October 1, 2021): 558–74. http://dx.doi.org/10.21078/jssi-2021-558-17.

Abstract:
The topic recognition for dynamic topic number can realize the dynamic update of hyperparameters, and obtain the probability distribution of dynamic topics in time dimension, which helps to clear the understanding and tracking of convection text data. However, the current topic recognition model tends to be based on a fixed number of topics K and lacks multi-granularity analysis of subject knowledge. Therefore, it is impossible to deeply perceive the dynamic change of the topic in the time series. By introducing a novel approach on the basis of Infinite Latent Dirichlet allocation model, a topic feature lattice under the dynamic topic number is constructed. In the model, documents, topics and vocabularies are jointly modeled to generate two probability distribution matrices: documents-topics and topic-feature words. Afterwards, the association intensity is computed between the topic and its feature vocabulary to establish the topic formal context matrix. Finally, the topic feature is induced according to the formal concept analysis (FCA) theory. The topic feature lattice under dynamic topic number (TFL_DTN) model is validated on the real dataset by comparing with the mainstream methods. Experiments show that this model is more in line with actual needs, and achieves better results in semi-automatic modeling of topic visualization analysis.
14

Han, Jun, Yu Huang, Kuldeep Kumar, and Sukanto Bhattacharya. "Time-Varying Dynamic Topic Model." Journal of Global Information Management 26, no. 1 (January 2018): 104–19. http://dx.doi.org/10.4018/jgim.2018010106.

Abstract:
In this paper the authors build on prior literature to develop an adaptive and time-varying metadata-enabled dynamic topic model (mDTM) and apply it to a large Weibo dataset using an online Gibbs sampler for parameter estimation. Their approach simultaneously captures the maximum number of inherent dynamic features of microblogs thereby setting it apart from other online document mining methods in the extant literature. In summary, the authors' results show a better performance of mDTM in terms of the quality of the mined information compared to prior research and showcases mDTM as a promising tool for the effective mining of microblogs in a rapidly changing global information space.
15

ZHANG, XIAOYAN, and TING WANG. "TOPIC TRACKING WITH IMPROVED REPRESENTATION MODEL AND JOINT TRACKING METHOD." International Journal of Wavelets, Multiresolution and Information Processing 08, no. 06 (November 2010): 913–30. http://dx.doi.org/10.1142/s0219691310003869.

Abstract:
Topic tracking monitors a stream of stories to find additional stories on a topic identified by several samples. However, the predefined information about a tracked topic does not provide enough information to deal with new information that occurs during the tracking procedure. To overcome this problem, we propose a joint tracking method using both the topic-specific information from the predefined information and the non-topic-specific information from the data on other topics. Besides, to overcome the limitation of the representation model and the topic drift problem, we have also used two other improvements: a topic-based weighting method is used to measure the features of both tracked topics and single testing stories; a dynamic topic model is extended with the information brought by the incoming related stories and the noise is filtered out with the information in the incoming unrelated stories. The implemented tracking systems are evaluated on the Chinese subset of TDT4 corpus by the TDT2003 evaluation method. The experimental results indicate that the above methods all improve the tracking performance. More importantly, these techniques are complementary to one another and not mutually exclusive.
16

Yu, Zehao. "Methods on Detecting Closely Related Topics and Spatial Events." International Journal of Software Engineering and Knowledge Engineering 31, no. 10 (October 2021): 1377–98. http://dx.doi.org/10.1142/s0218194021500455.

Abstract:
Topic detection is a hot issue that many researchers are interested in. Previous research focused on a single data stream and did not consider topic detection across different data streams in a harmonious way, so it cannot detect closely related topics from different data streams. Recently, Twitter, along with other SNS such as Weibo and Yelp, began supporting location services in their texts. Previous approaches are either too complex to carry out or so oversimplified that they cannot achieve good performance on detecting spatial topics. In our paper, we introduce a probabilistic method which can precisely detect closely related bursty topics and their bursty periods across different data streams in a unified way. We also introduce a probabilistic method called Latent Spatial Events Model (LSEM) that can find areas as well as detect spatial events; it can also predict the positions of texts. We evaluate LSEM on different datasets and show that our approach outperforms other baseline approaches on different measures such as perplexity, entropy of topic, KL-divergence, and range error. Evaluation of our first proposed approach on different datasets shows that it can detect closely related topics and meaningful bursty time periods from different datasets.
17

Xiong, Deyi, and Min Zhang. "A Topic-Based Coherence Model for Statistical Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 977–83. http://dx.doi.org/10.1609/aaai.v27i1.8566.

Abstract:
Coherence that ties sentences of a text into a meaningfully connected structure is of great importance to text generation and translation. In this paper, we propose a topic-based coherence model to produce coherence for document translation, in terms of the continuity of sentence topics in a text. We automatically extract a coherence chain for each source text to be translated. Based on the extracted source coherence chain, we adopt a maximum entropy classifier to predict the target coherence chain that defines a linear topic structure for the target document. The proposed topic-based coherence model then uses the predicted target coherence chain to help decoder select coherent word/phrase translations. Our experiments show that incorporating the topic-based coherence model into machine translation achieves substantial improvement over both the baseline and previous methods that integrate document topics rather than coherence chains into machine translation.
18

Luo, Xiangfeng, and Yawen Yi. "Topic-Specific Emotion Mining Model for Online Comments." Future Internet 11, no. 3 (March 24, 2019): 79. http://dx.doi.org/10.3390/fi11030079.

Abstract:
Nowadays, massive texts are generated on the web, which contain a variety of viewpoints, attitudes, and emotions for products and services. Subjective information mining of online comments is vital for enterprises to improve their products or services and for consumers to make purchase decisions. Various effective methods, the mainstream one of which is the topic model, have been put forward to solve this problem. Although most topic models can mine the topic-level emotion of product comments, they neither consider interword relations nor determine the number of topics adaptively, which leads to poor comprehensibility, high time requirement, and low accuracy. To solve the above problems, this paper proposes an unsupervised Topic-Specific Emotion Mining Model (TSEM), which adds the corresponding relationship between aspect words and opinion words to express comments as a bag of aspect–opinion pairs. On one hand, the rich semantic information obtained by adding interword relationships can enhance the comprehensibility of results. On the other hand, the text dimensions reduced by adding relationships can cut the computation time. In addition, the number of topics in our model is adaptively determined by calculating perplexity to improve the emotion accuracy of the topic level. Our experiments using Taobao commodity comments achieve better results than baseline models in terms of accuracy, computation time, and comprehensibility. Therefore, our proposed model can be effectively applied to online comment emotion mining tasks.
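The perplexity-based choice of topic number mentioned in this abstract can be illustrated with the standard definition of perplexity, the exponential of the negative mean log-likelihood per token. The sketch below assumes that the per-token log-probabilities under each candidate model are already available; the function names and the example numbers are hypothetical, and the abstract does not give TSEM's exact selection procedure.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity as exp of the negative mean log-likelihood per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def pick_topic_number(log_probs_by_k: dict[int, list[float]]) -> int:
    """Return the candidate topic number whose model has the lowest perplexity."""
    return min(log_probs_by_k, key=lambda k: perplexity(log_probs_by_k[k]))


if __name__ == "__main__":
    # Hypothetical held-out token log-probabilities for two candidate models.
    candidates = {
        5: [math.log(0.1), math.log(0.1)],
        10: [math.log(0.2), math.log(0.2)],
    }
    print(pick_topic_number(candidates))
```

Lower perplexity means the model assigns higher probability to the held-out tokens, so the candidate with the smallest value is selected.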
19

Ormeño, Pablo, Marcelo Mendoza, and Carlos Valle. "Topic Models Ensembles for AD-HOC Information Retrieval." Information 12, no. 9 (September 1, 2021): 360. http://dx.doi.org/10.3390/info12090360.

Abstract:
Ad hoc information retrieval (ad hoc IR) is a challenging task consisting of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. One limitation of these methods is their inability to work with polysemic concepts. In addition, these methods introduce fake orthogonalities between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) allow building representations of text documents in the latent space of topics, better modeling polysemy and avoiding the generation of orthogonal representations between related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection obeys the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. Then, we merge the ranking lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal strengthens the results in precision and recall, outperforming classic IR models and strong baselines based on topic models.
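The abstract does not say which top-k list fusion rule is used to merge the rankings from the individual topic models. As an illustration only, the sketch below substitutes reciprocal rank fusion, a common fusion technique in which each document scores 1 / (k + rank) in every list where it appears. The function name, the document identifiers, and the constant k = 60 are assumptions, not details taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], top_k: int, k: int = 60
) -> list[str]:
    """Merge ranked lists of document ids and return the fused top_k ids."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it sits in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a deterministic result.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]


if __name__ == "__main__":
    # Hypothetical rankings produced by three topic-model-based retrievers.
    lists = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
    print(reciprocal_rank_fusion(lists, top_k=2))
```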
20

Béchara, Hannah, Alexander Herzog, Slava Jankin, and Peter John. "Transfer learning for topic labeling: Analysis of the UK House of Commons speeches 1935–2014." Research & Politics 8, no. 2 (April 2021): 205316802110222. http://dx.doi.org/10.1177/20531680211022206.

Abstract:
Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models require the additional step of attaching meaningful labels to estimated topics, a process that is not scalable, suffers from human bias, and is difficult to replicate. We present a transfer topic labeling method that seeks to remedy these problems, using domain-specific codebooks as the knowledge base to automatically label estimated topics. We demonstrate our approach with a large-scale topic model analysis of the complete corpus of UK House of Commons speeches from 1935 to 2014, using the coding instructions of the Comparative Agendas Project to label topics. We evaluated our results using human expert coding and compared our approach with more current state-of-the-art neural methods. Our approach was simple to implement, compared favorably to expert judgments, and outperformed the neural networks model for a majority of the topics we estimated.
21

Wood, Justin, Corey Arnold, and Wei Wang. "Knowledge Source Rankings for Semi-Supervised Topic Modeling." Information 13, no. 2 (January 24, 2022): 57. http://dx.doi.org/10.3390/info13020057.

Abstract:
Recent work suggests knowledge sources can be added into the topic modeling process to label topics and improve topic discovery. The knowledge sources typically consist of a collection of human-constructed articles, each describing a topic (article-topic) for an entire domain. However, these semisupervised topic models assume a corpus to contain topics on only a subset of a domain. Therefore, during inference, the model must consider which article-topics were theoretically used to generate the corpus. Since the knowledge sources tend to be quite large, the many article-topics considered slow down the inference process. The increase in execution time is significant, with knowledge source input greater than 10^3 becoming unfeasible for use in topic modeling. To increase the applicability of semisupervised topic models, approaches are needed to speed up the overall execution time. This paper presents a way of ranking knowledge source topics to satisfy the above goal. Our approach utilizes a knowledge source ranking, based on the PageRank algorithm, to determine the importance of an article-topic. By applying our ranking technique we can eliminate low scoring article-topics before inference, speeding up the overall process. Remarkably, this ranking technique can also improve perplexity and interpretability. Results show our approach to outperform baseline methods and significantly aid semisupervised topic models. In our evaluation, knowledge source rankings yield a 44% increase in topic retrieval f-score, a 42.6% increase in inter-inference topic elimination, a 64% increase in perplexity, a 30% increase in token assignment accuracy, a 20% increase in topic composition interpretability, and a 5% increase in document assignment interpretability over baseline methods.
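To make the pruning idea concrete, here is a minimal sketch that assumes article-topics are nodes in a directed link graph. It scores them with a plain power-iteration PageRank and keeps only the highest-scoring ones before inference. How the paper actually builds its graph is not stated in the abstract, so the link structure, the damping factor, and the cut-off of two topics are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical article-topic graph: an edge means one article links to another.
links = {
    "sports": ["football"],
    "football": ["sports"],
    "politics": ["sports"],
    "cooking": [],
}
scores = pagerank(links)

# Keep only the top-scoring article-topics before running inference.
kept = sorted(scores, key=lambda v: -scores[v])[:2]
print(kept)
```

Pruning before inference trades a small risk of dropping a relevant article-topic for a smaller candidate set, which is the speed-up the abstract describes.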
22

Novello, Noemi. "I mixed methods e la valutazione: un'analisi tramite Structural Topic Model." RIV Rassegna Italiana di Valutazione, no. 76 (August 2021): 107–22. http://dx.doi.org/10.3280/riv2020-076007.

23

Xu, Mingying, Junping Du, Zeli Guan, Zhe Xue, Feifei Kou, Lei Shi, Xin Xu, and Ang Li. "A Multi-RNN Research Topic Prediction Model Based on Spatial Attention and Semantic Consistency-Based Scientific Influence Modeling." Computational Intelligence and Neuroscience 2021 (December 18, 2021): 1–15. http://dx.doi.org/10.1155/2021/1766743.

Abstract:
The computer science discipline includes many research fields, which mutually influence and promote each other’s development. This poses two major challenges for predicting the research topics of each field. One is how to model a fine-grained topic representation of a research field. The other is how to model the research topics of different fields while keeping them semantically consistent when learning the scientific influence context from other related fields. Unfortunately, existing research topic prediction approaches cannot handle these two challenges. To solve these problems, we employ multiple Recurrent Neural Network chains, which model the research topics of different fields, and propose a research topic prediction model based on spatial attention and semantic consistency-based scientific influence modeling. Spatial attention is employed in field topic representation to selectively extract attributes from the field topics and distinguish their importance. Semantic consistency-based scientific influence modeling maps research topics of different fields to a unified semantic space to obtain the scientific influence context of other related fields. Extensive experimental results on five related research fields in the computer science (CS) discipline show that the proposed model is superior to the most advanced methods and achieves good topic prediction performance.
24

Rizky, Harumi Puspa, and Doddy Setiawan. "Perkembangan Penelitian Akuntansi Sektor Publik di Indonesia." Assets: Jurnal Akuntansi dan Pendidikan 8, no. 2 (October 29, 2019): 94. http://dx.doi.org/10.25273/jap.v8i2.4647.

Abstract:
This study aims to provide an overview of the development of public sector accounting research in Indonesia. The method used in this research is charting the field. The sample was drawn from 22 accredited journals in Indonesia and comprises 137 articles published during 2010-2018. This study classifies articles based on research topics and methods. The results indicate that the most widely studied topic in public sector accounting research is financial accounting, and the most frequently used methods are quantitative methods as well as survey and archival methods. Financial accounting in the public sector is the most researched topic because accounting in the public sector is still a particular concern and many local governments are constrained in their financial reporting. Meanwhile, of the seven categories, the least researched topics are taxation and accounting systems.
25

Jarmoszko, A. T., Marianne D’Onofrio, Joo Eng Lee-Partridge, and Olga Petkova. "Evaluating Sustainability and Greening Methods." International Journal of Applied Logistics 4, no. 3 (July 2013): 1–13. http://dx.doi.org/10.4018/jal.2013070101.

Abstract:
Recently much has been written about sustainability and greening and the issue is likely to continue to resurface on the agendas of decision makers. This paper addresses one aspect of the topic: that of sustainability and greening through information technology management. The authors review existing research and publications on the topic and conclude that while much research is available on methods of enhancing sustainability and greening, less exists on guidelines to help gauge success or failure of these methods. To help alleviate this shortcoming, the authors propose a model – called the Greening through Information Technology Model (GITM) – based on the framework of Capability Maturity Model.
26

La Hera, Pedro, and Daniel Ortíz Morales. "Model-Based Development of Control Systems for Forestry Cranes." Journal of Control Science and Engineering 2015 (2015): 1–15. http://dx.doi.org/10.1155/2015/256951.

Abstract:
Model-based methods are used in industry for prototyping concepts based on mathematical models. With our forest industry partners, we have established a model-based workflow for rapid development of motion control systems for forestry cranes. Applying this working method, we can verify control algorithms, both theoretically and practically. This paper is an example of this workflow and presents four topics related to the application of nonlinear control theory. The first topic presents the system of differential equations describing the motion dynamics. The second topic presents nonlinear control laws formulated according to sliding mode control theory. The third topic presents a procedure for model calibration and control tuning that are a prerequisite to realize experimental tests. The fourth topic presents the results of tests performed on an experimental crane specifically equipped for these tasks. Results of these studies show the advantages and disadvantages of these control algorithms, and they highlight their performance in terms of robustness and smoothness.
27

Gan, Jingxian, and Yong Qi. "Selection of the Optimal Number of Topics for LDA Topic Model—Taking Patent Policy Analysis as an Example." Entropy 23, no. 10 (October 3, 2021): 1301. http://dx.doi.org/10.3390/e23101301.

Abstract:
This study constructs a comprehensive index to effectively judge the optimal number of topics in the LDA topic model. Based on the requirements for selecting the number of topics, a comprehensive judgment index of perplexity, isolation, stability, and coincidence is constructed to select the number of topics. This method provides four advantages to selecting the optimal number of topics: (1) good predictive ability, (2) high isolation between topics, (3) no duplicate topics, and (4) repeatability. First, we use three general datasets to compare our proposed method with existing methods, and the results show that the optimal topic number selection method has better selection results. Then, we collected the patent policies of various provinces and cities in China (excluding Hong Kong, Macao, and Taiwan) as datasets. By using the optimal topic number selection method proposed in this study, we can classify patent policies well.
28

Pan, Tao, Qian Chen, Dong Dong Lv, Shu Han Yuan, and Xin Jin. "Study of Topic Life Cycle Based on Hierarchical HMM." Applied Mechanics and Materials 687-691 (November 2014): 1324–27. http://dx.doi.org/10.4028/www.scientific.net/amm.687-691.1324.

Abstract:
This paper presents a topic ontology tree based hierarchical HMM (hHMM), and the generation of topic property based life cycle curve is studied. Compared with the traditional methods, this model can describe the hierarchy relations of topics.
29

Yang, Hailu, Jin Zhang, Xiaoyu Ding, Chen Chen, and Lili Wang. "GTIP: A Gaming-Based Topic Influence Percolation Model for Semantic Overlapping Community Detection." Entropy 24, no. 9 (September 9, 2022): 1274. http://dx.doi.org/10.3390/e24091274.

Abstract:
Community detection in semantic social networks is a crucial issue in online social network analysis, and has received extensive attention from researchers in various fields. Different conventional methods discover semantic communities based merely on users’ preferences towards global topics, ignoring the influence of topics themselves and the impact of topic propagation in community detection. To better cope with such situations, we propose a Gaming-based Topic Influence Percolation model (GTIP) for semantic overlapping community detection. In our approach, community formation is modeled as a seed expansion process. The seeds are individuals holding high influence topics and the expansion is modeled as a modified percolation process. We use the concept of payoff in game theory to decide whether to allow neighbors to accept the passed topics, which is more in line with the real social environment. We compare GTIP with four traditional (GN, FN, LFM, COPRA) and seven representative (CUT, TURCM, LCTA, ACQ, DEEP, BTLSC, SCE) semantic community detection methods. The results show that our method is closer to ground truth in synthetic networks and has a higher semantic modularity in real networks.
30

He, Yuhan. "Several Methods of Inventory Control." BCP Business & Management 25 (August 30, 2022): 839–50. http://dx.doi.org/10.54691/bcpbm.v25i.1923.

Abstract:
This paper first introduces the Single-Period Inventory Model, a basic method of inventory control. By analyzing the Newsvendor Problem, a simplified inventory-stocking situation, it derives a solution method based on normal distribution analysis. The paper then applies this method to realistic inventory problems and discusses its utility. Second, the topic is expanded to the Multi-Period Inventory System. This system addresses a more sophisticated inventory situation and can be broadly categorized into two models: the fixed-order quantity model and the fixed-time period model. The paper discusses how orders are determined under each of the two models.
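For readers unfamiliar with the Newsvendor Problem, the sketch below shows the textbook critical-ratio solution under normally distributed demand. It may differ in detail from the paper's own derivation. The function name and all numbers (mean demand, standard deviation, cost, and price) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def newsvendor_quantity(mean, std, unit_cost, price, salvage=0.0):
    """Order quantity for a single-period model with normal demand.

    Returns (quantity, critical_ratio). The critical ratio is
    underage cost / (underage cost + overage cost).
    """
    underage = price - unit_cost    # profit lost per unit of unmet demand
    overage = unit_cost - salvage   # loss per unsold unit
    critical_ratio = underage / (underage + overage)
    # Order up to the demand quantile that matches the critical ratio.
    quantity = NormalDist(mean, std).inv_cdf(critical_ratio)
    return quantity, critical_ratio


# Hypothetical product: demand ~ N(100, 20^2), unit cost 4, selling price 10, no salvage value.
q, ratio = newsvendor_quantity(mean=100, std=20, unit_cost=4.0, price=10.0)
print(f"critical ratio = {ratio:.2f}, order quantity = {q:.1f}")
```

Because the critical ratio here is above 0.5, the suggested order sits slightly above mean demand; a product with thinner margins would push it below the mean.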
31

Ikegami, Kenshin, and Yukio Ohsawa. "PageRank Topic Model: Estimation of Multinomial Distributions using Network Structure Analysis Methods." Fundamenta Informaticae 159, no. 3 (March 7, 2018): 257–77. http://dx.doi.org/10.3233/fi-2018-1664.

32

Xu, Xiao, Tao Jin, Zhijie Wei, and Jianmin Wang. "Incorporating Topic Assignment Constraint and Topic Correlation Limitation into Clinical Goal Discovering for Clinical Pathway Mining." Journal of Healthcare Engineering 2017 (2017): 1–13. http://dx.doi.org/10.1155/2017/5208072.

Abstract:
Clinical pathways are widely used around the world for providing quality medical treatment and controlling healthcare cost. However, the expert-designed clinical pathways can hardly deal with the variances among hospitals and patients. It calls for more dynamic and adaptive process, which is derived from various clinical data. Topic-based clinical pathway mining is an effective approach to discover a concise process model. Through this approach, the latent topics found by latent Dirichlet allocation (LDA) represent the clinical goals. And process mining methods are used to extract the temporal relations between these topics. However, the topic quality is usually not desirable due to the low performance of the LDA in clinical data. In this paper, we incorporate topic assignment constraint and topic correlation limitation into the LDA to enhance the ability of discovering high-quality topics. Two real-world datasets are used to evaluate the proposed method. The results show that the topics discovered by our method are with higher coherence, informativeness, and coverage than the original LDA. These quality topics are suitable to represent the clinical goals. Also, we illustrate that our method is effective in generating a comprehensive topic-based clinical pathway model.
33

Dadvandipour, Samad, and Aadil Gani Ganie. "Analyzing and predicting spear-phishing using machine learning methods." Multidiszciplináris tudományok 10, no. 4 (2020): 262–73. http://dx.doi.org/10.35925/j.multi.2020.4.30.

Abstract:
Phishing means misleading a victim by posing as a trustworthy party in order to steal critical information such as bank account numbers and credit card numbers. One of the most widely used forms today is spear phishing, which is among the most effective phishing attacks because of its social and psychological dimensions. In this paper, we mitigate the impact of spear phishing using a multi-layer approach. The multi-layer approach is the best method of handling web intrusion, since the intruder has to pass through several levels. Almost all researchers work with the content of the email; this paper instead takes a novel approach to countering phishing messages by using both the attachment and the content of an email. We applied sentiment analysis to emails, including both the content and the attachment, to check whether they are spam using an SVM classifier and a Random Forest classifier; the former showed 96 percent accuracy, while the latter showed 97.66 percent accuracy. SVM showed 0 percent false positives and 4 percent false negatives, while Random Forest showed 0 percent false positives and 2.33 percent false negatives. We also performed topic modeling using LDA (Latent Dirichlet Allocation) from the Gensim package to get the dominant topics in our dataset. We visualized the results of our topic model using pyLDAvis. The perplexity and coherence score of our topic model are -12.897670565510511 and 0.44700287476452394, respectively.
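A negative "perplexity" can look puzzling, because perplexity in the usual sense is always positive. One plausible reading, which is an assumption and is not confirmed by the abstract, is that the figure is the per-word likelihood bound returned by Gensim's `LdaModel.log_perplexity`, which Gensim converts to perplexity as 2^(-bound). The sketch below applies that conversion to the reported number; the helper name is hypothetical.

```python
def perplexity_from_bound(per_word_bound):
    """Convert a per-word log-likelihood bound (base 2) into perplexity.

    Assumes the convention perplexity = 2 ** (-bound), which is what
    Gensim reports for LdaModel.log_perplexity.
    """
    return 2.0 ** (-per_word_bound)


reported_bound = -12.897670565510511  # value quoted in the abstract
perplexity = perplexity_from_bound(reported_bound)
print(f"perplexity is roughly {perplexity:.0f}")
```

Under this reading the model's perplexity is in the thousands, so lower (more negative) bounds correspond to higher perplexity, not better fit.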
34

Li, Jiwei, and Sujian Li. "A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization." Transactions of the Association for Computational Linguistics 1 (December 2013): 89–98. http://dx.doi.org/10.1162/tacl_a_00212.

Abstract:
Supervised learning methods and LDA based topic model have been successfully applied in the field of multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantages of both topic model and feature based supervised learning methods. Experimental results on DUC2007, TAC2008 and TAC2009 demonstrate the effectiveness of our approach.
35

Miyazawa, S., X. Song, R. Jiang, Z. Fan, R. Shibasaki, and T. Sato. "CITY-SCALE HUMAN MOBILITY PREDICTION MODEL BY INTEGRATING GNSS TRAJECTORIES AND SNS DATA USING LONG SHORT-TERM MEMORY." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences V-4-2020 (August 3, 2020): 87–94. http://dx.doi.org/10.5194/isprs-annals-v-4-2020-87-2020.

Abstract:
Human mobility analysis on large-scale mobility data has contributed to multiple applications such as urban and transportation planning, disaster preparation and response, tourism, and public health. However, when some unusual events happen, every individual behaves differently depending on their personal routine and background information. To improve the accuracy of the crowd behavior prediction model, understanding supplemental spatiotemporal topics, such as when, where and what people observe and are interested in, is important. In this research, we develop a model integrating social network service (SNS) data into the human mobility prediction model as background information of the mobility. We employ multi-modal deep learning models using Long short-term memory (LSTM) architecture to incorporate SNS data to a human mobility prediction model based on Global Navigation Satellite System (GNSS) data. We process anonymized interpolated GNSS trajectories from mobile phones into mobility sequence with discretized grid IDs, and apply several topic modeling methods on geo-tagged data to extract spatiotemporal topic features in each spatiotemporal unit similar to the mobility data. Thereafter, we integrate the two datasets in the multi-modal deep learning prediction models to predict city-scale mobility. The experiment proves that the models with SNS topics performed better than baseline models.
36

Wang, Lidong, Yin Zhang, Yun Zhang, Xiaodong Xu, and Shihua Cao. "Prescription Function Prediction Using Topic Model and Multilabel Classifiers." Evidence-Based Complementary and Alternative Medicine 2017 (2017): 1–10. http://dx.doi.org/10.1155/2017/8279109.

Abstract:
Determining a prescription’s function is one of the challenging problems in Traditional Chinese Medicine (TCM). In past decades, TCM has been widely researched through various methods in computer science, but none concentrates on the prediction method for a new prescription’s function. In this study, two methods are presented concerning this issue. The first method is based on a novel supervised topic model named Label-Prescription-Herb (LPH), which incorporates herb-herb compatibility rules into learning process. The second method is based on multilabel classifiers built by TFIDF features and herbal attribute features. Experiments undertaken reveal that both methods perform well, but the multilabel classifiers slightly outperform LPH-based method. The prediction results can provide valuable information for new prescription discovery before clinical test.
37

Sridhar, Dhanya, Hal Daumé, and David Blei. "Heterogeneous Supervised Topic Models." Transactions of the Association for Computational Linguistics 10 (2022): 732–45. http://dx.doi.org/10.1162/tacl_a_00487.

Abstract:
Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.
38

Chen, H., S. R. K. Branavan, R. Barzilay, and D. R. Karger. "Content Modeling Using Latent Permutations." Journal of Artificial Intelligence Research 36 (October 28, 2009): 129–63. http://dx.doi.org/10.1613/jair.2830.

Abstract:
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
39

Joty, S., G. Carenini, and R. T. Ng. "Topic Segmentation and Labeling in Asynchronous Conversations." Journal of Artificial Intelligence Research 47 (July 22, 2013): 521–73. http://dx.doi.org/10.1613/jair.3940.

Abstract:
Topic segmentation and labeling is often considered a prerequisite for higher-level conversation analysis and has been shown to be useful in many Natural Language Processing (NLP) applications. We present two new corpora of email and blog conversations annotated with topics, and evaluate annotator reliability for the segmentation and labeling tasks in these asynchronous conversations. We propose a complete computational framework for topic segmentation and labeling in asynchronous conversations. Our approach extends state-of-the-art methods by considering a fine-grained structure of an asynchronous conversation, along with other conversational features by applying recent graph-based methods for NLP. For topic segmentation, we propose two novel unsupervised models that exploit the fine-grained conversational structure, and a novel graph-theoretic supervised model that combines lexical, conversational and topic features. For topic labeling, we propose two novel (unsupervised) random walk models that respectively capture conversation specific clues from two different sources: the leading sentences and the fine-grained conversational structure. Empirical evaluation shows that the segmentation and the labeling performed by our best models beat the state-of-the-art, and are highly correlated with human annotations.
40

Chen, Mo. "Model of Network Topic Detection Based on Web Usage Behaviour Mode Analysis and Mining Technology." International Journal of Computers Communications & Control 12, no. 2 (March 1, 2017): 183. http://dx.doi.org/10.15837/ijccc.2017.2.2599.

Abstract:
With the arrival of the big data era, characterized by semi-structured and unstructured text, accurate network topic detection has attracted wide attention from researchers. This paper proposes a model of network topic detection based on web usage behaviour mode analysis and mining technology, taking Web news as the object of research. The author describes the main functions and methods of the model, which include analysis modules for the Web news instance clicking mode, the Web news instance retrieval mode, Web news instance seeds, and similar Web news instances supporting topics. Based on these functions and methods, the author describes the main algorithms of the model, which include the mining algorithm for Web news seed instances and the mining algorithm for similar Web news instances supporting topics. These algorithms are applied in the processing module of the model and focus on detecting network topics efficiently from a large volume of web usage behaviour towards Web news instances, in order to explore a research method for network topic detection. The experimental analysis has three steps: first, the author analyses the precision of topic detection under different methods; second, the author analyses how the number of Web news instances concerned and the seed threshold affect the quality of Web news topic detection; finally, the author analyses how the number of Web news instances concerned and the probability threshold affect the quality of the mined Web news instances supporting a topic. The results show the feasibility, validity and superiority of the model design, which plays an important role in constructing a topic-focused Web news corpus that provides a real-time data source for topic evolution tracking.
41

Du, Qiming, Nan Li, Wenfu Liu, Daozhu Sun, Shudan Yang, and Feng Yue. "A Topic Recognition Method of News Text Based on Word Embedding Enhancement." Computational Intelligence and Neuroscience 2022 (February 16, 2022): 1–15. http://dx.doi.org/10.1155/2022/4582480.

Abstract:
Topic recognition technology has been commonly applied to identify different categories of news topics from the vast amount of web information, which has a wide application prospect in the field of online public opinion monitoring, news recommendation, and so on. However, it is very challenging to effectively utilize key feature information such as syntax and semantics in the text to improve topic recognition accuracy. Some researchers proposed to combine the topic model with the word embedding model, whose results had shown that this approach could enrich text representation and benefit natural language processing downstream tasks. However, for the topic recognition problem of news texts, there is currently no standard way of combining topic model and word embedding model. Besides, some existing similar approaches were more complex and did not consider the fusion between topic distribution of different granularity and word embedding information. Therefore, this paper proposes a novel text representation method based on word embedding enhancement and further forms a full-process topic recognition framework for news text. In contrast to traditional topic recognition methods, this framework is designed to use the probabilistic topic model LDA, the word embedding models Word2vec and Glove to fully extract and integrate the topic distribution, semantic knowledge, and syntactic relationship of the text, and then use popular classifiers to automatically recognize the topic categories of news based on the obtained text representation vectors. As a result, the proposed framework can take advantage of the relationship between document and topic and the context information, which improves the expressive ability and reduces the dimensionality. Based on the two benchmark datasets of 20NewsGroup and BBC News, the experimental results verify the effectiveness and superiority of the proposed method based on word embedding enhancement for the news topic recognition problem.
42

Ma, Jialin, Xiaoqiang Gong, Zhaojun Wang, and Qian Xie. "SDTM: A Novel Topic Model Framework for Syndrome Differentiation in Traditional Chinese Medicine." Journal of Healthcare Engineering 2022 (January 4, 2022): 1–12. http://dx.doi.org/10.1155/2022/6938506.

Abstract:
Syndrome differentiation is the most basic diagnostic method in traditional Chinese medicine (TCM). The process of syndrome differentiation is difficult and challenging due to its complexity, diversity, and vagueness. Recently, artificial intelligence methods have been introduced to discover the regularities of syndrome differentiation from TCM medical records, but the existing data mining (DM) algorithms fail to consider how a syndrome is generated according to TCM theories. In this paper, we propose a novel topic model framework named syndrome differentiation topic model (SDTM) to dynamically characterize the process of syndrome differentiation. The SDTM framework utilizes latent Dirichlet allocation (LDA) to discover the latent semantic relationship between symptoms and syndromes in a large body of Chinese medical records. We also use a similarity measurement method to map the uninterpretable topics to the labeled syndromes. Finally, a Bayesian method is used to determine the final differentiated syndromes. Experimental results show the superiority of SDTM over existing topic models for the task of syndrome differentiation.
43

Hemphill, Libby, and Angela M. Schöpke-Gonzalez. "Two Computational Models for Analyzing Political Attention in Social Media." Proceedings of the International AAAI Conference on Web and Social Media 14 (May 26, 2020): 260–71. http://dx.doi.org/10.1609/icwsm.v14i1.7297.

Abstract:
Understanding how political attention is divided and over what subjects is crucial for research on areas such as agenda setting, framing, and political rhetoric. Existing methods for measuring attention, such as manual labeling according to established codebooks, are expensive and can be restrictive. We describe two computational models that automatically distinguish topics in politicians' social media content. Our models (one supervised classifier and one unsupervised topic model) provide different benefits. The supervised classifier reduces the labor required to classify content according to a pre-determined topic list. However, tweets do more than communicate policy positions. Our unsupervised model uncovers both political topics and other Twitter uses (e.g., constituent service). These models are effective, inexpensive computational tools for political communication and social media research. We demonstrate their utility and discuss the different analyses they afford by applying both models to the tweets posted by members of the 115th U.S. Congress.
44

Miyano, I., H. Kataoka, N. Nakajima, T. Watabe, N. Yasuda, Y. Okuhara, and Y. Hatakeyama. "Use of a Latent Topic Model for Characteristic Extraction from Health Checkup Questionnaire Data." Methods of Information in Medicine 54, no. 06 (2015): 515–21. http://dx.doi.org/10.3414/me15-01-0023.

Abstract:
Objectives: When patients complete questionnaires during health checkups, many of their responses are subjective, making topic extraction difficult. Therefore, the purpose of this study was to develop a model capable of extracting appropriate topics from subjective data in questionnaires conducted during health checkups. Methods: We employed a latent topic model to group the lifestyle habits of the study participants and represented their responses to items on health checkup questionnaires as a probability model. For the probability model, we used latent Dirichlet allocation to extract 30 topics from the questionnaires. According to the model parameters, a total of 4381 study participants were then divided into groups based on these topics. Results from laboratory tests, including blood glucose level, triglycerides, and estimated glomerular filtration rate, were compared between each group, and these results were then compared with those obtained by hierarchical clustering. Results: If a significant (p < 0.05) difference was observed in any of the laboratory measurements between groups, it was considered to indicate a questionnaire response pattern corresponding to the value of the test result. A comparison between the latent topic model and hierarchical clustering grouping revealed that, in the latent topic model method, a small group of participants who reported having subjective signs of urinary disorder were allocated to a single group. Conclusions: The latent topic model is useful for extracting characteristics from a small number of groups from questionnaires with a large number of items. These results show that, in addition to chief complaints and history of past illness, questionnaire data obtained during medical checkups can serve as useful judgment criteria for assessing the conditions of patients.
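One plausible reading of "divided into groups based on these topics" is that each participant is assigned to the topic with the largest weight in their topic distribution. The sketch below illustrates that reading only; the abstract does not confirm the exact grouping rule, and the function name, participant identifiers, and weights are hypothetical.

```python
from collections import defaultdict


def group_by_dominant_topic(doc_topic):
    """Map each topic index to the participants whose largest weight it is.

    `doc_topic` maps a participant id to a list of topic weights.
    Ties go to the lowest topic index, because max() keeps the first maximum.
    """
    groups = defaultdict(list)
    for participant_id, weights in doc_topic.items():
        dominant = max(range(len(weights)), key=weights.__getitem__)
        groups[dominant].append(participant_id)
    return dict(groups)


# Hypothetical toy distributions over three topics (the study used 30).
toy_theta = {
    "p1": [0.7, 0.2, 0.1],
    "p2": [0.1, 0.1, 0.8],
    "p3": [0.6, 0.3, 0.1],
}
toy_groups = group_by_dominant_topic(toy_theta)
```

Laboratory measurements could then be compared across the resulting groups, as the abstract describes.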
45

Liu, Zheng, Chiyu Liu, Bin Xia, and Tao Li. "Multiple Relational Topic Modeling for Noisy Short Texts." International Journal of Software Engineering and Knowledge Engineering 28, no. 11n12 (November 2018): 1559–74. http://dx.doi.org/10.1142/s021819401840017x.

Abstract:
Understanding content in social networks by inferring high-quality latent topics from short texts is a significant task in social analysis. It is challenging because social network content is usually extremely short, noisy, and full of informal vocabulary. Due to the lack of sufficient word co-occurrence instances, well-known topic modeling methods such as LDA and LSA cannot uncover high-quality topic structures. Existing work seeks to pool short texts from social networks into pseudo-documents or to utilize the explicit relations among these short texts, such as hashtags in tweets, to make classic topic modeling methods work. In this paper, we explore this problem by proposing a topic model for noisy short texts with multiple relations called MRTM (Multiple Relational Topic Modeling). MRTM exploits both explicit and implicit relations by introducing a document-attribute distribution and a two-step random sampling strategy. Extensive experiments, compared with the state-of-the-art topic modeling approaches, demonstrate that MRTM can alleviate the word co-occurrence sparsity and uncover high-quality latent topics from noisy short texts.
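The pooling strategy that this abstract attributes to earlier work can be sketched as follows. This is not MRTM itself, only an illustration of merging short posts that share a hashtag into one pseudo-document so that a classic topic model sees more word co-occurrences. The function name and the toy posts are hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")


def pool_by_hashtag(posts):
    """Merge posts sharing a hashtag into one pseudo-document per hashtag.

    Returns (pseudo_docs, untagged): a dict from lowercased hashtag to the
    concatenated text of its posts, and the list of posts with no hashtag.
    A post with several hashtags contributes to several pseudo-documents.
    """
    pools = defaultdict(list)
    untagged = []
    for post in posts:
        tags = {tag.lower() for tag in HASHTAG.findall(post)}
        if not tags:
            untagged.append(post)
        for tag in tags:
            pools[tag].append(post)
    pseudo_docs = {tag: " ".join(texts) for tag, texts in pools.items()}
    return pseudo_docs, untagged


toy_posts = [
    "Vote today #election",
    "Polls close at 8 #Election #news",
    "nice weather",
]
toy_pseudo_docs, toy_untagged = pool_by_hashtag(toy_posts)
```

The pseudo-documents, rather than the individual posts, would then be passed to a model such as LDA.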
46

Hendrawan, Muhammad Yunus, and Nucke Widowati Kusumo Projo. "Topic Modelling in Knowledge Management Documents BPS Statistics Indonesia." Proceedings of The International Conference on Data Science and Official Statistics 2021, no. 1 (January 4, 2022): 119–30. http://dx.doi.org/10.34123/icdsos.v2021i1.52.

Abstract:
Knowledge management is an important activity in improving the performance of an organization. BPS Statistics Indonesia has recently implemented such a system to improve the quality and efficiency of business processes. The purposes of this research are: 1) implementing topic modelling on the BPS Knowledge Management System to identify groups of document topics; 2) recommending the best topic modelling method; 3) building a topic modelling web service for BPS that includes a data preprocessing function and a topic group recommendation function. This study applies the Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) topic modelling methods to determine the best grouping technique for knowledge management systems in BPS Statistics Indonesia. The results show that the LDA model using Mallet is the best model, with 25 topic groups and a coherence score of 0.4803. The performance results suggest that the best modelling method is LDA. The LDA model is then successfully implemented as a RESTful web service that provides the preprocessing function and topic recommendations for documents entered into the BPS Knowledge Management System.
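Model selection by coherence score, as reported in this abstract, can be illustrated with a small sketch. The abstract does not state which coherence measure produced the 0.4803 figure, so the UMass-style measure below is a stand-in chosen because it needs only document co-occurrence counts. Its values are not comparable to the reported score, and the function name and toy corpus are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic's ranked top words.

    `documents` is a list of token lists. Higher values indicate that the
    top words tend to appear in the same documents.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() keeps ranking order, so `earlier` always outranks `later`.
    for earlier, later in combinations(top_words, 2):
        base = doc_freq(earlier)
        if base == 0:
            raise ValueError(f"{earlier!r} does not occur in the corpus")
        # +1 smoothing keeps the logarithm finite when a pair never co-occurs.
        score += math.log((doc_freq(earlier, later) + 1) / base)
    return score


toy_docs = [
    ["survey", "census", "data"],
    ["survey", "census"],
    ["budget", "finance"],
    ["budget", "data"],
]
related = umass_coherence(["survey", "census"], toy_docs)
unrelated = umass_coherence(["survey", "budget"], toy_docs)
```

In a comparison like the one described, a score of this kind would be averaged over all topics of each candidate model, and the model or topic count with the highest average would be kept.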
47

Wang, Jing, Mohit Bansal, Kevin Gimpel, Brian D. Ziebart, and Clement T. Yu. "A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment." Transactions of the Association for Computational Linguistics 3 (December 2015): 59–71. http://dx.doi.org/10.1162/tacl_a_00122.

Abstract:
Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.
48

Hong, Yuling, and Qishan Zhang. "Indicator Selection for Topic Popularity Definition Based on AHP and Deep Learning Models." Discrete Dynamics in Nature and Society 2020 (August 24, 2020): 1–11. http://dx.doi.org/10.1155/2020/9634308.

Abstract:
Purpose. The purpose of this article is to accurately predict topic popularity on social networks. Indicator selection for a new definition of topic popularity with degree of grey incidence (DGI) is undertaken based on an improved analytic hierarchy process (AHP). Design/Methodology/Approach. By screening the importance of indicators with deep learning methods such as recurrent neural networks (RNNs), long short-term memory (LSTM), and gated recurrent units (GRUs), a selection model of topic popularity indicators based on AHP is set up. Findings. The results show that when topic popularity is built quantitatively based on the DGI method and different weights of topic indicators are obtained with the help of AHP, the average accuracy of topic popularity prediction can reach 97.66%. Both the training speed and the prediction precision are higher. Practical Implications. The method proposed in the paper can be used to calculate the popularity of each hot topic and generate a ranking list of topics’ popularities. Moreover, a topic's future popularity can be predicted by deep learning methods. At the same time, a new application field of deep learning technology has been further discovered and verified. Originality/Value. This can lay a theoretical foundation for the formulation of topic popularity tendency prevention measures on social networks and provide an evaluation method that is consistent with the actual situation.
49

Wang Gao, Baoping Yang, Yuwei Wang, and Yuan Fang. "Depression Detection in Social Media using XLNet with Topic Distributions." 電腦學刊 33, no. 4 (August 2022): 095–106. http://dx.doi.org/10.53106/199115992022083304008.

Abstract:
Due to the complexity of depressive diseases, detecting depressed users on social media platforms is a challenging task. In recent years, with an increasing number of users of social media sites, this field of research has begun to develop rapidly. To improve the detection performance of traditional methods, two challenges need to be overcome. The first challenge is that textual content posted on social media platforms suffers from serious data sparseness. The second one is how to effectively use emotions, user information, and behavior characteristics to predict potentially depressed users. In this paper, we propose a novel model called the Topic-enriched Depression Detection Model (TDDM), which combines topic information and user behavior to predict depressed users on social media platforms. TDDM first employs a Conditional Random Field Regularized Topic Model (CRFTM) to extract the topic knowledge of user posts. XLNet is used to encode posts to further expand the semantic features of short texts. Finally, we integrate user behavior features into TDDM to improve the detection performance of the model. The experimental results on a real-world Twitter dataset demonstrate that the proposed model performs better than baseline models in detecting depressed users at both pseudo-document level and user level.
50

Wang, Yiru, Pengda Si, Zeyang Lei, and Yujiu Yang. "Topic Enhanced Controllable CVAE for Dialogue Generation (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13955–56. http://dx.doi.org/10.1609/aaai.v34i10.7250.

Abstract:
Neural generation models have shown great potential in conversation generation recently. However, these methods tend to generate uninformative or irrelevant responses. In this paper, we present a novel topic-enhanced controllable CVAE (TEC-CVAE) model to address this issue. On the one hand, the model learns the context-interactive topic knowledge through a novel multi-hop hybrid attention in the encoder. On the other hand, we design a topic-aware controllable decoder to constrain the expression of the stochastic latent variable in the CVAE to reduce irrelevant responses. Experimental results on two public datasets show that the two mechanisms synchronize to improve both relevance and diversity, and the proposed model outperforms other competitive methods.