Journal articles on the topic 'Pointwise Mutual Information'

1

Aji, S. "Document Summarization Using Positive Pointwise Mutual Information." International Journal of Computer Science and Information Technology 4, no. 2 (April 30, 2012): 47–55. http://dx.doi.org/10.5121/ijcsit.2012.4204.

2

Takada, Teruko. "Mining local and tail dependence structures based on pointwise mutual information." Data Mining and Knowledge Discovery 24, no. 1 (May 6, 2011): 78–102. http://dx.doi.org/10.1007/s10618-011-0220-3.

3

Torun, Orhan, and Seniha Esen Yuksel. "Unsupervised segmentation of LiDAR fused hyperspectral imagery using pointwise mutual information." International Journal of Remote Sensing 42, no. 17 (June 23, 2021): 6461–76. http://dx.doi.org/10.1080/01431161.2021.1939906.

4

Finn, Conor, and Joseph Lizier. "Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices." Entropy 20, no. 4 (April 18, 2018): 297. http://dx.doi.org/10.3390/e20040297.

Abstract:
What are the distinct ways in which a set of predictor variables can provide information about a target variable? When does a variable provide unique information, when do variables share redundant information, and when do variables combine synergistically to provide complementary information? The redundancy lattice from the partial information decomposition of Williams and Beer provided a promising glimpse at the answer to these questions. However, this structure was constructed using a much criticised measure of redundant information, and despite sustained research, no completely satisfactory replacement measure has been proposed. In this paper, we take a different approach, applying the axiomatic derivation of the redundancy lattice to a single realisation from a set of discrete variables. To overcome the difficulty associated with signed pointwise mutual information, we apply this decomposition separately to the unsigned entropic components of pointwise mutual information which we refer to as the specificity and ambiguity. This yields a separate redundancy lattice for each component. Then based upon an operational interpretation of redundancy, we define measures of redundant specificity and ambiguity enabling us to evaluate the partial information atoms in each lattice. These atoms can be recombined to yield the sought-after multivariate information decomposition. We apply this framework to canonical examples from the literature and discuss the results and the various properties of the decomposition. In particular, the pointwise decomposition using specificity and ambiguity satisfies a chain rule over target variables, which provides new insights into the so-called two-bit-copy example.
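The split of pointwise mutual information into two unsigned terms can be checked numerically. Assuming the definitions used in this line of work, specificity h(s) = -log2 p(s) and ambiguity h(s|t) = -log2 p(s|t), so that i(s;t) = h(s) - h(s|t), the sketch below evaluates all three quantities for every realisation of a small hypothetical joint distribution. It does not build the redundancy lattices or the partial information atoms.

```python
import math

# Hypothetical joint distribution p(s, t) over one predictor s and target t.
JOINT = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}


def marginal(joint, index):
    """Marginalise the joint distribution onto coordinate `index`."""
    out = {}
    for key, p in joint.items():
        out[key[index]] = out.get(key[index], 0.0) + p
    return out


def specificity_ambiguity(joint, s, t):
    """Return (h(s), h(s|t), i(s;t)) in bits for the realisation (s, t)."""
    p_s = marginal(joint, 0)[s]
    p_t = marginal(joint, 1)[t]
    p_s_given_t = joint[(s, t)] / p_t
    specificity = -math.log2(p_s)        # h(s), never negative
    ambiguity = -math.log2(p_s_given_t)  # h(s|t), never negative
    return specificity, ambiguity, specificity - ambiguity


for s, t in sorted(JOINT):
    h_s, h_s_t, i_st = specificity_ambiguity(JOINT, s, t)
    print(f"s={s} t={t}  h(s)={h_s:.3f}  h(s|t)={h_s_t:.3f}  i(s;t)={i_st:+.3f}")
```

The realisation (0, 0) has negative pointwise mutual information even though both of its components are non-negative, which illustrates why the decomposition is applied to the components separately.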
5

C N, Pushpa, Gerard Deepak, Mohammed Zakir, Thriveni J, and Venugopal K R. "ENHANCED NEIGHBORHOOD NORMALIZED POINTWISE MUTUAL INFORMATION ALGORITHM FOR CONSTRAINT AWARE DATA CLUSTERING." ICTACT Journal on Soft Computing 6, no. 4 (July 1, 2016): 1287–92. http://dx.doi.org/10.21917/ijsc.2016.0176.

6

Recchia, Gabriel, and Michael N. Jones. "More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis." Behavior Research Methods 41, no. 3 (August 2009): 647–56. http://dx.doi.org/10.3758/brm.41.3.647.

7

Takada, Teruko. "Erratum to: Mining local and tail dependence structures based on pointwise mutual information." Data Mining and Knowledge Discovery 26, no. 1 (October 14, 2011): 213–15. http://dx.doi.org/10.1007/s10618-011-0241-y.

8

Chennubhotla, S. Chakra, Daniel M. Spagnolo, Rekha Gyanchandani, Yousef Al-Kofahi, Andrew M. Stern, Timothy R. Lezon, Albert Gough, et al. "Pointwise mutual information quantifies intratumor heterogeneity in tissue sections labeled with multiple fluorescent biomarkers." Journal of Pathology Informatics 7, no. 1 (2016): 47. http://dx.doi.org/10.4103/2153-3539.194839.

9

Rahman, A. "Comparison Extraction Feature Using Double Propagation and Pointwise Mutual Information to Select a Product." IOP Conference Series: Materials Science and Engineering 407 (September 26, 2018): 012147. http://dx.doi.org/10.1088/1757-899x/407/1/012147.

10

Manivannan, P., and C. S. Kanimozhiselvi. "Pointwise Mutual Information Based Integral Classifier for Sentiment Analysis in Cross Domain Opinion Mining." Journal of Computational and Theoretical Nanoscience 14, no. 11 (November 1, 2017): 5435–43. http://dx.doi.org/10.1166/jctn.2017.6967.

11

Nesaragi, Naimahmed, Shivnarayan Patidar, and Vaneet Aggarwal. "Tensor learning of pointwise mutual information from EHR data for early prediction of sepsis." Computers in Biology and Medicine 134 (July 2021): 104430. http://dx.doi.org/10.1016/j.compbiomed.2021.104430.

12

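Anita Muhsini, Sri Reski. "ANALISIS DAN IMPLEMENTASI CROSS-LINGUAL SEMANTIC SIMILARITY ANTAR KATA DENGAN METODE POINTWISE MUTUAL INFORMATION." [Analysis and Implementation of Cross-Lingual Semantic Similarity between Words Using the Pointwise Mutual Information Method.] Jurnal Penelitian Pendidikan 18, no. 1 (April 30, 2018): 18–24. http://dx.doi.org/10.17509/jpp.v18i1.11056.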

Abstract:
Semantic similarity measurement plays a very important role in several areas of Natural Language Processing (NLP), where its results are often used as the basis for further NLP tasks. One application is measuring multilingual semantic similarity between words. This is motivated by the fact that many information retrieval systems today must deal with multilingual texts or documents. A pair of words is said to be semantically similar if the words share a meaning or concept. In this study, semantic similarity is computed between words in two different languages, English and Spanish. The corpus used is the Europarl Parallel Corpus in English and Spanish. Word contexts are drawn from the Swadesh list, and the resulting similarity scores are compared with the SemEval 2017 Cross-lingual Semantic Similarity gold-standard dataset to measure their correlation. The test results show that the PMI method achieves a Pearson correlation of 0.5781 and a Spearman correlation of 0.5762. The study concludes that measuring cross-lingual semantic similarity with the Pointwise Mutual Information (PMI) method yields the best correlation. The author recommends that future research use other datasets to establish how effective Pointwise Mutual Information (PMI) is at measuring cross-lingual semantic similarity between words.
13

Lee, Jung-Been, Taek Lee, and Hoh Peter In. "Automatic Stop Word Generation for Mining Software Artifact Using Topic Model with Pointwise Mutual Information." IEICE Transactions on Information and Systems E102.D, no. 9 (September 1, 2019): 1761–72. http://dx.doi.org/10.1587/transinf.2018edp7390.

14

Kawauchi, Saori, Tetsuya Toyota, and Hajime Nobuhara. "Knowledge Expansion Support by Related Search Keyword Generation Based on Wikipedia Category and Pointwise Mutual Information." Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 2 (March 20, 2012): 247–55. http://dx.doi.org/10.20965/jaciii.2012.p0247.

Abstract:
When users turn to search engines to acquire knowledge on subjects in unfamiliar domains, they often refer to related search keywords, which are generated from how frequently terms are used as search keywords. However, searching via these related keywords does not always help users expand their knowledge of the subject. We therefore propose a new method that generates related search keywords using Wikipedia. In the proposed method, the user's query is first used to find Wikipedia pages with the same title, and the category information of those pages is extracted. Next, the sets of pages belonging to those categories are obtained, and groups of related pages are extracted from the pages contained in any of the product sets of multiple page sets. Then, pointwise mutual information or tf-idf is calculated for the keywords extracted from each page, and the keywords with higher values of either measure are associated with the search keywords. We confirmed the effectiveness of the proposed method through a comparison with related search keywords generated by Google and through subjective evaluation experiments.
15

Utsumi, Akira. "A semantic space approach to the computational semantics of noun compounds." Natural Language Engineering 20, no. 2 (January 15, 2013): 185–234. http://dx.doi.org/10.1017/s135132491200037x.

Abstract:
This study examines the ability of a semantic space model to represent the meaning of noun compounds such as ‘information gathering’ or ‘heart disease.’ For a semantic space model to compute the meaning and the attributional similarity (or semantic relatedness) for unfamiliar noun compounds that do not occur in a corpus, the vector for a noun compound must be computed from the vectors of its constituent words using vector composition algorithms. Six composition algorithms (i.e., centroid, multiplication, circular convolution, predication, comparison, and dilation) are compared in terms of the quality of the computation of the attributional similarity for English and Japanese noun compounds. To evaluate the performance of the computation of the similarity, this study uses three tasks (i.e., related word ranking, similarity correlation, and semantic classification), and two types of semantic spaces (i.e., latent semantic analysis-based and positive pointwise mutual information-based spaces). The result of these tasks is that the dilation algorithm is generally most effective in computing the similarity of noun compounds, while the multiplication algorithm is best suited specifically for the positive pointwise mutual information-based space. In addition, the comparison algorithm works better for unfamiliar noun compounds that do not occur in the corpus. These findings indicate that in general a semantic space model, and in particular the dilation, multiplication, and comparison algorithms have sufficient ability to compute the attributional similarity for noun compounds.
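Two of the simpler composition functions named above can be sketched on toy vectors. Here centroid is taken as the component-wise mean and multiplication as the component-wise product of the constituent vectors (the usual readings of these operations in vector-composition work, assumed here rather than taken from the article). The PPMI vectors and the words heart, disease and illness are hypothetical, and dilation, predication, comparison and circular convolution are not shown.

```python
import math


def centroid(u, v):
    """Component-wise mean of the two constituent vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def multiplication(u, v):
    """Component-wise product: only dimensions shared by both words survive."""
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical PPMI vectors over four context dimensions.
heart = [2.1, 0.0, 1.5, 0.3]
disease = [1.8, 1.2, 0.0, 0.9]
illness = [1.5, 1.0, 0.2, 0.8]

for name, compose in (("centroid", centroid), ("multiplication", multiplication)):
    compound = compose(heart, disease)
    print(name, round(cosine(compound, illness), 3))
```

Multiplication zeroes out any context dimension that either constituent lacks, whereas the centroid keeps such a dimension at half strength.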
16

Kherwa, Pooja, and Poonam Bansal. "Semantic Pattern Detection in COVID-19 Using Contextual Clustering and Intelligent Topic Modeling." International Journal of E-Health and Medical Communications 13, no. 2 (July 2022): 1–17. http://dx.doi.org/10.4018/ijehmc.20220701.oa7.

Abstract:
The Covid-19 pandemic is the deadliest outbreak in living memory, so there is an urgent need to prepare the world with strategies to prevent and control the impact of epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature, using contextual clustering and intelligent topic modeling, is presented. For contextual clustering, weights at three levels (term, document, and corpus) are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and latent Dirichlet allocation is then applied. Contextual clustering with latent semantic analysis yields semantic spaces whose terms are highly correlated at the corpus level. Intelligent topic modeling improves the topics, which show lower perplexity and higher coherence. This research helps identify knowledge gaps in Covid-19 research and offers directions for future research.
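Ranking adjacent word pairs by PMI is one common way to pick collocations like those mentioned above. The sketch below reflects our own assumptions rather than the paper's pipeline: it uses maximum-likelihood estimates from a single token list, a hypothetical minimum count of 2 to discard unreliable rare pairs, and a made-up example text; LBMD is not implemented.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent bigrams by PMI (bits), using maximum-likelihood estimates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue  # PMI is unreliable for rare pairs, so skip them
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


text = ("public health measures slow the spread of the virus "
        "public health agencies track the spread of the virus").split()
for pair, score in pmi_collocations(text):
    print(pair, round(score, 2))
```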
17

Chen, Haijian, Yonghui Dai, Yanjie Feng, Bo Jiang, Jun Xiao, and Ben You. "Construction of affective education in mobile learning: The study based on learner’s interest and emotion recognition." Computer Science and Information Systems 14, no. 3 (2017): 685–702. http://dx.doi.org/10.2298/csis170110023c.

Abstract:
Affective education has become a new educational pattern in modern ubiquitous learning environments. In mobile learning especially, how to construct affective education effectively, so as to optimize and enhance teaching effectiveness, has attracted the attention of many scholars. This paper presents a framework for affective education based on learners' interest and emotion recognition. Learners' voice, text, and behavior log data are first preprocessed; association rule analysis, SO-PMI (Semantic Orientation-Pointwise Mutual Information), and ANN-DL (Artificial Neural Network with Deep Learning) methods are then used for learner interest mining and emotion recognition. The experimental results show that these methods can effectively recognize learners' emotions in mobile learning and satisfy the requirements of affective education.
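SO-PMI is usually computed, following Turney and Littman's formulation, as the sum of a word's PMI with a set of positive seed words minus the sum of its PMI with negative seed words. The sketch below assumes sentence-level co-occurrence, a small additive smoothing constant (0.01 here), and hypothetical learner comments and seed words; the abstract does not say how the paper itself counts co-occurrence.

```python
import math


def so_pmi(word, sentences, pos_seeds, neg_seeds, smoothing=0.01):
    """Semantic orientation: sum of PMI with positive seeds minus negative seeds.

    Co-occurrence is counted per sentence, and `smoothing` is added to joint
    counts so that pairs that never co-occur do not give log(0).
    """
    docs = [set(s.lower().split()) for s in sentences]
    n = len(docs)

    def count(*terms):
        return sum(1 for d in docs if all(t in d for t in terms))

    def pmi(a, b):
        c_a, c_b = count(a), count(b)
        if c_a == 0 or c_b == 0:
            return 0.0  # no evidence about one of the terms
        return math.log2((count(a, b) + smoothing) * n / (c_a * c_b))

    return (sum(pmi(word, p) for p in pos_seeds)
            - sum(pmi(word, q) for q in neg_seeds))


# Hypothetical learner comments and seed words.
comments = [
    "the lesson was clear and good",
    "clear explanations make a good class",
    "the video was confusing and bad",
    "a confusing quiz is a bad experience",
    "the teacher was good",
]
for w in ("clear", "confusing"):
    print(w, round(so_pmi(w, comments, ["good"], ["bad"]), 2))
```

A positive score marks the word as leaning toward the positive seeds and a negative score toward the negative ones; a real system would use many seeds and a much larger corpus.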
18

Chimlek, Sutasinee, Part Pramokchon, and Punpiti Piamsa-nga. "The Selection of Useful Visual Words for Class-Imbalanced Data in Image Classification." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 1 (February 1, 2016): 307. http://dx.doi.org/10.11591/ijece.v6i1.8633.

Abstract:
The bag of visual words (BOVW) has recently been used for image classification in large datasets. A major problem of image classification using BOVW is high dimensionality: most features are usually irrelevant, and multi-view images in each class have different BOVW. Therefore, selecting significant visual words for the multi-view images in each class is an essential way to reduce the size of the BOVW while retaining high image classification performance. Many feature scores used for ranking give low classification performance for class-imbalanced distributions and multi-view images in each class. We propose a feature score based on the statistical t-test, which evaluates the difference between two sample means, to assess the discriminating power of each individual feature. The multi-class image classification performance of the proposed feature score is compared with four modern feature scores: Document Frequency (DF), Mutual Information (MI), Pointwise Mutual Information (PMI) and Chi-square statistics (CHI). The results show that the average F1-measure on the Paris and SUN397 datasets using the proposed feature score is 92% and 94%, respectively, while none of the other feature scores exceeds 80%.
19

Chimlek, Sutasinee, Part Pramokchon, and Punpiti Piamsa-nga. "The Selection of Useful Visual Words for Class-Imbalanced Data in Image Classification." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 1 (February 1, 2016): 307. http://dx.doi.org/10.11591/ijece.v6i1.pp307-319.

20

Hillebrand, Lars, David Biesner, Christian Bauckhage, and Rafet Sifa. "Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM." Machine Learning and Knowledge Extraction 3, no. 1 (January 19, 2021): 123–67. http://dx.doi.org/10.3390/make3010007.

Abstract:
Unsupervised topic extraction is a vital step in automatically extracting concise contentual information from large text corpora. Existing topic extraction methods lack the capability of linking relations between these topics which would further help text understanding. Therefore we propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
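The matrix that DEDICOM factorizes in this setting is a PMI matrix built from co-occurrence counts. As a hedged sketch, the code below builds a symmetric word-by-word PMI matrix from a sliding window; the window size, the toy sentence and the choice to store unobserved pairs as 0.0 (their PMI is strictly minus infinity) are our assumptions, and the factorization itself is not shown.

```python
import math
from collections import Counter


def cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts within +/- `window` positions."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts


def pmi_matrix(counts):
    """Square PMI matrix in bits, as nested dicts; unobserved pairs stay 0.0."""
    total = sum(counts.values())
    margin = Counter()
    for (w, _), n in counts.items():
        margin[w] += n  # row and column margins coincide for symmetric counts
    vocab = sorted(margin)
    matrix = {w: dict.fromkeys(vocab, 0.0) for w in vocab}
    for (w, c), n in counts.items():
        matrix[w][c] = math.log2(n * total / (margin[w] * margin[c]))
    return matrix


tokens = "the cat sat on the mat the dog sat on the rug".split()
M = pmi_matrix(cooccurrence(tokens))
print(round(M["sat"]["on"], 3), round(M["cat"]["dog"], 3))
```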
21

Liu, Gaosheng, and Yang Bai. "Statistical inference in functional semiparametric spatial autoregressive model." AIMS Mathematics 6, no. 10 (2021): 10890–906. http://dx.doi.org/10.3934/math.2021633.

Abstract:
Semiparametric spatial autoregressive models have drawn great attention because they allow mutual dependence in spatial form and nonlinear effects of covariates. However, with the development of science and technology, functional covariates of high dimension and frequency that contain rich information have emerged. Based on such high-dimensional covariates, we propose an interesting and novel functional semiparametric spatial autoregressive model. We use B-spline basis functions to approximate the slope function and the nonparametric function, and propose a generalized method of moments to estimate the parameters. Under certain regularity conditions, the asymptotic properties of the proposed estimators are obtained. The estimators are computationally convenient, with closed-form expressions. For the slope function and nonparametric function estimators, we propose a residual-based approach to derive their pointwise confidence intervals. Simulation studies show that the proposed method performs well.
22

Marciniak, Malgorzata, and Agnieszka Mykowiecka. "Nested term recognition driven by word connection strength." Terminology 21, no. 2 (December 30, 2015): 180–204. http://dx.doi.org/10.1075/term.21.2.03mar.

Abstract:
Domain corpora are often not very voluminous, and even important terms may occur in them not as isolated maximal phrases but only within more complex constructions. Appropriate recognition of nested terms can thus influence the content of the extracted candidate term list and its order. We propose a new method for identifying nested terms based on a combination of two aspects: grammatical correctness and normalised pointwise mutual information (NPMI) computed for all bigrams in a given corpus. NPMI is typically used to recognise strong word connections, but in our solution we use it to recognise the weakest points, which suggest the best place to divide a phrase into two parts. By creating at most two nested phrases in each step, we introduce a binary term structure. We test the impact of the proposed method, applied together with the C-value ranking method, on the automatic term recognition task performed on three corpora, two in Polish and one in English.
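The splitting idea can be sketched in a few lines: score every adjacent bigram by NPMI, then cut a phrase recursively at its lowest-scoring bigram to obtain a binary structure. This sketch covers only the NPMI part; the grammatical-correctness check and the C-value ranking from the paper are omitted, and the toy corpus, the `unseen` default and the shared denominator for unigram and bigram probabilities are our assumptions.

```python
import math
from collections import Counter


def npmi_table(tokens):
    """NPMI = PMI(x, y) / -log p(x, y) for every adjacent bigram in `tokens`.

    Unigram and bigram probabilities share the denominator len(tokens), which
    keeps every value at or below 1 and avoids dividing by -log(1) = 0.
    """
    n = len(tokens)
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    table = {}
    for (x, y), c in bi.items():
        p_xy = c / n
        pmi = math.log(p_xy / ((uni[x] / n) * (uni[y] / n)))
        table[(x, y)] = pmi / -math.log(p_xy)
    return table


def split_phrase(words, table, unseen=-1.0):
    """Recursively cut a phrase at its weakest link (lowest bigram NPMI).

    Returns a nested binary tuple; bigrams absent from `table` get `unseen`.
    """
    if len(words) == 1:
        return words[0]
    scores = [table.get((words[i], words[i + 1]), unseen)
              for i in range(len(words) - 1)]
    cut = scores.index(min(scores)) + 1  # split just after the weakest bigram
    return (split_phrase(words[:cut], table, unseen),
            split_phrase(words[cut:], table, unseen))


corpus = ("soft tissue sarcoma is rare . soft tissue sarcoma treatment varies . "
          "treatment of soft tissue injury . treatment options").split()
print(split_phrase("soft tissue sarcoma treatment".split(), npmi_table(corpus)))
# ((('soft', 'tissue'), 'sarcoma'), 'treatment')
```

With this toy corpus the weakest link in "soft tissue sarcoma treatment" is "sarcoma treatment", so the phrase first splits into "soft tissue sarcoma" and "treatment", and the first part then splits before "sarcoma".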
23

Watford, Sean M., Rachel G. Grashow, Vanessa Y. De La Rosa, Ruthann A. Rudel, Katie Paul Friedman, and Matthew T. Martin. "Novel application of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene sets associated with disease: Use case in breast carcinogenesis." Computational Toxicology 7 (August 2018): 46–57. http://dx.doi.org/10.1016/j.comtox.2018.06.003.

24

Paperno, Denis, and Marco Baroni. "When the Whole Is Less Than the Sum of Its Parts: How Composition Affects PMI Values in Distributional Semantic Vectors." Computational Linguistics 42, no. 2 (June 2016): 345–50. http://dx.doi.org/10.1162/coli_a_00250.

Abstract:
Distributional semantic models, deriving vector-based word representations from patterns of word usage in corpora, have many useful applications (Turney and Pantel 2010). Recently, there has been interest in compositional distributional models, which derive vectors for phrases from representations of their constituent words (Mitchell and Lapata 2010). Often, the values of distributional vectors are pointwise mutual information (PMI) scores obtained from raw co-occurrence counts. In this article we study the relation between the PMI dimensions of a phrase vector and its components in order to gain insights into which operations an adequate composition model should perform. We show mathematically that the difference between the PMI dimension of a phrase vector and the sum of PMIs in the corresponding dimensions of the phrase's parts is an independently interpretable value, namely, a quantification of the impact of the context associated with the relevant dimension on the phrase's internal cohesion, as also measured by PMI. We then explore this quantity empirically, through an analysis of adjective–noun composition.
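The identity behind this claim can be checked with a few lines of algebra. In the notation below (ours, which may differ from the article's), $ab$ is the phrase, $c$ is the context dimension, $\mathrm{PMI}(a, b) = \log \frac{p(ab)}{p(a)\,p(b)}$ measures the phrase's internal cohesion, and the conditional version replaces each probability with its value given $c$:

```latex
\begin{aligned}
\mathrm{PMI}(ab, c) - \mathrm{PMI}(a, c) - \mathrm{PMI}(b, c)
  &= \log \frac{p(ab, c)}{p(ab)\,p(c)}
   - \log \frac{p(a, c)}{p(a)\,p(c)}
   - \log \frac{p(b, c)}{p(b)\,p(c)} \\
  &= \log \frac{p(ab, c)\,p(c)}{p(a, c)\,p(b, c)}
   - \log \frac{p(ab)}{p(a)\,p(b)} \\
  &= \log \frac{p(ab \mid c)}{p(a \mid c)\,p(b \mid c)}
   - \log \frac{p(ab)}{p(a)\,p(b)} \\
  &= \mathrm{PMI}(a, b \mid c) - \mathrm{PMI}(a, b).
\end{aligned}
```

A positive value means the context $c$ strengthens the cohesion between $a$ and $b$; a negative value means it weakens it.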
25

Wu, Fan, Yung-Ting Chuang, and Hung-Wei Lai. "Facilitating apps recommendation in Google Play." Electronic Library 36, no. 5 (October 1, 2018): 856–74. http://dx.doi.org/10.1108/el-05-2017-0119.

Abstract:
Purpose: The purpose of this paper is to present a system that analyzes trustworthiness and ranks applications to improve the search experience.
Design/methodology/approach: The system adopts pointwise mutual information to calculate comment semantics. It examines subjective factors (signed opinions, anonymous opinions and star ratings) and objective factors (download numbers, reputation ratings) before filtering, ranking and displaying. The authors invited three experts to check three categories and compared the results using Spearman and two statistics.
Findings: A high correlation between the proposed system and the expert ranking system suggests that the system can act as decision support.
Research limitations/implications: First, the authors have only tested the correlation between the proposed system and an expert ranking system; user satisfaction was not evaluated. The authors plan to conduct a later survey to gather user feedback. Second, the ranking system evaluates applications using fixed weights and disregards time. Therefore, in the future, the authors plan to enable their system to weight recent records over older ones.
Practical implications: User discussion forums, although helpful, have drawbacks. Not all reviews are trustworthy, and forums provide no filtering mechanisms to combat information overload. The authors' system addresses this by crawling a forum, filtering information, analyzing the trustworthiness of each comment and ranking the application for the user.
Originality/value: This paper develops a formula to analyze the trustworthiness of opinions, enabling the system to act as decision support when no professional advice is available.
26

Soleimani, Behrouz Haji, and Stan Matwin. "Fast PMI-Based Word Embedding with Efficient Use of Unobserved Patterns." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7031–38. http://dx.doi.org/10.1609/aaai.v33i01.33017031.

Abstract:
Continuous word representations that can capture the semantic information in the corpus are the building blocks of many natural language processing tasks. Pre-trained word embeddings are being used for sentiment analysis, text classification, question answering and so on. In this paper, we propose a new word embedding algorithm that works on a smoothed Positive Pointwise Mutual Information (PPMI) matrix which is obtained from the word-word co-occurrence counts. One of our major contributions is to propose an objective function and an optimization framework that exploits the full capacity of “negative examples”, the unobserved or insignificant word-word co-occurrences, in order to push unrelated words away from each other, which improves the distribution of words in the latent space. We also propose a kernel similarity measure for the latent space that can effectively calculate the similarities in high dimensions. Moreover, we propose an approximate alternative to our algorithm using a modified Vantage Point tree and reduce the computational complexity of the algorithm to |V| log |V| with respect to the number of words in the vocabulary. We have trained various word embedding algorithms on articles of Wikipedia with 2.1 billion tokens and show that our method outperforms the state-of-the-art in most word similarity tasks by a good margin.
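One widely used way to smooth a PPMI matrix is context-distribution smoothing, which raises context counts to a power alpha (often 0.75) before normalising. The abstract does not say which smoothing this paper uses, so treat the sketch below as an assumption rather than the authors' method. The word-context counts are hypothetical, and alpha = 1.0 recovers plain PPMI.

```python
import math
from collections import Counter


def smoothed_ppmi(counts, alpha=0.75):
    """Positive PMI with context-distribution smoothing.

    `counts` maps (word, context) -> co-occurrence count. Context counts are
    raised to `alpha` before normalising, which lowers PMI for rare contexts.
    Only positive entries are kept, so the result is a sparse dict.
    """
    total = sum(counts.values())
    word_tot, ctx_tot = Counter(), Counter()
    for (w, c), n in counts.items():
        word_tot[w] += n
        ctx_tot[c] += n
    ctx_norm = sum(n ** alpha for n in ctx_tot.values())
    ppmi = {}
    for (w, c), n in counts.items():
        p_wc = n / total
        p_w = word_tot[w] / total
        p_c = ctx_tot[c] ** alpha / ctx_norm
        value = math.log2(p_wc / (p_w * p_c))
        if value > 0:
            ppmi[(w, c)] = value
    return ppmi


# Hypothetical word-context counts.
counts = {("ice", "cold"): 8, ("ice", "water"): 2, ("ice", "rare"): 1,
          ("steam", "hot"): 6, ("steam", "water"): 4}
print(smoothed_ppmi(counts, alpha=1.0))   # plain PPMI
print(smoothed_ppmi(counts, alpha=0.75))  # smoothed PPMI
```

Because the rare context "rare" gains probability mass under smoothing, its PMI with "ice" drops, which damps PMI's well-known bias toward rare events.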
27

Rooba, R., et al. "Webpage Recommendation System Based on the Social Media Semantic Details of the Website." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 6 (April 10, 2021): 237–43. http://dx.doi.org/10.17762/turcomat.v12i6.1358.

Abstract:
Web page recommendations are generated using the navigational history from web server log files. The Semantic Variable Length Markov Chain Model (SVLMC) is a web page recommendation system that generates recommendations by combining a higher-order Markov model with rich semantic data. The state space complexity and time complexity problems of SVLMC were resolved by the Semantic Variable Length confidence pruned Markov Chain Model (SVLCPMC) and the Support vector machine based SVLCPMC (SSVLCPMC) methods, respectively. Recommendation accuracy was further improved by quickest change detection using the Kullback-Leibler divergence. In this paper, socio-semantic information is included in the similarity score, which improves recommendation accuracy. Social information from social websites such as Twitter is considered for web page recommendation. Initially, a number of web pages are collected, and the similarity between web pages is computed by comparing their semantic information. Term frequency and inverse document frequency (tf-idf) are combined into a composite weight, and the most important terms in the web pages are extracted. Then the Pointwise Mutual Information (PMI) between the most important terms and the terms in the Twitter dataset is calculated. The PMI metric measures the closeness between the Twitter terms and the most important terms in the web pages. This measure is then added to the similarity score matrix to provide socio-semantic search information for recommendation generation. The experimental results show that the proposed method performs better in terms of prediction accuracy, precision, F1 measure, R measure and coverage.
28

Dai, Honglin, Liejun Wang, and Jiwei Qin. "Metric Factorization with Item Cooccurrence for Recommendation." Symmetry 12, no. 4 (April 2, 2020): 512. http://dx.doi.org/10.3390/sym12040512.

Abstract:
In modern recommender systems, matrix factorization has been widely used to decompose the user–item matrix into user and item latent factors. However, the inner product in matrix factorization does not satisfy the triangle inequality, and the problem of sparse data is also encountered. In this paper, we propose a novel recommendation model, namely, metric factorization with item cooccurrence for recommendation (MFIC), which uses the Euclidean distance to jointly decompose the user–item interaction matrix and the item–item cooccurrence with shared latent factors. The item cooccurrence matrix is obtained from the colike matrix through the calculation of pointwise mutual information. The main contributions of this paper are as follows: (1) The MFIC model is not only suitable for rating prediction and item ranking, but can also overcome the problem of sparse data well. (2) This model incorporates the item cooccurrence matrix into metric learning so it can better learn the spatial positions of users and items. (3) Extensive experiments on a number of real-world datasets show that the proposed method substantially outperforms the compared algorithms in both rating prediction and item ranking.
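The abstract does not give the exact formula, so the sketch below uses one common construction for item co-occurrence from implicit feedback: count how often two items are liked by the same user, compute PMI over those co-like counts, and optionally shift it by log k and clip at zero (shifted positive PMI, as used in some co-factorization recommenders). The user data and the parameter `k` are hypothetical, and this is not necessarily how MFIC builds its matrix.

```python
import math
from collections import Counter
from itertools import combinations


def colike_counts(user_items):
    """Count, for each unordered item pair, how many users liked both."""
    counts = Counter()
    for items in user_items.values():
        for i, j in combinations(sorted(set(items)), 2):
            counts[(i, j)] += 1
    return counts


def item_sppmi(user_items, k=1):
    """Shifted positive PMI between items: max(PMI(i, j) - log k, 0).

    With k = 1 this is plain positive PMI over the co-like counts. Entries
    are stored in both orders because co-liking is symmetric.
    """
    pair_counts = colike_counts(user_items)
    item_totals = Counter()
    for (i, j), n in pair_counts.items():
        item_totals[i] += n
        item_totals[j] += n
    total = sum(item_totals.values())  # equals twice the number of co-likes
    sppmi = {}
    for (i, j), n in pair_counts.items():
        pmi = math.log(n * total / (item_totals[i] * item_totals[j]))
        shifted = pmi - math.log(k)
        if shifted > 0:
            sppmi[(i, j)] = sppmi[(j, i)] = shifted
    return sppmi


# Hypothetical users and the items each of them liked.
users = {"u1": ["a", "b", "c"], "u2": ["a", "b"], "u3": ["c", "d"], "u4": ["c", "d"]}
print(item_sppmi(users))
```

Raising `k` prunes weaker associations; with these toy users, k = 3 already removes every pair.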
29

Li, Xingzhou, and Xin Zeng. "Expected Income of New Currency in Blockchain Based on Data-Mining Technology." Electronics 9, no. 1 (January 15, 2020): 160. http://dx.doi.org/10.3390/electronics9010160.

Abstract:
To understand the expected returns of new blockchain currencies after their initial coin offerings (ICOs) and to maximize investment returns, this study uses the Semantic Orientation Pointwise Mutual Information (SO-PMI) algorithm to build a customer sentiment dictionary for new blockchain currencies, and collects users' online comments about each currency before its ICO. The Support Vector Machine (SVM) algorithm is used to build an evaluation model that analyzes and judges users' comments, accurately predicts the expected return of a new currency issued through an ICO, improves investment operations, and maximizes the return on investment. The results show that combining the SO-PMI and SVM algorithms can accurately evaluate the price of a new currency after issuance, and thus estimate the expected return on investment. Applying this combination of data-mining algorithms to the expected return of new blockchain currency issuance achieves the goal of anticipating revenue and greatly reduces the investment risk of new currency issuance in blockchain.
30

Guo, Fei, Zhongshi He, Liangyan Li, and Jing Xuan. "Unsupervised Learning of Multi-Sense Embedding with Matrix Factorization and Sparse Soft Clustering." International Journal of Pattern Recognition and Artificial Intelligence 33, no. 13 (December 15, 2019): 1951011. http://dx.doi.org/10.1142/s021800141951011x.

Abstract:
In a natural language environment, accurately inferring the meaning of a token from its context is crucial to understanding a sophisticated expression. However, this is not easy for a machine. The traditional language models used to train distributed word vectors are often restricted to single-sense embeddings. In this paper, we develop a model called MSCvec (Multi-sense Soft Clustering Vector) for word sense disambiguation of polysemy in context. We extract the features of individual words using a PPMI (Positive Pointwise Mutual Information) co-occurrence matrix, and decompose the matrix with NMF (Nonnegative Matrix Factorization) into low-rank representations of target words, which are used as the input of an unsupervised sparse soft clustering method called Sparse Fuzzy C-means (SFCM). We use SFCM to determine the global semantic space of words and to partition the subspaces of the multiple senses of a polysemous word. We relabel candidate words by the negative average log likelihood, and train multi-sense embeddings with an extended vocabulary using the fastText model. Compared with traditional static embeddings, the results show that the NMF and SFCM design improves performance on word similarity and relatedness tasks as well as on classification tasks for different types of text. The accurate semantic representation of MSCvec is necessary to produce these outstanding results.
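The PPMI co-occurrence step that feeds the NMF stage can be sketched as follows. This is a minimal illustration, assuming a symmetric context window over a token list and natural-log PMI with negative values clipped to zero: PPMI(w, c) = max(0, log(#(w, c) · D / (#(w) · #(c)))). The window size and the function name `ppmi_matrix` are illustrative. The NMF, SFCM, and fastText stages are not shown.

```python
import math
from collections import Counter


def ppmi_matrix(tokens, window=2):
    """Sparse PPMI matrix {(word, context): value} from a token list."""
    # Count (word, context) pairs inside a symmetric window around each token.
    pairs = Counter()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs[(word, tokens[j])] += 1

    total = sum(pairs.values())
    word_counts, context_counts = Counter(), Counter()
    for (word, context), n in pairs.items():
        word_counts[word] += n
        context_counts[context] += n

    # PPMI = max(0, PMI); zero entries are simply omitted from the dict.
    ppmi = {}
    for (word, context), n in pairs.items():
        pmi = math.log(n * total / (word_counts[word] * context_counts[context]))
        if pmi > 0:
            ppmi[(word, context)] = pmi
    return ppmi


print(ppmi_matrix("a b a b c d c d".split(), window=1))
```

Clipping to zero keeps the matrix nonnegative, which is what an NMF step requires. A dense version of this matrix could then be handed to a library NMF routine such as scikit-learn's `NMF`.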
31

Ren, Hongkai, Xi Mao, Weijun Ma, Jizhou Wang, and Linyun Wang. "An English-Chinese Machine Translation and Evaluation Method for Geographical Names." ISPRS International Journal of Geo-Information 9, no. 3 (February 25, 2020): 139. http://dx.doi.org/10.3390/ijgi9030139.

Abstract:
In recent years, with increasing international communication and cooperation, the consensus of toponymic information among different countries has become increasingly important. A large number of English geographical names urgently need to be translated into Chinese, but there are currently few studies on machine translation of geographical names. Therefore, this paper proposes a method for automatically translating English geographical names into Chinese. First, the lexical structure of a geographical name is analyzed to divide the whole name into two parts, the special name and the general name, using a statistical template model that applies pointwise mutual information and a directed acyclic graph data structure to names extracted from different categories of a geographical name corpus. Second, the two parts of the geographical name are translated. The general name can be translated directly by free translation. For the transliteration of the special name, phonetic symbols are generated with a recurrent neural network, and the syllables are then divided based on minimum entropy and converted into Chinese characters. Finally, the two parts in Chinese characters are combined, and criteria are prepared to evaluate translation reliability according to the translation process, enabling automatic quality inspection and screening of geographical names. The experimental results show that the method is effective for translating English geographical names into Chinese. This method can be easily extended to other languages such as Arabic.
32

Feng, Jun, Cheng Gong, Xiaodong Li, and Raymond Y. K. Lau. "Automatic Approach of Sentiment Lexicon Generation for Mobile Shopping Reviews." Wireless Communications and Mobile Computing 2018 (August 12, 2018): 1–13. http://dx.doi.org/10.1155/2018/9839432.

Abstract:
The dramatic increase in the use of smartphones has allowed people to comment on various products at any time. The analysis of the sentiment of users’ product reviews largely depends on the quality of sentiment lexicons. Thus, the generation of high-quality sentiment lexicons is a critical topic. In this paper, we propose an automatic approach for constructing a domain-specific sentiment lexicon by considering the relationship between sentiment words and product features in mobile shopping reviews. The approach first selects sentiment words and product features from original reviews and mines the relationship between them using an improved pointwise mutual information algorithm. Second, sentiment words that are related to mobile shopping are clustered into categories to form sentiment dimensions. At each sentiment dimension, each sentiment word can take the value of 0 or 1, where 1 indicates that the word belongs to a particular category whereas 0 indicates that it does not belong to that category. The generated lexicon is evaluated by constructing a sentiment classification task using several product reviews written in both Chinese and English. Two popular non-domain-specific sentiment lexicons as well as state-of-the-art machine-learning and deep-learning models are chosen as benchmarks, and the experimental results show that our sentiment lexicons outperform the benchmarks with statistically significant differences, thus proving the effectiveness of the proposed approach.
33

Liu, Ling, and Sang-Bing Tsai. "Intelligent Recognition and Teaching of English Fuzzy Texts Based on Fuzzy Computing and Big Data." Wireless Communications and Mobile Computing 2021 (July 10, 2021): 1–10. http://dx.doi.org/10.1155/2021/1170622.

Abstract:
In this paper, we conduct in-depth research and analysis on the intelligent recognition and teaching of English fuzzy text through parallel projection and region expansion. We construct Multisense Soft Cluster Vector (MSCVec), a multisense word vector model based on nonnegative matrix factorization and sparse soft clustering. MSCVec is a monolingual word vector model: it uses nonnegative matrix factorization of the positive pointwise mutual information between words and contexts to extract low-rank representations of the mixed semantics of polysemous words, and then uses a sparse soft clustering algorithm to partition the multiple senses of each polysemous word and to obtain its global sense-affiliation distribution. The specific sense cluster of a polysemous word is determined from the negative mean log-likelihood of the global affiliation between the contextual semantics and the polysemous word, and finally the polysemous word vectors are learned with the fastText model over the extended vocabulary. The advantage of MSCVec is that it is an unsupervised learning process that requires no knowledge base, and the substring (subword) representation in the model ensures that vectors can be generated for out-of-vocabulary words; in addition, the global affiliation of MSCVec can also map polysemous word vectors to single word vectors. Compared with traditional static word vectors, MSCVec shows excellent results in both word similarity and downstream text classification experiments. The two sets of features are then fused and extended into new semantic features, and similarity classification experiments and stack generalization experiments are designed for comparison.
In the cross-lingual sentence-level similarity detection task, SCLVec cross-lingual word vector lexical-level features outperform MSCVec multisense word vector features as the input embedding layer; deep semantic sentence-level features trained by twin recurrent neural networks outperform the semantic features of twin convolutional neural networks; extensions of traditional statistical features can effectively improve cross-lingual similarity detection performance, especially cross-lingual topic model (BL-LDA); the stack generalization integration approach maximizes the error rate of the underlying classifier and improves the detection accuracy.
34

Zhao, Futao, Zhong Yao, Jing Luan, and Hao Liu. "Inducing stock market lexicons from disparate Chinese texts." Industrial Management & Data Systems 120, no. 3 (December 23, 2019): 508–25. http://dx.doi.org/10.1108/imds-04-2019-0254.

Abstract:
Purpose: The purpose of this paper is to propose a methodology to construct a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.
Design/methodology/approach: This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a novel method based on keyword extraction is used to build a high-quality seed lexicon and an ensemble mechanism is developed to integrate the knowledge derived from distinct language sources. Meanwhile, two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure is performed to validate the effectiveness of the method compared with four traditional lexicons.
Findings: The experimental results from the three real-world testing data sets show that the ensemble lexicons can significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting the usefulness of leveraging knowledge derived from diverse media in domain-specific lexicon generation and corresponding sentiment analysis tasks.
Originality/value: This work appears to be the first to construct financial sentiment lexicons from over 2m posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable to extract collective sentiment from multiple media sources and provide decision-making support for stock market participants.
35

Paquot, Magali. "The phraseological dimension in interlanguage complexity research." Second Language Research 35, no. 1 (March 22, 2017): 121–45. http://dx.doi.org/10.1177/0267658317694221.

Abstract:
This article reports on the first results of a large-scale research programme that aims to define and circumscribe the construct of phraseological complexity and to theoretically and empirically demonstrate its relevance for second language theory. Within this broad agenda, the study has two main objectives. First, it investigates to what extent measures of phraseological complexity can be used to describe second language (L2) performance at different proficiency levels. Second, it compares measures of phraseological complexity with traditional measures of syntactic and lexical complexity. Variety and sophistication are postulated to be the first two dimensions of phraseological complexity, which is approached via relational co-occurrences, i.e. co-occurring words that appear in a specific structural or syntactic relation (e.g. adjective + noun, adverbial modifier + verb, verb + direct object). Phraseological diversity is operationalized as root type–token ratio computed for each syntactic relation. Two methods are tested to approach phraseological sophistication. First, sophisticated word combinations are defined as academic collocations that appear in the Academic Collocation List (Ackermann and Chen, 2013). Second, sophistication is approximated with the average pointwise mutual information score, as this measure has been shown to bring out word combinations made up of closely associated medium- to low-frequency (i.e. advanced or sophisticated) words. The study reveals that, unlike traditional measures of syntactic and lexical complexity, measures of phraseological sophistication can be used to describe L2 performance at the B2, C1 and C2 levels of the Common European Framework of Reference for Languages (CEFR), thus suggesting that essential aspects of language development from upper-intermediate to very advanced proficiency level may be situated in the phraseological dimension.
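An average-PMI sophistication score can be sketched as below. This is a minimal illustration, assuming PMI is estimated from pair counts for one syntactic relation (e.g. adjective + noun) in a reference corpus, and that pairs unattested there are skipped. The reference counts, the `min_freq` threshold, and the function name `mean_pmi` are assumptions, not the study's settings.

```python
import math
from collections import Counter


def mean_pmi(text_pairs, ref_pairs, min_freq=1):
    """Mean PMI (log base 2) of the (w1, w2) pairs found in a learner text.

    ref_pairs maps (w1, w2) to its frequency in a reference corpus for one
    syntactic relation, e.g. adjective + noun.
    """
    n = sum(ref_pairs.values())
    # Marginal frequency of each word in its slot of the relation.
    first, second = Counter(), Counter()
    for (w1, w2), f in ref_pairs.items():
        first[w1] += f
        second[w2] += f

    scores = []
    for w1, w2 in text_pairs:
        f = ref_pairs.get((w1, w2), 0)
        if f >= max(min_freq, 1):  # skip unattested or too-rare pairs
            scores.append(math.log2(f * n / (first[w1] * second[w2])))
    return sum(scores) / len(scores) if scores else float("nan")


ref = Counter({("heavy", "rain"): 4, ("big", "box"): 4,
               ("big", "rain"): 1, ("heavy", "box"): 1})
print(mean_pmi([("heavy", "rain"), ("big", "box")], ref))  # log2(1.6), about 0.678
```

A text built from strongly associated pairs scores higher than one built from loosely associated pairs such as ("big", "rain"), which scores below zero here.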
36

Čmelo, I., M. Voršilák, and D. Svozil. "Profiling and analysis of chemical compounds using pointwise mutual information." Journal of Cheminformatics 13, no. 1 (January 10, 2021). http://dx.doi.org/10.1186/s13321-020-00483-y.

Abstract:
Pointwise mutual information (PMI) is a measure of association used in information theory. In this paper, PMI is used to characterize several publicly available databases (DrugBank, ChEMBL, PubChem and ZINC) in terms of association strength between compound structural features resulting in database PMI interrelation profiles. As structural features, substructure fragments obtained by coding individual compounds as MACCS, PubChemKey and ECFP fingerprints are used. The analysis of publicly available databases reveals, in accord with other studies, unusual properties of DrugBank compounds, which further confirms the validity of the PMI profiling approach. Z-standardized relative feature tightness (ZRFT), a PMI-derived measure that quantifies how well the given compound’s feature combinations fit those in a particular compound set, is applied for the analysis of compound synthetic accessibility (SA), as well as for the classification of compounds as easy (ES) and hard (HS) to synthesize. ZRFT value distributions are compared with those of SYBA and SAScore. The analysis of ZRFT values of structurally complex compounds in the SAVI database reveals oligopeptide structures that are mispredicted by SAScore as HS, while correctly predicted by ZRFT and SYBA as ES. Compared to SAScore, SYBA and random forest, ZRFT predictions are less accurate, though by a narrow margin (AccZRFT = 94.5%, AccSYBA = 98.8%, AccSAScore = 99.0%, AccRF = 97.3%). However, ZRFT's ability to distinguish between ES and HS compounds is surprisingly high considering that while SYBA, SAScore and random forest are dedicated SA models, ZRFT is a generic measurement that merely quantifies the strength of interrelations between structural feature pairs. The results presented in the current work indicate that structural feature co-occurrence, quantified by PMI or ZRFT, contains a significant amount of information relevant to physico-chemical properties of organic compounds.
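Pairwise PMI between fingerprint bits over a compound set can be sketched as below. This is a minimal illustration, assuming each compound is given as the set of its "on" bit indices and that probabilities are estimated as fractions of compounds. The `mean_pair_pmi` score is a hypothetical, simplified fit measure. It is not ZRFT, whose definition, including the Z-standardization, is given in the paper.

```python
import math
from collections import Counter
from itertools import combinations


def feature_pmi(compounds):
    """PMI (log base 2) for every pair of fingerprint bits that co-occur.

    compounds: list of sets of 'on' bit indices, one set per compound.
    """
    n = len(compounds)
    single, pair = Counter(), Counter()
    for bits in compounds:
        single.update(bits)
        pair.update(combinations(sorted(bits), 2))
    # PMI(i, j) = log2(p(i, j) / (p(i) * p(j))), p = fraction of compounds.
    return {
        (i, j): math.log2((c / n) / ((single[i] / n) * (single[j] / n)))
        for (i, j), c in pair.items()
    }


def mean_pair_pmi(bits, pmi_table):
    """Hypothetical fit score: mean PMI over a compound's bit pairs that
    appear in the reference table (0.0 if none do). Not ZRFT."""
    values = [pmi_table[p] for p in combinations(sorted(bits), 2)
              if p in pmi_table]
    return sum(values) / len(values) if values else 0.0


reference_set = [{1, 2}, {1, 2}, {3}, {3}]
table = feature_pmi(reference_set)
print(table)  # {(1, 2): 1.0}
print(mean_pair_pmi({1, 2}, table))
```

Comparing such profiles across databases (e.g. by the distribution of PMI values) is one way to read the "PMI interrelation profiles" idea, though the paper's exact procedure may differ.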
38

Meckbach, Cornelia, Rebecca Tacke, Xu Hua, Stephan Waack, Edgar Wingender, and Mehmet Gültas. "PC-TraFF: identification of potentially collaborating transcription factors using pointwise mutual information." BMC Bioinformatics 16, no. 1 (December 2015). http://dx.doi.org/10.1186/s12859-015-0827-2.

39

Cui, Xia, Noor Al-Bazzaz, Danushka Bollegala, and Frans Coenen. "A comparative study of pivot selection strategies for unsupervised cross-domain sentiment classification." Knowledge Engineering Review 33 (2018). http://dx.doi.org/10.1017/s0269888918000085.

Abstract:
Selecting pivot features that connect a source domain to a target domain is an important first step in unsupervised domain adaptation (UDA). Although different strategies, such as the frequency of a feature in a domain and mutual (or pointwise mutual) information, have been proposed in prior work on domain adaptation (DA) for selecting pivots, it remains unknown (a) how the pivots selected using existing strategies differ, and (b) how the pivot selection strategy affects the performance of a target DA task. In this paper, we perform a comparative study covering different strategies that use both labelled (available for the source domain only) and unlabelled (available for both the source and target domains) data for selecting pivots for UDA. Our experiments show that in most cases pivot selection strategies that use labelled data outperform their unlabelled counterparts, emphasising the importance of the source domain labelled data for UDA. Moreover, pointwise mutual information and frequency-based pivot selection strategies obtain the best performance in two state-of-the-art UDA methods.
40

Luo, Xin, Zhigang Liu, Mingsheng Shang, and Mengchu Zhou. "Highly-Accurate Community Detection via Pointwise Mutual Information-Incorporated Symmetric Non-negative Matrix Factorization." IEEE Transactions on Network Science and Engineering, 2020, 1. http://dx.doi.org/10.1109/tnse.2020.3040407.

41

Lan, Chaowang, Hui Peng, Gyorgy Hutvagner, and Jinyan Li. "Construction of competing endogenous RNA networks from paired RNA-seq data sets by pointwise mutual information." BMC Genomics 20, S9 (December 2019). http://dx.doi.org/10.1186/s12864-019-6321-x.

Abstract:
Background: A long noncoding RNA (lncRNA) can act as a competing endogenous RNA (ceRNA) to compete with an mRNA for binding to the same miRNA. Such an interplay between the lncRNA, miRNA, and mRNA is called a ceRNA crosstalk. As an miRNA may have multiple lncRNA targets and multiple mRNA targets, connecting all the ceRNA crosstalks mediated by the same miRNA forms a ceRNA network. Methods have been developed to construct ceRNA networks in the literature. However, these methods have limits because they have not explored the expression characteristics of total RNAs.
Results: We proposed a novel method for constructing ceRNA networks and applied it to a paired RNA-seq data set. The first step of the method takes a competition regulation mechanism to derive candidate ceRNA crosstalks. Second, the method combines a competition rule and pointwise mutual information to compute a competition score for each candidate ceRNA crosstalk. Then, ceRNA crosstalks which have significant competition scores are selected to construct the ceRNA network. The key idea, pointwise mutual information, is ideally suited for measuring the complex point-to-point relationships embedded in the ceRNA networks.
Conclusion: Computational experiments and results demonstrate that the ceRNA networks can capture important regulatory mechanisms of breast cancer, and have also revealed new insights into the treatment of breast cancer. The proposed method can be directly applied to other RNA-seq data sets for deeper disease understanding.
42

Tirumalasetty, Sudhir, P. Tejaswini, R. Renuka, and M. Naga Sirisha. "Discovery of Probable Sentiments in Hypertensive Pregnant Women using Horizontal Fragmentation and Pointwise Mutual Information." International Journal of Scientific Research in Computer Science, Engineering and Information Technology, March 11, 2019, 742–45. http://dx.doi.org/10.32628/cseit1952191.

Abstract:
For a decade, research on sentiment analysis and opinion mining has evolved slowly and is now emerging widely with broader perspectives and objectives. Sentiment analysis is an important task for gaining insight into the huge volume of opinions generated daily. This analysis relies on the opinions expressed by individuals. These opinions are text, which may be positive or negative, or a phrase that gives significance to the context. Such opinions can also express the context while attracting the attention of new readers. Opinions are expressed at the document level, the sentence level, the phrase level, the word level, and the special-symbol level. All these opinion types are grouped under the common name sentiment analysis. Sentiment analysis in health care is evolving narrowly, with wider research strands. This paper mainly focuses on identifying sentiments in health care. These sentiments can be medical test values, which may be numeric or nominal, and sometimes text. Such sentiments are identified using pre-fragmentation of the data set and the pointwise mutual information measure. To accomplish this, data on hypertensive pregnant women are considered.
43

Georgieva-Trifonova, Tsvetanka. "Research on Improvement of N-grams Based Text Classification by Applying Pointwise Mutual Information Measures." Baltic Journal of Modern Computing 9, no. 3 (2021). http://dx.doi.org/10.22364/bjmc.2021.9.3.05.

44

Fan, Xiao-Nan, Shao-Wu Zhang, Song-Yao Zhang, Kunju Zhu, and Songjian Lu. "Prediction of lncRNA-disease associations by integrating diverse heterogeneous information sources with RWR algorithm and positive pointwise mutual information." BMC Bioinformatics 20, no. 1 (February 19, 2019). http://dx.doi.org/10.1186/s12859-019-2675-y.

45

Lin, Weiqiang, Jiadong Ji, Yuchen Zhu, Mingzhuo Li, Jinghua Zhao, Fuzhong Xue, and Zhongshang Yuan. "PMINR: Pointwise Mutual Information-Based Network Regression – With Application to Studies of Lung Cancer and Alzheimer’s Disease." Frontiers in Genetics 11 (October 15, 2020). http://dx.doi.org/10.3389/fgene.2020.556259.

46

"Extraction of Relying Factors to be Diabetic in Pregnant Women using Attribute Mutual Information." International Journal of Innovative Technology and Exploring Engineering 9, no. 2 (December 10, 2019): 4959–61. http://dx.doi.org/10.35940/ijitee.b9075.129219.

47

Rozemberczki, Benedek, Carl Allen, and Rik Sarkar. "Multi-Scale attributed node embedding." Journal of Complex Networks 9, no. 2 (April 1, 2021). http://dx.doi.org/10.1093/comnet/cnab014.

Abstract:
We present network embedding algorithms that capture information about a node from the local distribution over node attributes around it, as observed over random walks following an approach similar to Skip-gram. Observations from neighbourhoods of different sizes are either pooled (AE) or encoded distinctly in a multi-scale approach (MUSAE). Capturing attribute-neighbourhood relationships over multiple scales is useful for a range of applications, including latent feature identification across disconnected networks with similar features. We prove theoretically that matrices of node-feature pointwise mutual information are implicitly factorized by the embeddings. Experiments show that our algorithms are computationally efficient and outperform comparable models on social networks and web graphs.
48

Bos, Thomas, and Flavius Frasincar. "Automatically Building Financial Sentiment Lexicons While Accounting for Negation." Cognitive Computation, February 11, 2021. http://dx.doi.org/10.1007/s12559-021-09833-w.

Abstract:
Financial investors make trades based on available information. Previous research has shown that microblogs are a useful source for supporting stock market decisions. However, the financial domain lacks specific sentiment lexicons that could be used to extract sentiment from these microblogs. In this research, we investigate automatic approaches that can be used to build financial sentiment lexicons. We introduce weighted versions of the Pointwise Mutual Information approaches to build sentiment lexicons automatically. Furthermore, existing approaches often neglect negation when building sentiment lexicons. In this research, we also propose two methods (Negated Word and Flip Sentiment) that extend the lexicon-building approaches to take negation into account when constructing a sentiment lexicon. We build the financial sentiment lexicons by leveraging 200,000 messages from StockTwits. We evaluate the constructed financial sentiment lexicons in two different sentiment classification tasks (unsupervised and supervised). In addition, the created financial sentiment lexicons are compared with each other and with other existing sentiment lexicons. The best performing financial sentiment lexicon is built by combining our Weighted Normalized Pointwise Mutual Information approach with the Negated Word approach. It outperforms all the other sentiment lexicons in the two sentiment classification tasks. In the unsupervised sentiment classification task, it has, on average, a balanced accuracy of 69.4%, and in the supervised setting, a balanced accuracy of 75.1%. Moreover, the various sentiment classification tasks confirm that the sentiment lexicons could be improved by taking negation into account while building them. The improvement could be made by using one of the proposed methods to incorporate negation in the sentiment lexicon construction process.
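Normalized PMI divides PMI by −log p(x, y), which bounds it to [−1, 1]: NPMI(x, y) = log(p(x, y) / (p(x) p(y))) / (−log p(x, y)). The sketch below is a minimal, unweighted NPMI lexicon score, assuming messages labelled "pos" or "neg" (e.g. bullish or bearish) and scoring a word as NPMI(word, pos) − NPMI(word, neg). The paper's weighting scheme and its Negated Word and Flip Sentiment extensions are not reproduced. The labels and function names are hypothetical.

```python
import math


def npmi(count_xy, count_x, count_y, n):
    """Normalized PMI in [-1, 1] from raw counts over n observations."""
    if count_xy == 0:
        return -1.0  # never co-occur: conventional lower bound
    p_xy = count_xy / n
    if p_xy == 1.0:
        return 1.0  # x and y occur together in every observation
    pmi = math.log(p_xy / ((count_x / n) * (count_y / n)))
    return pmi / -math.log(p_xy)


def lexicon_score(word, messages):
    """Unweighted NPMI(word, pos) - NPMI(word, neg).

    messages: list of (token_set, label) pairs, label in {"pos", "neg"}.
    """
    n = len(messages)
    n_word = sum(1 for tokens, _ in messages if word in tokens)
    scores = {}
    for label in ("pos", "neg"):
        n_label = sum(1 for _, lab in messages if lab == label)
        n_both = sum(1 for tokens, lab in messages
                     if word in tokens and lab == label)
        scores[label] = npmi(n_both, n_word, n_label, n)
    return scores["pos"] - scores["neg"]


msgs = [({"moon", "buy"}, "pos"), ({"moon"}, "pos"),
        ({"dump"}, "neg"), ({"dump", "sell"}, "neg")]
print(lexicon_score("moon", msgs))  # about 2.0: appears only in positive messages
```

Because NPMI is bounded, the score lies in [−2, 2], which makes scores comparable across words of very different frequencies.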
49

Joe Dhanith, P. R., and B. Surendiran. "An ontology learning based approach for focused web crawling using combined normalized pointwise mutual information and Resnik algorithm." International Journal of Computers and Applications, October 30, 2019, 1–7. http://dx.doi.org/10.1080/1206212x.2019.1684023.

50

"Semantic Pattern Detection in Covid-19 using Contextual Clustering and Intelligent Topic Modeling." International Journal of E-Health and Medical Communications 13, no. 2 (July 2022): 0. http://dx.doi.org/10.4018/ijehmc.20220701oa07.

Abstract:
The Covid-19 pandemic is the deadliest outbreak in living memory, so there is an urgent need to prepare the world with strategies to prevent and control the impact of epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature using contextual clustering and intelligent topic modeling is presented. For contextual clustering, weights at three levels (term, document, and corpus) are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and latent Dirichlet allocation is applied. Contextual clustering with latent semantic analysis yields semantic spaces with highly correlated terms at the corpus level. Intelligent topic modeling improves the topics, giving lower perplexity and higher coherence. This research helps identify knowledge gaps in Covid-19 research and offers directions for future research.