Journal articles on the topic 'Latent Semantic Indexing (LSI)'

To see the other types of publications on this topic, follow the link: Latent Semantic Indexing (LSI).

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Latent Semantic Indexing (LSI).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Xu, Yanyan, Dengfeng Ke, and Kaile Su. "Contextualized Latent Semantic Indexing: A New Approach to Automated Chinese Essay Scoring." Journal of Intelligent Systems 26, no. 2 (April 1, 2017): 263–85. http://dx.doi.org/10.1515/jisys-2015-0048.

Abstract:
The writing part in Chinese language tests is badly in need of a mature automated essay scoring system. In this paper, we propose a new approach to automated Chinese essay scoring (ACES), called contextualized latent semantic indexing (CLSI), of which Genuine CLSI and Modified CLSI are two versions. The n-gram language model and the weighted finite-state transducer (WFST), two critical components, are used to extract context information in our ACES system. Not only does CLSI improve conventional latent semantic indexing (LSI), but it also bridges the gap between latent semantics and their context information, which is absent in LSI. Moreover, CLSI can score essays from the perspectives of language fluency and content, and address the local overrating and underrating problems caused by LSI. Experimental results show that CLSI outperforms LSI, regularized LSI, and latent Dirichlet allocation in many aspects, and thus proves to be an effective approach.
2

Atreya, Avinash, and Charles Elkan. "Latent semantic indexing (LSI) fails for TREC collections." ACM SIGKDD Explorations Newsletter 12, no. 2 (March 31, 2011): 5–10. http://dx.doi.org/10.1145/1964897.1964900.

3

Srinivas, S., and Ch. Aswani Kumar. "Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval." Journal of Information & Knowledge Management 05, no. 02 (June 2006): 97–105. http://dx.doi.org/10.1142/s0219649206001359.

Abstract:
Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has been shown to be about 30% more effective. Many studies have reported that good retrieval performance is tied to the use of various retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. The results obtained demonstrate that LSI performance improves significantly with the combination of optimised term weighting and rank approximation.
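The two retrieval heuristics this abstract optimises, term weighting and rank approximation, can be sketched in a few lines of numpy. This is a toy illustration under our own assumptions: the count matrix is invented, and log-idf is just one common weighting choice, not necessarily the scheme the paper tunes.

```python
import numpy as np

# Toy 5-term x 4-document count matrix (invented values).
A = np.array([
    [3, 0, 1, 0],
    [0, 2, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 2, 3],
    [2, 0, 0, 1],
], dtype=float)

# Heuristic 1, term weighting: log tf scaled by inverse document frequency
# (one common scheme; the paper searches over such choices).
df = (A > 0).sum(axis=1)               # document frequency of each term
idf = np.log(A.shape[1] / df)
W = np.log1p(A) * idf[:, None]

# Heuristic 2, rank approximation: keep only the k largest singular triplets.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of W

# Documents are compared in the k-dimensional latent space:
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T     # one k-dim row per document
```

Optimising the heuristics then amounts to sweeping the weighting scheme and the cut-off k, and measuring retrieval quality at each setting.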
4

Blynova, N. "Latent semantic indexing (LSI) and its impact on copywriting." Communications and Communicative Technologies, no. 19 (May 5, 2019): 4–12. http://dx.doi.org/10.15421/291901.

Abstract:
Latent semantic indexing (LSI) is becoming more and more popular in copywriting, gradually replacing texts written on the principles of SEO. LSI came into use in the 2010s, when popular search engines switched to a qualitatively new way of ranking materials and sites. The difference between the SEO and LSI approaches lies in the fact that search engines rank SEO materials by keywords, while LSI texts are ranked by how fully the topic is covered and how useful the article will be to the reader. Consequently, in addition to keywords and phrases, an associative core is involved. Materials written for people have replaced texts created for the search engine. The article describes the algorithm for creating the associative and thematic core and the ways in which this can be done. The basic steps that help to create an LSI text are also shown. The author underlines that, owing to the need to present a significant amount of information and to cover the topic with maximum expertise, text writers accustomed to working on the principles of SEO have to learn to write within a new paradigm. The owners of websites that host articles created on LSI principles have discovered the advantages of this way of presenting information, since their resources have become better indexed and take leading positions in search results. Algorithms such as "Baden-Baden", "Korolev" and "Panda" have positively influenced the Internet environment as a whole, since over-optimized texts, which were stuffed with keywords and were of little use to the reader, have now dropped to the last positions in search results.
Ranking by the LSI method allows specialists to create texts that are not only useful and expert but also lexically rich, using the expressive and figurative means of the language, which could not be assumed in SEO materials. The article highlights that the use of neural networks should bring the way of presenting information even closer to the consumer's needs, producing techniques that will allow materials written in ordinary language to lead the rankings without the need to incorporate key phrases into the text. We believe that the LSI method, which has proved itself well in copywriting, is capable of unlocking the potential of media texts that are now being written on the principles of SEO.
5

Kontostathis, April, and William M. Pottenger. "A framework for understanding Latent Semantic Indexing (LSI) performance." Information Processing & Management 42, no. 1 (January 2006): 56–73. http://dx.doi.org/10.1016/j.ipm.2004.11.007.

6

Li, Min Song. "A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features." Advanced Materials Research 181-182 (January 2011): 830–35. http://dx.doi.org/10.4028/www.scientific.net/amr.181-182.830.

Abstract:
Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate method for selecting a feature subspace for text categorization, since it orders the extracted features according to their variance, not their classification power. We propose a method based on support vector machines to extract features and select a latent semantic subspace suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.
7

Praus, Petr, and Pavel Praks. "Information retrieval in hydrochemical data using the latent semantic indexing approach." Journal of Hydroinformatics 9, no. 2 (March 1, 2007): 135–43. http://dx.doi.org/10.2166/hydro.2007.003b.

Abstract:
The latent semantic indexing (LSI) method was applied to the retrieval of similar samples (samples with a similar composition) in a dataset of groundwater samples. The LSI procedure was based on two steps: (i) reduction of the data dimensionality by principal component analysis (PCA) and (ii) calculation of the similarity between selected samples (queries) and the other samples. The similarity measures used were the cosine similarity and the Euclidean and Manhattan distances. Five queries were chosen so as to represent different sampling localities. The original data space of 14 variables measured in 95 samples of groundwater was reduced to the three-dimensional space of the three largest principal components, which explained nearly 80% of the total variance. The five samples closest to each query were evaluated. The LSI outputs were compared with retrievals in the orthogonal system of all PCA-transformed variables and in the system of standardized original variables. Most of these retrievals did not agree with the LSI ones, most likely because both systems contained interfering data noise that had not been removed beforehand by dimensionality reduction. The LSI approach based on noise filtration was therefore considered a promising strategy for information retrieval in real hydrochemical data.
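The two-step procedure described here (PCA reduction, then similarity to a query) can be mimicked in numpy. The values below are random stand-ins for the 95-by-14 groundwater matrix, not the paper's measurements, and the query choice is arbitrary:

```python
import numpy as np

# Synthetic stand-in for the dataset: 95 samples x 14 measured variables
# (random values; the paper uses real groundwater chemistry).
rng = np.random.default_rng(0)
X = rng.normal(size=(95, 14))

# Step (i): dimensionality reduction by PCA, keeping 3 principal components.
Xc = X - X.mean(axis=0)                       # center the variables
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:3].T                        # each sample in 3-D PC space

# Step (ii): similarity between a query sample and all samples,
# using the three measures compared in the paper.
q = scores[0]
cosine = scores @ q / (np.linalg.norm(scores, axis=1) * np.linalg.norm(q))
euclid = np.linalg.norm(scores - q, axis=1)
manhattan = np.abs(scores - q).sum(axis=1)

# The five samples most similar to the query (excluding the query itself).
top5 = np.argsort(-cosine)[1:6]
```

In the paper's setting, retrievals in the 3-D PC space differ from those in the full variable space precisely because the discarded components carry much of the noise.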
8

Aswani Kumar, Ch, M. Radvansky, and J. Annapurna. "Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval." Cybernetics and Information Technologies 12, no. 1 (March 1, 2012): 34–48. http://dx.doi.org/10.2478/cait-2012-0003.

Abstract:
Latent Semantic Indexing (LSI), a variant of the classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationships between data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and retrieve the information. However, both LSI and FCA use data represented in the form of matrices. The objective of this paper is to systematically analyze VSM, LSI and FCA for the task of IR using standard and real-life datasets.
9

Al-Anzi, Fawaz, and Dia AbuZeina. "Enhanced Latent Semantic Indexing Using Cosine Similarity Measures for Medical Application." International Arab Journal of Information Technology 17, no. 5 (September 1, 2020): 742–49. http://dx.doi.org/10.34028/iajit/17/5/7.

Abstract:
The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common document representation model. However, this technique faces challenges such as a high-dimensional space and the semantic looseness of the representation. Consequently, Latent Semantic Indexing (LSI) was suggested to reduce the feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. In fact, LSI has been effectively employed in search engines and many other Natural Language Processing (NLP) applications, and researchers continue to seek better performance. In this paper, we propose an innovative method that can be used in search engines to find better-matched content in the retrieved documents. The proposed method introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was carried out using an Arabic-language data collection that contains 800 medical documents with more than 47,222 unique words. The proposed method was assessed using a small testing set of five medical keywords. The results show that the performance of the proposed method is superior to that of standard LSI.
10

Zhan, Jiaming, and Han Tong Loh. "Using Latent Semantic Indexing to Improve the Accuracy of Document Clustering." Journal of Information & Knowledge Management 06, no. 03 (September 2007): 181–88. http://dx.doi.org/10.1142/s0219649207001755.

Abstract:
Document clustering is a significant research issue in information retrieval and text mining. Traditionally, most clustering methods were based on the vector space model, which has a few limitations such as high dimensionality and weakness in handling synonymy and polysemy. Latent semantic indexing (LSI) is able to deal with such problems to some extent. Previous studies have shown that using LSI can reduce the time needed to cluster a large document set while having little effect on clustering accuracy. However, when clustering a small document set, accuracy is of more concern than efficiency. In this paper, we demonstrate that LSI can improve the clustering accuracy of a small document set, and we also recommend the number of dimensions needed to achieve the best clustering performance.
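As a rough sketch of the pipeline this abstract evaluates, the following numpy snippet projects a small synthetic document set into a 2-D LSI space and clusters it with a minimal k-means loop. The corpus, the seeding, and the choice of k are all illustrative assumptions of ours, not the paper's setup:

```python
import numpy as np

# Small synthetic corpus: 20 terms x 12 documents, two topics with
# disjoint vocabularies (all values invented).
rng = np.random.default_rng(1)
A = np.zeros((20, 12))
A[:10, :6] = rng.poisson(3.0, size=(10, 6))   # topic-1 terms, docs 0-5
A[10:, 6:] = rng.poisson(3.0, size=(10, 6))   # topic-2 terms, docs 6-11

# LSI: represent each document in a k-dimensional latent space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs = Vt[:k].T                               # 12 documents x k dimensions

# A minimal k-means (Lloyd) loop on the latent vectors, seeded with
# one document from each topic.
centers = docs[[0, 6]]
for _ in range(10):
    d = np.linalg.norm(docs[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([docs[labels == j].mean(axis=0) for j in range(2)])
```

On a separable toy corpus like this the latent space recovers the two topics; the paper's point is that, for small collections, a well-chosen LSI dimensionality also improves accuracy on real data.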
11

Susanto, Gaguk, and Hari Lugis Purwanto. "Information Retrieval Menggunakan Latent Semantic Indexing Pada Ebook." SMATIKA JURNAL 8, no. 02 (October 30, 2018): 74–79. http://dx.doi.org/10.32664/smatika.v8i02.204.

Abstract:
An ebook is an electronic book, a replacement for the paper book, that can be opened on an electronic device (smartphone, laptop, or PC). Teachers and students typically have large ebook collections on their computers that tend to sit on the hard disk and are rarely opened, even though their contents cover a wide range of knowledge. That knowledge can be used to turn the computer into an intelligent assistant that is very useful to its owner, quickly providing references to the information the owner needs. Exploiting ebooks as a knowledge source therefore requires a medium that bridges the owner and the device. An information retrieval system is needed to exploit the knowledge in the ebook collection so that information can be found quickly. The retrieval process is carried out using the Latent Semantic Indexing method. LSI is an automatic indexing and retrieval method that exploits the semantic structure implicit in a document to find documents relevant to the terms in a query. Information retrieval using Latent Semantic Indexing on ebooks is therefore much needed, not only by teachers but by everyone who wants to turn their computer into a smart assistant.
12

Lee, Ji-Hye, and Young-Mee Chung. "An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing (LSI)." Journal of the Korean Society for Information Management 26, no. 3 (September 30, 2009): 451–62. http://dx.doi.org/10.3743/kosim.2009.26.3.451.

13

ZELIKOVITZ, SARAH, and FINELLA MARQUEZ. "TRANSDUCTIVE LEARNING FOR SHORT-TEXT CLASSIFICATION PROBLEMS USING LATENT SEMANTIC INDEXING." International Journal of Pattern Recognition and Artificial Intelligence 19, no. 02 (March 2005): 143–63. http://dx.doi.org/10.1142/s0218001405003971.

Abstract:
This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI's singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-document matrix that includes both the labeled data as well as any available test examples. We report the performance of LSI on data sets both with and without the inclusion of the test examples, and we show that tailoring the SVD process to the test examples can be even more useful than adding additional training data. This method can be especially useful to combat possible inclusion of unrelated data in the original corpus, and to compensate for limited amounts of data. Additionally, we evaluate the vocabulary of the training and test sets and present the results of a series of experiments to illustrate how the test set is used in an advantageous way.
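The transductive step, running the SVD on a term-by-document matrix that already contains the unlabeled test columns, can be sketched as follows. The counts are synthetic and a nearest-neighbour classifier stands in for the paper's learner:

```python
import numpy as np

# Toy data: 12 terms; three labeled training documents per class with
# disjoint term blocks, plus one unlabeled test document (invented counts).
rng = np.random.default_rng(2)
class_a = np.vstack([rng.poisson(3.0, (6, 3)), np.zeros((6, 3))])
class_b = np.vstack([np.zeros((6, 3)), rng.poisson(3.0, (6, 3))])
test_doc = np.vstack([rng.poisson(3.0, (6, 1)), np.zeros((6, 1))])

# Transductive step: the SVD runs on the expanded term-by-document
# matrix that includes the unlabeled test column.
M = np.hstack([class_a, class_b, test_doc])   # 12 terms x 7 documents
U, s, Vt = np.linalg.svd(M, full_matrices=False)
docs = Vt[:2].T                               # all 7 documents in rank-2 space

train_vecs, test_vec = docs[:6], docs[6]
labels = np.array([0, 0, 0, 1, 1, 1])

# Classify the test document by its nearest labeled neighbour in the
# shared latent space (a simple stand-in for the paper's learner).
dist = np.linalg.norm(train_vecs - test_vec, axis=1)
pred = labels[dist.argmin()]
```

Because the test column influenced the decomposition, its latent coordinates are directly comparable with the labeled ones; that is the sense in which the SVD is tailored to the test examples.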
14

Boushaki, Saida Ishak, Omar Bendjeghaba, and Nadjet Kamel. "Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm." International Journal of Swarm Intelligence Research 12, no. 4 (October 2021): 169–85. http://dx.doi.org/10.4018/ijsir.2021100109.

Abstract:
Clustering is an important unsupervised analysis technique for big data mining. It finds applications in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms suffer from getting trapped in local optima, need many parameters to adjust, and require the documents to be indexed by a high-dimensional matrix under the traditional vector space model. In order to overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic, enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five established algorithms in terms of compactness, f-measure, purity, misclassified documents, entropy, and runtime.
15

Aquino, Angelica M., and Enrico P. Chavez. "Analysis on the use of Latent Semantic Indexing (LSI) for document classification and retrieval system of PNP files." MATEC Web of Conferences 189 (2018): 03009. http://dx.doi.org/10.1051/matecconf/201818903009.

Abstract:
Document classification is the process of automatically categorizing documents from many mixed files [1]. In this paper, an approach to classifying admin-case documents of the Philippine National Police (PNP) using the Latent Semantic Indexing (LSI) method is proposed. A model representing term-to-term, document-to-document, and term-to-document relationships has been applied. Regular expressions are also implemented to define a search pattern based on character strings, which LSI uses to establish the semantic relevance of the character strings to the search term or keyword. The aim of the study is to evaluate the performance of LSI in classifying PNP documents; experimentation was done using software to test the capability of LSI for text retrieval. Indexing follows the pattern matched in the text collection using the SVD model. In tests, documents were indexed based on file relationships, and the system was able to return search results as the retrieved information from PNP files. Weights are used to check the accuracy of the method; the positive values identified in query similarity are regarded as the most relevant among the related searches, meaning that when the query word matches words in a text file, a query result is returned.
16

Kitajima, Risa, and Ichiro Kobayashi. "Latent Topic Estimation Based on Events in a Document." Journal of Advanced Computational Intelligence and Intelligent Informatics 16, no. 5 (July 20, 2012): 603–10. http://dx.doi.org/10.20965/jaciii.2012.p0603.

Abstract:
Several latent topic model-based methods such as Latent Semantic Indexing (LSI), Probabilistic LSI (pLSI), and Latent Dirichlet Allocation (LDA) have been widely used for text analysis. These methods basically assign topics to words, however, and the relationship between words in a document is therefore not considered. Considering this, we propose a latent topic extraction method that assigns topics to events that represent the relation between words in a document. There are several ways to express events, and the accuracy of estimating latent topics differs depending on the definition of an event. We therefore propose five event types and examine which event type works well in estimating latent topics in a document with a common document retrieval task. As an application of our proposed method, we also show multidocument summarization based on latent topics. Through these experiments, we have confirmed that our proposed method results in higher accuracy than the conventional method.
17

Ishak Boushaki, Saida, Nadjet Kamel, and Omar Bendjeghaba. "High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing." Journal of Information & Knowledge Management 17, no. 03 (September 2018): 1850033. http://dx.doi.org/10.1142/s0219649218500338.

Abstract:
Clustering is an important data analysis technique. However, clustering high-dimensional data such as documents requires more effort in order to extract the rich, relevant information hidden in the high-dimensional space. Recently, document clustering algorithms based on metaheuristics have demonstrated their efficiency in exploring the search space and reaching the global best solution rather than a local one. However, most of these algorithms are not practical and suffer from some limitations, including requiring the number of clusters to be known in advance, being neither incremental nor extensible, and indexing the documents by a high-dimensional, sparse matrix. In order to overcome these limitations, we propose in this paper a new dynamic and incremental approach (CS_LSI) for document clustering based on the recent cuckoo search (CS) optimization and latent semantic indexing (LSI). Experiments conducted on four well-known high-dimensional text datasets show the efficiency of the LSI model in reducing the dimensionality of the space with more precision and less computational time. The proposed CS_LSI also determines the number of clusters automatically by employing a newly proposed index based on a significant distance measure. The latter is also used in the incremental mode and to detect outlier documents, maintaining more coherent clusters. Furthermore, comparison with conventional document clustering algorithms shows the superiority of CS_LSI in achieving high clustering quality.
18

Agung Hasbi Ardiansyah, Kurnia Paranita Kartika, and Saiful Nur Budiman. "PENERAPAN LATENT SEMANTIC INDEXING PADA SISTEM TEMU BALIK INFORMASI PADA UNDANG-UNDANG PEMILU BERDASARKAN KASUS." Jurnal Mnemonic 4, no. 2 (November 6, 2021): 64–70. http://dx.doi.org/10.36040/mnemonic.v4i2.4165.

Abstract:
When it receives a finding or a report of an alleged election violation, the election supervisory body clarifies the case and gathers sufficient evidence before deciding whether the finding or report constitutes a violation. During clarification, election supervisors look for the articles of law that may have been violated. The number of candidate articles for each case can slow the supervisors' work, so a tool is needed to speed up the search for articles relevant to a violation case. In this study, an information retrieval system is used to find the articles of Law No. 10 of 2016 relevant to a case, based on the case description. The Latent Semantic Indexing (LSI) method is used; LSI applies the Singular Value Decomposition (SVD) technique to reduce the dimensionality. The study uses 37 articles and 4 cases (violation descriptions) as queries. The system accepts a query, i.e., a description of a violation case, then computes and determines the related articles. The method's success in finding relevant search results is 100% for recall, 70% for precision, and 82% for f-measure.
19

MARCUS, ANDRIAN, JONATHAN I. MALETIC, and ANDREY SERGEYEV. "RECOVERY OF TRACEABILITY LINKS BETWEEN SOFTWARE DOCUMENTATION AND SOURCE CODE." International Journal of Software Engineering and Knowledge Engineering 15, no. 05 (October 2005): 811–36. http://dx.doi.org/10.1142/s0218194005002543.

Abstract:
An approach for the semi-automated recovery of traceability links between software documentation and source code is presented. The methodology is based on the application of information retrieval techniques to extract and analyze the semantic information from the source code and associated documentation. A semi-automatic process is defined based on the proposed methodology. The paper advocates the use of latent semantic indexing (LSI) as the supporting information retrieval technique. Two case studies using existing software are presented comparing this approach with others. The case studies show positive results for the proposed approach, especially considering the flexibility of the methods used.
20

Hasibuan, Muhammad Said, Lukito Edi Nugroho, and Paulus Insap Santosa. "Model detecting learning styles with artificial neural network." Journal of Technology and Science Education 9, no. 1 (February 1, 2019): 85. http://dx.doi.org/10.3926/jotse.540.

Abstract:
Currently, the detection of learning styles from external aspects has not produced optimal results. This research addresses the problem using an internal approach, one that derives from the personality of the learner. One of the personality traits every learner possesses is prior knowledge. The research starts with a prior-knowledge generation process using the Latent Semantic Indexing (LSI) method, a technique that uses Singular Value Decomposition (SVD) to find meaning in a sentence. LSI generates the prior knowledge of each learner; once the prior knowledge is obtained, the learning style can be predicted using the artificial neural network (ANN) method. The results of this study are more accurate than those of detection conducted with an external approach.
21

Christy, A., Anto Praveena, and Jany Shabu. "A Hybrid Model for Topic Modeling Using Latent Dirichlet Allocation and Feature Selection Method." Journal of Computational and Theoretical Nanoscience 16, no. 8 (August 1, 2019): 3367–71. http://dx.doi.org/10.1166/jctn.2019.8234.

Abstract:
In this information age, knowledge discovery and pattern matching play a significant role. Topic modeling, an area of text mining, is used to detect hidden patterns in a document collection. Topic modeling and document clustering are two important key terms that are similar in concept and functionality. In this paper, topic modeling is carried out using the Latent Dirichlet Allocation-Brute Force (LDA-BF), Latent Dirichlet Allocation-Back Tracking (LDA-BT), Latent Semantic Indexing (LSI), and Nonnegative Matrix Factorization (NMF) methods. A hybrid model is proposed that uses Latent Dirichlet Allocation (LDA) to extract feature terms and a Feature Selection (FS) method for feature reduction. The efficiency of document clustering depends upon the selection of good features. Topic modeling is performed by enriching the good features obtained through the feature selection method. The proposed hybrid model produces better accuracy than the K-Means clustering method.
22

Jaber, Tareq. "Lexical Noise Analysis and Removal in Intelligent Search Engines." Journal of King Abdulaziz University: Computing and Information Technology Sciences 1, no. 2 (January 10, 2012): 69–103. http://dx.doi.org/10.4197/comp.1-2.4.

Abstract:
In the field of intelligent information retrieval (IR), latent semantic indexing (LSI) is a popular technique used to retrieve information related more in meaning than in lexical match. This technique overcomes the problems associated with synonymy and polysemy (common causes of inaccuracy in matching algorithms). A core component of the process is the singular value decomposition (SVD), which acts as a mathematical model for the lexical noise in the term-document matrix (TDM). This paper investigates various aspects of LSI from the viewpoint of noise modeling and removal in image processing. A discussion of, and an investigation into, mathematical modeling of lexical noise in the TDM is presented. The work proposes a definition of noise in text processing and seeks to determine the best structure of the TDM, in other words, the structure of the TDM that would facilitate efficient searching within LSI.
23

Ayadi, Rami, Mohsen Maraoui, and Mounir Zrigui. "Latent Topic Model for Indexing Arabic Documents." International Journal of Information Retrieval Research 4, no. 1 (January 2014): 29–45. http://dx.doi.org/10.4018/ijirr.2014010102.

Abstract:
In this paper, the authors present a latent topic model to index and represent Arabic text documents, reflecting more of their semantics. Text representation in a language with a highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach to analyzing and preprocessing Arabic text, then describe the stemming process. Finally, the latent model (LDA) is adapted to extract Arabic latent topics: the authors extract the significant topics of all texts, each topic is described by a particular distribution of descriptors, and each text is represented by a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different topic numbers K (25, 50, 75, and 100), and the authors compare this result with classification in the full word space. The results show that the performance of classification in the reduced topic space, in terms of precision, recall and f-measure, outperforms both classification in the full word space and classification using LSI reduction.
24

Ayadi, Rami, Mohsen Maraoui, and Mounir Zrigui. "Latent Topic Model for Indexing Arabic Documents." International Journal of Information Retrieval Research 4, no. 2 (April 2014): 57–72. http://dx.doi.org/10.4018/ijirr.2014040104.

Abstract:
In this paper, the authors present a latent topic model to index and represent Arabic text documents, reflecting more of their semantics. Text representation in a language with a highly inflectional morphology such as Arabic is not a trivial task and requires special treatment. The authors describe their approach to analyzing and preprocessing Arabic text, then they describe the stemming process. Finally, the latent model (LDA) is adapted to extract Arabic latent topics: the authors extract the significant topics of all texts, each topic is described by a particular distribution of descriptors, and each text is represented by a vector over these topics. The classification experiment is conducted on an in-house corpus; latent topics are learned with LDA for different topic numbers K (25, 50, 75, and 100), and they compare this result with classification in the full word space. The results show that the performance of classification in the reduced topic space, in terms of precision, recall and f-measure, outperforms both classification in the full word space and classification using LSI reduction.
25

Zhang, Wen, Fan Xiao, Bin Li, and Siguang Zhang. "Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure." Computational Intelligence and Neuroscience 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/1096271.

Abstract:
Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is usually criticized for low discriminative power in representing documents, although it has been validated as having good representative quality. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. Firstly, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Secondly, we propose SVD on clusters for LSI and theoretically explain that dimension expansion of document vectors and dimension projection using SVD are the two manipulations involved in SVD on clusters. Moreover, we develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters can improve the precision of the inter-document similarity measure in comparison with other SVD-based LSI methods.
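The fold-in update this abstract builds on is the standard LSI one: a new document vector d is projected into the existing rank-k space as d^T U_k Sigma_k^(-1). A minimal numpy check, with a toy matrix and a k of our own choosing (the paper applies the same idea per cluster):

```python
import numpy as np

# A random term-by-document matrix stands in for the corpus, and k is
# an illustrative truncation rank.
rng = np.random.default_rng(3)
A = rng.poisson(2.0, size=(30, 10)).astype(float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
Uk, sk = U[:, :k], s[:k]

def fold_in(d):
    # Standard LSI fold-in: d_hat = d^T U_k Sigma_k^{-1}
    return d @ Uk / sk

# Folding in a column that was already in A reproduces the latent
# coordinates the SVD itself assigned to that document.
d_hat = fold_in(A[:, 0])
expected = Vt[:k, 0]
```

The identity holds because A^T U = V Sigma, so projecting any existing column recovers its row of V; genuinely new documents land in the same space without recomputing the SVD.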
APA, Harvard, Vancouver, ISO, and other styles
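The SVD-based LSI pipeline that this line of work builds on can be sketched in a few lines: form a term-document matrix, take a rank-k truncated SVD, and compare documents by cosine similarity in the reduced space. The toy matrix and the choice k=2 below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# Docs 0 and 1 share vocabulary; doc 2 uses an unrelated vocabulary.
A = np.array([
    [2.0, 1.0, 0.0],   # "latent"
    [1.0, 2.0, 0.0],   # "semantic"
    [0.0, 1.0, 0.0],   # "indexing"
    [0.0, 0.0, 3.0],   # "hadoop"
    [0.0, 0.0, 1.0],   # "cluster"
])

def lsi_doc_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Document coordinates in the latent space: rows of V_k scaled by sigma_k.
    return Vt[:k].T * s[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

docs = lsi_doc_vectors(A, k=2)
print(cosine(docs[0], docs[1]))  # near 1: docs 0 and 1 are similar
print(cosine(docs[0], docs[2]))  # near 0: unrelated vocabularies
```

"SVD on clusters" as proposed in the paper would first partition the documents and decompose each cluster's submatrix separately; the projection step for each cluster is the same operation shown here.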
26

Amarendra Reddy, P., O. Ramesh, and . "Elite Sequence Mining of Big Data using Hadoop Mapreduce." International Journal of Engineering & Technology 7, no. 4.10 (October 2, 2018): 19. http://dx.doi.org/10.14419/ijet.v7i4.10.20696.

Full text
Abstract:
Text mining can deal with unstructured information. The proposed work extracts content from a PDF document and converts it to plain-text format; the document is then tokenized and serialized. Document clustering and classification are performed by finding similarities between documents stored in the cloud. Similar documents are identified using the Singular Value Decomposition (SVD) technique in Latent Semantic Indexing (LSI), and are then grouped together as a cluster. A comparative study between the LFS (Local File System) and HDFS (Hadoop Distributed File System) is carried out with respect to speed and dimensionality. The system has been evaluated on real documents and the results are tabulated.
APA, Harvard, Vancouver, ISO, and other styles
27

Sun, Xiaobing, Xiangyue Liu, Bin Li, Bixin Li, David Lo, and Lingzhi Liao. "Clustering Classes in Packages for Program Comprehension." Scientific Programming 2017 (2017): 1–15. http://dx.doi.org/10.1155/2017/3787053.

Full text
Abstract:
During software maintenance and evolution, one of the important tasks faced by developers is to understand a system quickly and accurately. With the increasing size and complexity of an evolving system, program comprehension becomes an increasingly difficult activity. Given a target system for comprehension, developers may first focus on the package comprehension. The packages in the system are of different sizes. For small-sized packages in the system, developers can easily comprehend them. However, for large-sized packages, they are difficult to understand. In this article, we focus on understanding these large-sized packages and propose a novel program comprehension approach for large-sized packages, which utilizes the Latent Dirichlet Allocation (LDA) model to cluster large-sized packages. Thus, these large-sized packages are separated as small-sized clusters, which are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that the effectiveness of our approach is better than Latent Semantic Indexing- (LSI-) and Probabilistic Latent Semantic Analysis- (PLSA-) based clustering approaches. In addition, we find that the topic that labels each cluster is useful for program comprehension.
APA, Harvard, Vancouver, ISO, and other styles
28

Abayomi-Alli, Adebayo, Olusola Abayomi-Alli, Sanjay Misra, and Luis Fernandez-Sanz. "Study of the Yahoo-Yahoo Hash-Tag Tweets Using Sentiment Analysis and Opinion Mining Algorithms." Information 13, no. 3 (March 15, 2022): 152. http://dx.doi.org/10.3390/info13030152.

Full text
Abstract:
Mining opinion on social media microblogs presents opportunities to extract meaningful insight from the public on trending issues like “yahoo-yahoo”, which, in Nigeria, is synonymous with cybercrime. In this study, content analysis of selected historical tweets from the “yahoo-yahoo” hash-tag was conducted for sentiment and topic modelling. A corpus of 5500 tweets was obtained and pre-processed using a pre-trained tweet tokenizer, while Valence Aware Dictionary for Sentiment Reasoning (VADER), the Liu Hu method, Latent Dirichlet Allocation (LDA), Latent Semantic Indexing (LSI) and Multidimensional Scaling (MDS) graphs were used for sentiment analysis, topic modelling and topic visualization. Results showed the corpus had 173 unique tweet clusters, 5327 duplicate tweets and a frequency of 9555 for “yahoo”. Further validation using the mean sentiment scores of ten volunteers returned R and R2 of 0.8038 and 0.6402; 0.5994 and 0.3463; 0.5999 and 0.3586 for Human and VADER; Human and Liu Hu; and Liu Hu and VADER sentiment scores, respectively. While VADER outperforms Liu Hu in sentiment analysis, LDA and LSI returned similar results in topic modelling. The study confirms VADER’s performance on unstructured social media data containing non-English slang, conjunctions, emoticons, etc., and shows that emojis are more representative of sentiments in tweets than the texts.
APA, Harvard, Vancouver, ISO, and other styles
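Lexicon-and-rules scorers in the VADER family boil down to summing word valences with simple modifiers for negation and intensifiers. The tiny lexicon, booster weights, and damping factor below are invented for illustration; VADER's real lexicon and rule set are far larger and tuned for social media text.

```python
# Minimal lexicon-based polarity scorer in the spirit of VADER.
# Lexicon values and booster/negation handling are illustrative only.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "scam": -2.7, "happy": 2.7}
BOOSTERS = {"very": 0.3, "extremely": 0.4}
NEGATIONS = {"not", "never", "no"}

def polarity(text):
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        valence = LEXICON[tok]
        if i > 0 and tokens[i - 1] in BOOSTERS:  # intensifier bumps the magnitude
            valence += BOOSTERS[tokens[i - 1]] * (1 if valence > 0 else -1)
        if any(t in NEGATIONS for t in tokens[max(0, i - 3):i]):  # negation flips
            valence *= -0.74
        score += valence
    return score

print(polarity("the service was very good"))   # positive score
print(polarity("this is not a good deal"))     # negative score (negated "good")
```

The Liu Hu method compared in the paper is simpler still: it counts positive and negative lexicon hits without valence weights, which is one reason the two methods can disagree on borderline tweets.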
29

Best, Michael L. "An Ecology of Text: Using Text Retrieval to Study Alife on the Net." Artificial Life 3, no. 4 (October 1997): 261–87. http://dx.doi.org/10.1162/artl.1997.3.4.261.

Full text
Abstract:
I introduce a new alife model, an ecology based on a corpus of text, and apply it to the analysis of posts to USENET News. In this corporal ecology posts are organisms, the newsgroups of NetNews define an environment, and human posters situated in their wider context make up a scarce resource. I apply latent semantic indexing (LSI), a text retrieval method based on principal component analysis, to distill from the corpus those replicating units of text. LSI arrives at suitable replicators because it discovers word co-occurrences that segregate and recombine with appreciable frequency. I argue that natural selection is necessarily in operation because sufficient conditions for its occurrence are met: replication, mutagenicity, and trait/fitness covariance. I describe a set of experiments performed on a static corpus of over 10,000 posts. In these experiments I study average population fitness, a fundamental element of population ecology. My study of fitness arrives at the unhappy discovery that a flame-war, centered around an overly prolific poster, is the king of the jungle.
APA, Harvard, Vancouver, ISO, and other styles
30

Huang, Chun‐Che, and Chia‐Ming Kuo. "The transformation and search of semi‐structured knowledge in organizations." Journal of Knowledge Management 7, no. 4 (October 1, 2003): 106–23. http://dx.doi.org/10.1108/13673270310492985.

Full text
Abstract:
Knowledge is perceived as a very important asset for organizations, and knowledge management is critical for organizational competitiveness. Because knowledge is by nature complex and varied, it is difficult to extend the effectiveness of knowledge re‐use in organizations. In this article, an approach based on Zachman’s Framework is developed to externalize organizational knowledge into semi‐structured knowledge, and eXtensible Markup Language (XML) is applied to transform the knowledge into documents. In addition, latent semantic indexing (LSI), which is capable of handling synonyms and antonyms as well as improving the accuracy of document searches, is incorporated to facilitate the search of semi‐structured knowledge (SSK) documents based on user demands. The SSK approach shows great promise for organizations to acquire, store, disseminate, and reuse knowledge.
APA, Harvard, Vancouver, ISO, and other styles
31

Yudho Baskoro, Setyoko, Achmad Ridok, and Muhammad Tanzil Furqon. "PENCARIAN PASAL PADA KITAB UNDANG-UNDANG HUKUM PIDANA (KUHP) BERDASARKAN KASUS MENGGUNAKAN METODE COSINE SIMILARITY DAN LATENT SEMANTIC INDEXING (LSI)" [Searching Articles of the Indonesian Criminal Code (KUHP) by Case Using Cosine Similarity and Latent Semantic Indexing (LSI)]. Journal of Enviromental Engineering and Sustainable Technology 2, no. 2 (November 1, 2015): 83–88. http://dx.doi.org/10.21776/ub.jeest.2015.002.02.4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Garnsey, Margaret R. "Automatic Classification of Financial Accounting Concepts." Journal of Emerging Technologies in Accounting 3, no. 1 (January 1, 2006): 21–39. http://dx.doi.org/10.2308/jeta.2006.3.1.21.

Full text
Abstract:
Information and standards overload are part of the current business environment. In accounting, this is exacerbated by the variety of users and the evolving nature of accounting language. This article describes a research project that determines the feasibility of using statistical methods to automatically group related accounting concepts together. Starting with the frequencies of words in documents and modifying them for local and global weighting, Latent Semantic Indexing (LSI) and agglomerative clustering were used to derive clusters of related accounting concepts. Resultant clusters were compared to terms generated randomly and terms identified by individuals to determine if related terms are identified. A recognition test was used to determine if providing individuals with lists of automatically generated terms allowed them to identify additional relevant terms. Results showed that both clusters obtained from the weighted term-document matrix and clusters from an LSI matrix based on 50 dimensions contained significant numbers of related terms. There was no statistical difference in the number of related terms found by the two methods. However, the LSI clusters contained terms that were of a lower frequency in the corpus. This finding may have significance in using cluster terms to assist in retrieval. When given a specific term and asked for related terms, providing individuals with a list of potential terms significantly increased the number of related terms they were able to identify compared with free recall.
APA, Harvard, Vancouver, ISO, and other styles
33

Liao, Chia-Hung, Li-Xian Chen, Jhih-Cheng Yang, and Shyan-Ming Yuan. "A Photo Post Recommendation System Based on Topic Model for Improving Facebook Fan Page Engagement." Symmetry 12, no. 7 (July 2, 2020): 1105. http://dx.doi.org/10.3390/sym12071105.

Full text
Abstract:
Digital advertising on social media officially surpassed traditional advertising and became the largest marketing media in many countries. However, how to maximize the value of the overall marketing budget is one of the most concerning issues of all enterprises. The content of the Facebook photo post needs to be analyzed effectively so that the social media companies and managers can concentrate on handling their fan pages. This research aimed to use text mining techniques to find the audience accurately. Therefore, we built a topic model recommendation system (TMRS) to analyze Facebook posts by sorting the target posts according to the recommended scores. The TMRS includes six stages, such as data preprocessing, Chinese word segmentation, word refinement, TF-IDF word vector conversion, creating model via Latent Semantic Indexing (LSI), or Latent Dirichlet Allocation (LDA), and calculating the recommendation score. In addition to automatically selecting posts to create advertisements, this model is more effective in using marketing budgets and getting more engagements. Based on the recommendation results, it is verified that the TMRS can increase the engagement rate compared to the traditional engagement rate recommended method (ERRM). Ultimately, advertisers can have the chance to create ads for the post with potentially high engagements under a limited budget.
APA, Harvard, Vancouver, ISO, and other styles
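The scoring stage of a recommender like the TMRS above reduces to converting posts into TF-IDF vectors and ranking them by cosine similarity against a reference query. A minimal stdlib sketch follows; the English posts and the "high-engagement" query string are invented stand-ins for the segmented Chinese posts and topic-model output in the paper.

```python
import math
from collections import Counter

# Hypothetical fan-page posts; in TMRS these would be segmented Chinese posts.
posts = [
    "new phone giveaway tag a friend",
    "our store opening hours update",
    "giveaway winners announced tag your friend",
]
target = "phone giveaway contest tag friend"  # stand-in for a high-engagement topic query

def tfidf_vectors(docs):
    """Plain TF-IDF: tf = raw count, idf = log(N / df)."""
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in tokenized]

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors(posts + [target])
query, post_vecs = vecs[-1], vecs[:-1]
ranked = sorted(range(len(posts)), key=lambda i: cosine(post_vecs[i], query), reverse=True)
print(ranked)  # the giveaway posts rank above the store-hours post
```

In the full system the ranking would happen in an LSI or LDA topic space rather than raw TF-IDF space, but the sort-by-score step is the same.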
34

Olaleye, Taiwo Olapeju. "Opinion Mining Analytics for Spotting Omicron Fear-Stimuli Using REPTree Classifier and Natural Language Processing." International Journal for Research in Applied Science and Engineering Technology 10, no. 1 (January 31, 2022): 995–1005. http://dx.doi.org/10.22214/ijraset.2022.39903.

Full text
Abstract:
Abstract: Data have repeatedly proven to offer better insight, and with the surge of big data in the era of coronavirus, research initiatives in the field of data mining continue to leverage computational methodologies. Owing to the dreadful nature of the Omicron variant, a fight-or-flight dilemma pervades college communities, with far-reaching implications for the work ethics of academic front-liners. This study therefore aims to gain insights from academia-sourced data to unravel fear stimuli in college communities. The predictive analytics is carried out on a college-based opinion poll. The Valence Aware Dictionary for Sentiment Reasoning (VADER) algorithm is deployed for emotion analytics, while Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) are employed for topic modelling. The REPTree algorithm models the fear-spotting decision tree using 10-fold cross-validation. Experimental results show high performance metrics of 94.68% on recall and precision, with the hand-washing attribute returned as the most significant variable with the highest information gain. The topic modelling results likewise return non-clinical precautionary measures as fear stimuli, while the VADER sentiment analysis shows 22.47%, 25.8%, and 51.73% positive, negative, and neutral polarity scores respectively, indicative of the academic front-liners’ pessimism towards effective compliance with non-clinical safety regulations. Keywords: COVID-19, Omicron, Sentiment Analysis, Topic Modelling.
APA, Harvard, Vancouver, ISO, and other styles
35

Wang, Bangchao, Rong Peng, Zhuo Wang, Xiaomin Wang, and Yuanbang Li. "An Automated Hybrid Approach for Generating Requirements Trace Links." International Journal of Software Engineering and Knowledge Engineering 30, no. 07 (July 2020): 1005–48. http://dx.doi.org/10.1142/s0218194020500278.

Full text
Abstract:
Trace links between requirements and software artifacts provide traceability information and in-depth insights for different stakeholders. Unfortunately, establishing requirements trace links is a tedious, labor-intensive, and fallible task. To alleviate this problem, Information Retrieval (IR) methods, such as the Vector Space Model (VSM), Latent Semantic Indexing (LSI), and their variants, have been widely used to establish trace links automatically. But with the widespread use of agile development methodology, the artifacts available for generating automatic trace links are getting shorter and shorter, which decreases the effectiveness of traditional IR-based trace link generation methods. In this paper, Biterm Topic Model–Genetic Algorithm (BTM–GA), which is effective in managing short-text artifacts and configuring initial parameters, is introduced. A hybrid method, VSM+BTM–GA, is proposed to generate requirements trace links. Empirical experiments conducted on five real and frequently used datasets indicate that (1) the hybrid method VSM+BTM–GA outperforms the others, and its results can achieve the “Good” level, where recall and precision are no less than 70% and 30%, respectively; (2) the performance of the hybrid method is stable; and (3) BTM–GA can provide a number of “hard-to-find” trace links that complement the candidate trace links of VSM.
APA, Harvard, Vancouver, ISO, and other styles
36

Teh, Phoey Lee, Scott Piao, Mansour Almansour, Huey Fang Ong, and Abdul Ahad. "Analysis of Popular Social Media Topics Regarding Plastic Pollution." Sustainability 14, no. 3 (February 1, 2022): 1709. http://dx.doi.org/10.3390/su14031709.

Full text
Abstract:
Plastic pollution is one of the most significant environmental issues in the world. The rapid increase of the cumulative amount of plastic waste has caused alarm, and the public have called for actions to mitigate its impacts on the environment. Numerous governments and social activists from various non-profit organisations have set up policies and actively promoted awareness and have engaged the public in discussions on this issue. Nevertheless, social responsibility is the key to a sustainable environment, and individuals are accountable for performing their civic duty and commit to behavioural changes that can reduce the use of plastics. This paper explores a set of topic modelling techniques to assist policymakers and environment communities in understanding public opinions about the issues related to plastic pollution by analysing social media data. We report on an experiment in which a total of 274,404 tweets were collected from Twitter that are related to plastic pollution, and five topic modelling techniques, including (a) Latent Dirichlet Allocation (LDA), (b) Hierarchical Dirichlet Process (HDP), (c) Latent Semantic Indexing (LSI), (d) Non-Negative Matrix Factorisation (NMF), and (e) extension of LDA—Structural Topic Model (STM), were applied to the data to identify popular topics of online conversations, considering topic coherence, topic prevalence, and topic correlation. Our experimental results show that some of these topic modelling techniques are effective in detecting and identifying important topics surrounding plastic pollution, and potentially different techniques can be combined to develop an efficient system for mining important environment-related topics from social media data on a large scale.
APA, Harvard, Vancouver, ISO, and other styles
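The "topic coherence" used above to compare topic models can be approximated by checking how often a topic's top words co-occur in documents; a common UMass-style variant scores word pairs by smoothed document co-occurrence counts. The toy corpus and candidate topics below are invented for illustration, and this is a simplified form of the measure (no frequency ordering of the top words).

```python
import math
from itertools import combinations

# Toy corpus (documents as word sets) and candidate topics; illustrative only.
docs = [
    {"plastic", "ocean", "waste", "pollution"},
    {"plastic", "waste", "recycle"},
    {"ocean", "pollution", "wildlife"},
    {"election", "vote"},
    {"vote", "tax"},
]

def umass_coherence(topic_words, docs):
    """UMass-style coherence: sum of log((D(wi, wj) + 1) / D(wj)) over top-word pairs."""
    def d(*words):  # number of documents containing all the given words
        return sum(1 for doc in docs if all(w in doc for w in words))
    score = 0.0
    for wi, wj in combinations(topic_words, 2):
        score += math.log((d(wi, wj) + 1) / d(wj))
    return score

good = umass_coherence(["plastic", "waste", "ocean"], docs)   # words co-occur often
bad = umass_coherence(["plastic", "election", "vote"], docs)  # mixed, rarely co-occur
print(good > bad)  # True: the coherent topic scores higher
```

Scores closer to zero indicate more coherent topics; a topic mixing unrelated words accumulates large negative pair terms.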
37

TERAMOTO, REIJI, and TSUYOSHI KATO. "TRANSFER LEARNING FOR CYTOCHROME P450 ISOZYME SELECTIVITY PREDICTION." Journal of Bioinformatics and Computational Biology 09, no. 04 (August 2011): 521–40. http://dx.doi.org/10.1142/s0219720011005434.

Full text
Abstract:
In the drug discovery process, the metabolic fate of drugs is crucially important to prevent drug–drug interactions. Therefore, P450 isozyme selectivity prediction is an important task for screening drugs of appropriate metabolism profiles. Recently, large-scale activity data of five P450 isozymes (CYP1A2, CYP2C9, CYP3A4, CYP2D6, and CYP2C19) have been obtained using quantitative high-throughput screening with a bioluminescence assay. Although some isozymes share similar selectivities, conventional supervised learning algorithms independently learn a prediction model from each P450 isozyme. They are unable to exploit the other P450 isozymes' activity data to improve the predictive performance of each P450 isozyme's selectivity. To address this issue, we apply transfer learning that uses activity data of the other isozymes to learn a prediction model from multiple P450 isozymes. Using the large-scale P450 isozyme selectivity dataset for five P450 isozymes, we evaluate the model's predictive performance. Experimental results show that, overall, our algorithm outperforms conventional supervised learning algorithms such as support vector machine (SVM), weighted k-nearest neighbor classifier, Bagging, AdaBoost, and latent semantic indexing (LSI). Moreover, our results show that the predictive performance of our algorithm is improved by exploiting the multiple P450 isozyme activity data in the learning process. Our algorithm can be an effective tool for P450 selectivity prediction for new chemical entities using multiple P450 isozyme activity data.
APA, Harvard, Vancouver, ISO, and other styles
38

Gupta, Aditi, and Rinkaj Goyal. "Identifying High-Level Concept Clones in Software Programs Using Method’s Descriptive Documentation." Symmetry 13, no. 3 (March 10, 2021): 447. http://dx.doi.org/10.3390/sym13030447.

Full text
Abstract:
Software clones are code fragments with similar or nearly similar functionality or structures. These clones are introduced in a project either accidentally or deliberately during the software development or maintenance process. The presence of clones poses a significant threat to the maintenance of software systems and is at the top of the list of code smell types. Clones can be simple (fine-grained) or high-level (coarse-grained), depending on the chosen granularity of code for clone detection. Simple clones are generally viewed at the line/statement level, whereas high-level clones have granularity as a block, method, class, or file. High-level clones are said to be composed of multiple simple clones. This study aims to detect high-level conceptual code clones (having granularity as Java methods) in Java-based projects, which is extendable to projects developed in other languages as well. Conceptual code clones are the ones implementing a similar higher-level abstraction such as an Abstract Data Type (ADT) list. Based on the assumption that “similar documentation implies similar methods”, the proposed mechanism uses the “documentation” associated with methods to identify method-level concept clones. As complete documentation does not contribute to the method’s semantics, we extracted only the description part of the method’s documentation, which led to two benefits: increased efficiency and reduced text corpus size. Further, we used Latent Semantic Indexing (LSI) with different combinations of weight and similarity measures to identify similar descriptions in the text corpus. To show the efficacy of the proposed approach, we validated it using three Java open-source systems of sufficient length. The findings suggest that the proposed mechanism can detect methods implementing similar high-level concepts with improved recall values.
APA, Harvard, Vancouver, ISO, and other styles
39

Wang, Quan, Jun Xu, Hang Li, and Nick Craswell. "Regularized Latent Semantic Indexing." ACM Transactions on Information Systems 31, no. 1 (January 2013): 1–44. http://dx.doi.org/10.1145/2414782.2414787.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Hofmann, Thomas. "Probabilistic Latent Semantic Indexing." ACM SIGIR Forum 51, no. 2 (August 2, 2017): 211–18. http://dx.doi.org/10.1145/3130348.3130370.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. "Indexing by latent semantic analysis." Journal of the American Society for Information Science 41, no. 6 (September 1990): 391–407. http://dx.doi.org/10.1002/(sici)1097-4571(199009)41:6<391::aid-asi1>3.0.co;2-9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Papadimitriou, Christos H., Prabhakar Raghavan, Hisao Tamaki, and Santosh Vempala. "Latent Semantic Indexing: A Probabilistic Analysis." Journal of Computer and System Sciences 61, no. 2 (October 2000): 217–35. http://dx.doi.org/10.1006/jcss.2000.1711.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Thorleuchter, Dirk, and Dirk Van den Poel. "Technology classification with latent semantic indexing." Expert Systems with Applications 40, no. 5 (April 2013): 1786–95. http://dx.doi.org/10.1016/j.eswa.2012.09.023.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Dong, Le, Ning Feng, and Qianni Zhang. "LSI: Latent semantic inference for natural image segmentation." Pattern Recognition 59 (November 2016): 282–91. http://dx.doi.org/10.1016/j.patcog.2016.03.005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

SivaKumar, A. P., P. Premchand, and A. Govardhan. "Indian Languages IR using Latent Semantic Indexing." International Journal of Computer Science and Information Technology 3, no. 4 (August 30, 2011): 245–53. http://dx.doi.org/10.5121/ijcsit.2011.3419.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Rahim, Robbi, Nuning Kurniasih, Muhammad Dedi Irawan, Yustria Handika Siregar, Abdurrozzaq Hasibuan, Deffi Ayu Puspito Sari, Tiarma Simanihuruk, et al. "Latent Semantic Indexing for Indonesian Text Similarity." International Journal of Engineering & Technology 7, no. 2.3 (March 8, 2018): 73. http://dx.doi.org/10.14419/ijet.v7i2.3.12619.

Full text
Abstract:
A document is a written record that can serve as evidence of information. Plagiarism is a deliberate or unintentional act of obtaining, or attempting to obtain, credit for a scientific work by citing some or all of the scientific work of another party without stating the source properly and adequately. The Latent Semantic Indexing method serves to find text in a document that matches text in other documents. The algorithm used is the TF/IDF algorithm, in which a term's weight in a document is the product of its TF value and its IDF, while the Vector Space Model (VSM) is a method for measuring the closeness or similarity of words by way of term weighting.
APA, Harvard, Vancouver, ISO, and other styles
47

Vecharynski, Eugene, and Yousef Saad. "Fast Updating Algorithms for Latent Semantic Indexing." SIAM Journal on Matrix Analysis and Applications 35, no. 3 (January 2014): 1105–31. http://dx.doi.org/10.1137/130940414.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Gao, Jing, and Jun Zhang. "Clustered SVD strategies in latent semantic indexing." Information Processing & Management 41, no. 5 (September 2005): 1051–63. http://dx.doi.org/10.1016/j.ipm.2004.10.005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Foltz, P. W. "Using latent semantic indexing for information filtering." ACM SIGOIS Bulletin 11, no. 2-3 (April 1990): 40–47. http://dx.doi.org/10.1145/91478.91486.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Zha, Hongyuan, and Horst D. Simon. "On Updating Problems in Latent Semantic Indexing." SIAM Journal on Scientific Computing 21, no. 2 (January 1999): 782–91. http://dx.doi.org/10.1137/s1064827597329266.

Full text
APA, Harvard, Vancouver, ISO, and other styles