Journal articles: 'Text indexing'

1

Kaleta, Zbigniew. "Semantic Text Indexing." Computer Science 15, no. 1 (2014): 19. http://dx.doi.org/10.7494/csci.2014.15.1.19.

2

Ferragina, Paolo, and Giovanni Manzini. "Indexing compressed text." Journal of the ACM 52, no. 4 (July 2005): 552–81. http://dx.doi.org/10.1145/1082036.1082039.

3

Navarro, Gonzalo, and Nicola Prezza. "Universal compressed text indexing." Theoretical Computer Science 762 (March 2019): 41–50. http://dx.doi.org/10.1016/j.tcs.2018.09.007.

4

Amir, Amihood, Gad M. Landau, and Esko Ukkonen. "Online timestamped text indexing." Information Processing Letters 82, no. 5 (June 2002): 253–59. http://dx.doi.org/10.1016/s0020-0190(01)00275-7.

5

Jones, Kevin. "Text management and indexing." Learned Publishing 5, no. 3 (January 1, 1992): 168–69. http://dx.doi.org/10.1002/leap/50055.

6

Maaß, Moritz G., and Johannes Nowak. "Text indexing with errors." Journal of Discrete Algorithms 5, no. 4 (December 2007): 662–81. http://dx.doi.org/10.1016/j.jda.2006.11.001.

7

Ferragina, Paolo, and Roberto Grossi. "Improved Dynamic Text Indexing." Journal of Algorithms 31, no. 2 (May 1999): 291–319. http://dx.doi.org/10.1006/jagm.1998.0999.

8

Bernardini, Giulia, Huiping Chen, Gabriele Fici, Grigorios Loukides, and Solon P. Pissis. "Reverse-Safe Text Indexing." ACM Journal of Experimental Algorithmics 26 (July 8, 2021): 1–26. http://dx.doi.org/10.1145/3461698.

Abstract:

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z-RSDS. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z-RSDS for decision pattern matching queries, whose size can be sublinear in n. A preliminary version of this article appeared in ALENEX 2020.

9

Golub, Koraljka. "Automatic Subject Indexing of Text." Knowledge Organization 46, no. 2 (2019): 104–21. http://dx.doi.org/10.5771/0943-7444-2019-2-104.

Abstract:

Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: “text categorization,” “document clustering,” and “document classification.” Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.

10

Belazzougui, Djamal, and Gonzalo Navarro. "Alphabet-Independent Compressed Text Indexing." ACM Transactions on Algorithms 10, no. 4 (August 2014): 1–19. http://dx.doi.org/10.1145/2635816.

11

Hon, Wing-Kai, Tsung-Han Ku, Rahul Shah, Sharma V. Thankachan, and Jeffrey Scott Vitter. "Compressed text indexing with wildcards." Journal of Discrete Algorithms 19 (March 2013): 23–29. http://dx.doi.org/10.1016/j.jda.2012.12.003.

12

Grist, Deirdre. "Indexing legislative text: Alberta Hansard." Indexer: The International Journal of Indexing 23, no. 3 (April 1, 2003): 138–39. http://dx.doi.org/10.3828/indexer.2003.23.3.7.

13

Lancaster, F. W. "Retrieval experiments: Full text versus human indexing versus automatic indexing." Journal of the American Society for Information Science 49, no. 5 (1998): 483–84. http://dx.doi.org/10.1002/(sici)1097-4571(19980415)49:5<483::aid-asi13>3.0.co;2-a.

14

Lancaster, F. W. "Retrieval experiments: Full text versus human indexing versus automatic indexing." Journal of the American Society for Information Science 49, no. 5 (1998): 484. http://dx.doi.org/10.1002/(sici)1097-4571(19980415)49:5<484::aid-asi14>3.0.co;2-6.

15

Lancaster, F. W. "Retrieval experiments: Full text versus human indexing versus automatic indexing." Journal of the American Society for Information Science 49, no. 5 (April 15, 1998): 484. http://dx.doi.org/10.1002/(sici)1097-4571(19980415)49:5<484::aid-asi14>3.3.co;2-y.

16

Rocher, Tatiana, Mathieu Giraud, and Mikaël Salson. "Indexing labeled sequences." PeerJ Computer Science 4 (March 26, 2018): e148. http://dx.doi.org/10.7717/peerj-cs.148.

Abstract:

Background: Labels are a way to add information to a text, such as functional annotations (for example, genes) on DNA sequences. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and brings insight into the lymphocyte repertoire for onco-hematology or immunology studies. Methods: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows–Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is taken in the order of the text (TL-index) or in the order of the BWT (TLBW-index). Both indexes need space related to the entropy of the labeled text. Results: These indexes allow efficient text–label queries to count and find labeled patterns. The TLBW-index has an overhead on simple label queries but is very efficient on combined pattern–label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts. Discussion: New indexes such as the ones we proposed improve the way we index and query labeled texts such as, for instance, lymphocyte repertoires for hematological and immunological studies.

17

Smiraglia, Richard P. "Keywords, Indexing, Text Analysis: An Editorial." Knowledge Organization 40, no. 3 (2013): 155–59. http://dx.doi.org/10.5771/0943-7444-2013-3-155.

18

Salminen, Airi, Jean Tague-Sutcliffe, and Charles McClellan. "From text to hypertext by indexing." ACM Transactions on Information Systems 13, no. 1 (January 2, 1995): 69–99. http://dx.doi.org/10.1145/195705.195717.

19

Gey, Fredric, and Daniel P. Dabney. "Full-text against intellectual indexing controversy." Journal of the American Society for Information Science 41, no. 8 (December 1990): 613–14. http://dx.doi.org/10.1002/(sici)1097-4571(199012)41:8<613::aid-asi9>3.0.co;2-d.

20

Bille, Philip, Johannes Fischer, Inge Li Gørtz, Tsvi Kopelowitz, Benjamin Sach, and Hjalte Wedel Vildhøj. "Sparse Text Indexing in Small Space." ACM Transactions on Algorithms 12, no. 3 (June 15, 2016): 1–19. http://dx.doi.org/10.1145/2836166.

21

Zhang, Meng, Liang Hu, and Yi Zhang. "Weighted Automata for Full-Text Indexing." International Journal of Foundations of Computer Science 22, no. 04 (June 2011): 921–43. http://dx.doi.org/10.1142/s0129054111008490.

Abstract:

Full-text index structures are widely used in string matching and bioinformatics. These structures such as DAWGs and suffix trees allow fast searches on texts. In this paper, we present a new partition of the factors of a word, called a consistent minimal linear partition. Based on this partition, we introduce the weighted directed word graph (WDWG), a space-economical full-text index. WDWGs are basically cyclic, which means that they may accept infinite strings. But by assigning weights to edges, the acceptable strings are limited only to the factors of the input string. For a given word w, any factor of w can be indexed by a state of the WDWG and its length. A WDWG of w has at most |w| states and 2|w| - 1 transition edges. We present an on-line algorithm to construct a WDWG for a given word in time linear in the length of the word. Our experiment shows the size of WDWGs is smaller than that of DAWGs for many data sets including DNA sequences, Chinese texts and English texts.

22

Rao S., Venkata. "Correlation Preserving Indexing Based Text Clustering." IOSR Journal of Computer Engineering 13, no. 1 (2013): 27–30. http://dx.doi.org/10.9790/0661-1312730.

23

Navarro, Gonzalo, Erkki Sutinen, and Jorma Tarhio. "Indexing text with approximate q-grams." Journal of Discrete Algorithms 3, no. 2-4 (June 2005): 157–75. http://dx.doi.org/10.1016/j.jda.2004.08.003.

24

Ferragina, Paolo. "Dynamic Text Indexing under String Updates." Journal of Algorithms 22, no. 2 (February 1997): 296–328. http://dx.doi.org/10.1006/jagm.1996.0814.

25

Gibney, Daniel, and Sharma V. Thankachan. "Text Indexing for Regular Expression Matching." Algorithms 14, no. 5 (April 23, 2021): 133. http://dx.doi.org/10.3390/a14050133.

Abstract:

Finding substrings of a text T that match a regular expression p is a fundamental problem. Despite being the subject of extensive research, no solution with a time complexity significantly better than O(|T||p|) has been found. Backurs and Indyk in FOCS 2016 established conditional lower bounds for the algorithmic problem based on the Strong Exponential Time Hypothesis that help explain this difficulty. A natural question is whether we can improve the time complexity for matching the regular expression by preprocessing the text T. We show that, conditioned on the Online Matrix–Vector Multiplication (OMv) conjecture, even with arbitrary polynomial preprocessing time, a regular expression query on a text cannot be answered in strongly sublinear time, i.e., O(|T|^(1−ε)) for any ε > 0. Furthermore, if we extend the OMv conjecture to a plausible conjecture regarding Boolean matrix multiplication with polynomial preprocessing time, which we call Online Matrix–Matrix Multiplication (OMM), we can strengthen this hardness result to there being no solution with a query time that is O(|T|^(3/2−ε)). These results hold for alphabet sizes three or greater. We then provide data structures that answer queries in O(|T||p|/τ) time where τ ∈ [1, |T|] is fixed at construction. These include a solution that works for all regular expressions with Exp(τ·|T|) preprocessing time and space. For patterns containing only ‘concatenation’ and ‘or’ operators (the same type used in the hardness result), we provide (1) a deterministic solution which requires Exp(τ·|T|) log^2 |T| preprocessing time and space, and (2) when |p| ≤ |T|^z for z = 2^(o(√log|T|)), a randomized solution with amortized query time which answers queries correctly with high probability, requiring Exp(τ·|T|) 2^(Ω(√log|T|)) preprocessing time and space.

26

Lienhart, Rainer, and Wolfgang Effelsberg. "Automatic text segmentation and text recognition for video indexing." Multimedia Systems 8, no. 1 (January 1, 2000): 69–81. http://dx.doi.org/10.1007/s005300050006.

27

Žďárek, Jan, and Bořivoj Melichar. "Tree-Based 2D Indexing." International Journal of Foundations of Computer Science 22, no. 08 (December 2011): 1893–907. http://dx.doi.org/10.1142/s0129054111009100.

Abstract:

A new approach to 2D pattern matching, and specifically to 2D text indexing, is proposed. A transformation of a 2D text into the form of a tree is presented. It preserves the context of each element of the 2D text. The tree can be linearised using the prefix notation into the form of a string (a linear text), and the pattern matching is performed in this text. Pushdown automata indexing the 2D text are constructed over the tree representation. They allow searching for 2D prefixes, 2D suffixes, and 2D factors of the 2D text in time proportional to the size of the representation of a 2D pattern. This result achieves properties analogous to the results obtained in tree pattern matching and string indexing.

28

Ul Hassan, Mohamed Manzoor. "A Robust Multi-Keyword Text Content Retrieval by Utilizing Hash Indexing." International Journal of Innovative Research in Computer Science & Technology 9, no. 2 (March 2021): 1–5. http://dx.doi.org/10.21276/ijircst.2021.9.2.1.

29

Chatterjee, Niladri, and Pramod Kumar Sahoo. "Random Indexing and Modified Random Indexing based approach for extractive text summarization." Computer Speech & Language 29, no. 1 (January 2015): 32–44. http://dx.doi.org/10.1016/j.csl.2014.07.001.

30

Sutcliffe, Glyn. "The indexing of biography as a special genre or as historically documented text." Indexer: The International Journal of Indexing 39, no. 2 (June 1, 2021): 151–63. http://dx.doi.org/10.3828/indexer.2021.16.

Abstract:

The indexing of biography as a genre, per se, is reconsidered with respect to subject indexing in general and the histories of lives in particular. The index of a biography of the chess player Bobby Fischer is compared with the index of a history of chess at the height of the Cold War conflict in which Bobby Fischer was the central protagonist. Some received theory of the indexing of biographies is critiqued and challenged by practical comparisons. Indexing from a literary perspective is considered and contrasted with back-of-book indexing from an information retrieval standpoint.

31

Moreo Fernández, Alejandro, Andrea Esuli, and Fabrizio Sebastiani. "Lightweight Random Indexing for Polylingual Text Classification." Journal of Artificial Intelligence Research 57 (October 13, 2016): 151–85. http://dx.doi.org/10.1613/jair.5194.

Abstract:

Multilingual Text Classification (MLTC) is a text classification task in which documents are written each in one among a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, which is the focus of this paper, we assume (differently from CLTC) that for each language in L there is a representative set of training documents; PLTC consists of improving the accuracy of each of the |L| monolingual classifiers by also leveraging the training documents written in the other (|L| − 1) languages. The obvious solution, consisting of generating a single polylingual classifier from the juxtaposed monolingual vector spaces, is usually infeasible, since the dimensionality of the resulting vector space is roughly |L| times that of a monolingual one, and is thus often unmanageable. As a response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or are not always free to use. One machine-translation-free and dictionary-free method that, to the best of our knowledge, has never been applied to PLTC before, is Random Indexing (RI). We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (that we dub Lightweight Random Indexing LRI). By running experiments on two well known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show LRI to outperform (both in terms of effectiveness and efficiency) a number of previously proposed machine-translation-free and dictionary-free PLTC methods that we use as baselines.

32

Boubekeur, Fatiha, and Wassila Azzoug. "Concept-Based Indexing in Text Information Retrieval." International Journal of Computer Science and Information Technology 5, no. 1 (February 28, 2013): 119–36. http://dx.doi.org/10.5121/ijcsit.2013.5110.

33

Navarro, Gonzalo. "Indexing text using the Ziv–Lempel trie." Journal of Discrete Algorithms 2, no. 1 (March 2004): 87–114. http://dx.doi.org/10.1016/s1570-8667(03)00066-2.

34

Byers, David. "Full-text indexing of non-textual resources." Computer Networks and ISDN Systems 30, no. 1-7 (April 1998): 141–48. http://dx.doi.org/10.1016/s0169-7552(98)00059-2.

35

Rachidi, Youssef. "Text Detection in Video for Video Indexing." International Journal of Computer Trends and Technology 68, no. 4 (April 25, 2020): 96–99. http://dx.doi.org/10.14445/22312803/ijctt-v68i4p117.

36

Woodruff, Allison Gyle, and Christian Plaunt. "GIPSY: Automated geographic indexing of text documents." Journal of the American Society for Information Science 45, no. 9 (October 1994): 645–55. http://dx.doi.org/10.1002/(sici)1097-4571(199410)45:9<645::aid-asi2>3.0.co;2-8.

37

Weinberg, Bella Hass. "Challenges in indexing electronic text and images." Journal of the American Society for Information Science 45, no. 9 (October 1994): 718–23. http://dx.doi.org/10.1002/(sici)1097-4571(199410)45:9<718::aid-asi9>3.0.co;2-f.

38

Rahim, Robbi, Nuning Kurniasih, Muhammad Dedi Irawan, Yustria Handika Siregar, Abdurrozzaq Hasibuan, Deffi Ayu Puspito Sari, Tiarma Simanihuruk, et al. "Latent Semantic Indexing for Indonesian Text Similarity." International Journal of Engineering & Technology 7, no. 2.3 (March 8, 2018): 73. http://dx.doi.org/10.14419/ijet.v7i2.3.12619.

Abstract:

A document is a written record that can be used as evidence of information. Plagiarism is a deliberate or unintentional act of obtaining, or attempting to obtain, credit or value for a scientific work by citing some or all of another party's scientific work without stating the source properly and adequately. The Latent Semantic Indexing method serves to find text in a document that matches text elsewhere. The algorithm used is TF/IDF, in which the weight of a term in a document is the product of its TF value and its IDF value, while the Vector Space Model (VSM) is a method for measuring the level of closeness or similarity of words by weighting terms.
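
The TF/IDF weighting and vector-space comparison mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and omits the singular value decomposition step of Latent Semantic Indexing. The function names `tf_idf` and `cosine`, whitespace tokenization, and the particular weighting (term count divided by document length, times log(N / df)) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    Other TF/IDF variants exist; the abstract does not say which one is used.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under this weighting a term that occurs in every document receives weight zero, so only distinguishing terms contribute to the similarity score.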

39

Villarroel, Miguel, Pablo de la Fuente, Alberto Pedrero, Jesús Vegas, and Joaquín Adiego. "Obtaining feedback for indexing from highlighted text." Electronic Library 20, no. 4 (August 2002): 306–13. http://dx.doi.org/10.1108/02640470210438919.

40

Svenonius, Elaine. "Challenges in Indexing Electronic Text and Images." Information Processing & Management 31, no. 2 (March 1995): 259–60. http://dx.doi.org/10.1016/0306-4573(95)80048-x.

41

Arroyuelo, Diego, Gonzalo Navarro, and Kunihiko Sadakane. "Stronger Lempel-Ziv Based Compressed Text Indexing." Algorithmica 62, no. 1-2 (September 8, 2010): 54–101. http://dx.doi.org/10.1007/s00453-010-9443-8.

42

Mansour, Nashat, Ramzi A. Haraty, Walid Daher, and Manal Houri. "An auto-indexing method for Arabic text." Information Processing & Management 44, no. 4 (July 2008): 1538–45. http://dx.doi.org/10.1016/j.ipm.2007.12.007.

43

Esler, Sandra L., and Michael L. Nelson. "NASA indexing benchmarks: evaluating text search engines." Journal of Network and Computer Applications 20, no. 4 (October 1997): 339–53. http://dx.doi.org/10.1006/jnca.1997.0049.

44

Dai, Suyang, Ronghui You, Zhiyong Lu, Xiaodi Huang, Hiroshi Mamitsuka, and Shanfeng Zhu. "FullMeSH: improving large-scale MeSH indexing with full text." Bioinformatics 36, no. 5 (October 9, 2019): 1533–41. http://dx.doi.org/10.1093/bioinformatics/btz756.

Abstract:

Motivation: With the rapidly growing biomedical literature, automatically indexing biomedical articles by Medical Subject Heading (MeSH), namely MeSH indexing, has become increasingly important for facilitating hypothesis generation and knowledge discovery. Over the past years, many large-scale MeSH indexing approaches have been proposed, such as Medical Text Indexer, MeSHLabeler, DeepMeSH and MeSHProbeNet. However, the performance of these methods is hampered by using limited information, i.e. only the title and abstract of biomedical articles. Results: We propose FullMeSH, a large-scale MeSH indexing method taking advantage of the recent increase in the availability of full text articles. Compared to DeepMeSH and other state-of-the-art methods, FullMeSH has three novelties: (i) Instead of using a full text as a whole, FullMeSH segments it into several sections with their normalized titles in order to distinguish their contributions to the overall performance. (ii) FullMeSH integrates the evidence from different sections in a ‘learning to rank’ framework by combining the sparse and deep semantic representations. (iii) FullMeSH trains an Attention-based Convolutional Neural Network for each section, which achieves better performance on infrequent MeSH headings. FullMeSH has been developed and empirically trained on the entire set of 1.4 million full-text articles in the PubMed Central Open Access subset. It achieved a Micro F-measure of 66.76% on a test set of 10 000 articles, which was 3.3% and 6.4% higher than DeepMeSH and MeSHLabeler, respectively. Furthermore, FullMeSH demonstrated an average improvement of 4.7% over DeepMeSH for indexing Check Tags, a set of most frequently indexed MeSH headings. Availability and implementation: The software is available upon request. Supplementary information: Supplementary data are available at Bioinformatics online.

45

Amir, Amihood, Ayelet Butman, and Ely Porat. "On the relationship between histogram indexing and block-mass indexing." Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 372, no. 2016 (May 28, 2014): 20130132. http://dx.doi.org/10.1098/rsta.2013.0132.

Abstract:

Histogram indexing, also known as jumbled pattern indexing and permutation indexing, is one of the important current open problems in pattern matching. It was introduced about 6 years ago and has seen active research since. Yet, to date there is no algorithm that can preprocess a text T in time o(|T|^2/polylog|T|) and achieve histogram indexing, even over a binary alphabet, in time independent of the text length. The pattern matching version of this problem has a simple linear-time solution. The block-mass pattern matching problem is a recently introduced problem, motivated by issues in mass-spectrometry. It is also an example of a pattern matching problem that has an efficient, almost linear-time solution but whose indexing version is daunting. However, for fixed finite alphabets, there has been progress made. In this paper, a strong connection between the histogram indexing problem and the block-mass pattern indexing problem is shown. The reduction we show between the two problems is amazingly simple. Its value lies in recognizing the connection between these two apparently disparate problems, rather than the complexity of the reduction. In addition, we show that for both these problems, even over unbounded alphabets, there are algorithms that preprocess a text T in time o(|T|^2/polylog|T|) and enable answering indexing queries in time polynomial in the query length. The contributions of this paper are twofold: (i) we introduce the idea of allowing a trade-off between the preprocessing time and query time of various indexing problems that have been stumbling blocks in the literature. (ii) We take the first step in introducing a class of indexing problems that, we believe, cannot be pre-processed in time o(|T|^2/polylog|T|) and enable linear-time query processing.
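
The "simple linear-time solution" for the pattern matching (non-indexed) version noted in this abstract is commonly realised with a sliding window over symbol counts. The sketch below is one such illustration in Python, not code from the paper. The function name `jumbled_matches` is hypothetical, and the linear running time assumes constant-time dictionary operations.

```python
from collections import Counter


def jumbled_matches(text, pattern):
    """Return start positions of windows of text that are permutations of pattern."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # diff[c] = (count of c in pattern) - (count of c in the current window)
    diff = Counter(pattern)
    diff.subtract(text[:m])
    # Number of symbols whose counts currently disagree.
    nonzero = sum(1 for value in diff.values() if value != 0)
    hits = [0] if nonzero == 0 else []
    for i in range(m, len(text)):
        # text[i] enters the window (diff goes down), text[i - m] leaves it (diff goes up).
        for symbol, delta in ((text[i], -1), (text[i - m], +1)):
            before = diff[symbol]
            diff[symbol] = before + delta
            if before == 0:
                nonzero += 1
            elif diff[symbol] == 0:
                nonzero -= 1
        if nonzero == 0:
            hits.append(i - m + 1)
    return hits
```

Each text position is touched a constant number of times, which is what makes the matching version easy; the indexing version discussed in the abstract asks for a preprocessed structure that answers such queries without scanning the text.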

46

Farrow, John. "All in the mind: concept analysis in indexing." Indexer: The International Journal of Indexing 19, no. 4 (October 1, 1995): 243–47. http://dx.doi.org/10.3828/indexer.1995.19.4.2.

Abstract:

The indexing process consists of the comprehension of the document to be indexed, followed by the production of a set of index terms. Differences between academic indexing and back-of-book indexing are discussed. Text comprehension is a branch of human information processing, and it is argued that the model of text comprehension and production developed by van Dijk and Kintsch can form the basis for a cognitive process model of indexing. Strategies for testing such a model are suggested.

47

Ding, Yi, and Xian Fu. "Topical Concept Based Text Clustering Method." Advanced Materials Research 532-533 (June 2012): 939–43. http://dx.doi.org/10.4028/www.scientific.net/amr.532-533.939.

Abstract:

Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. To solve these problems, based on topic concept clustering, this paper proposes a method for Chinese document clustering. In this paper, we introduce a novel topical document clustering method called Document Features Indexing Clustering (DFIC), which can identify topics accurately and cluster documents according to these topics. In DFIC, “topic elements” are defined and extracted for indexing base clusters. Additionally, document features are investigated and exploited. Experimental results show that DFIC can gain a higher precision (92.76%) than some widely used traditional clustering methods.

48

Gupta, Shweta, Sunita Yadav, and Rajesh Prasad. "Document Retrieval using Efficient Indexing Techniques." International Journal of Business Analytics 3, no. 4 (October 2016): 64–82. http://dx.doi.org/10.4018/ijban.2016100104.

Abstract:

Document retrieval plays a crucial role in retrieving relevant documents. Relevancy depends upon the occurrences of query keywords in a document. Several documents may include similar key terms, and hence they need to be indexed. Most indexing techniques are based on either an inverted index or a full-text index. An inverted index creates lists and supports word-based pattern queries, while a full-text index handles queries consisting of any sequence of characters rather than just words. Problems arise when text cannot be separated into words in some western languages. Also, there are difficulties with the space used by compressed versions of full-text indexes. Recently, a data structure called the wavelet tree has become popular in text compression and indexing. It indexes words or characters of the text documents and helps in retrieving top-ranked documents more efficiently. This paper presents a review of the most recent efficient indexing techniques used in document retrieval.
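
The word-based inverted index described in this abstract can be sketched in a few lines of Python. This is an illustration only: the names `build_inverted_index` and `search_all`, whitespace tokenization, and AND semantics for multi-word queries are assumptions, and the review itself covers far more elaborate structures.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased word to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Because the index is keyed by whole words, it cannot answer a query for an arbitrary character sequence such as a word fragment; that is the gap full-text indexes are meant to fill.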

49

Harish, B. S. "Text Document Classification: An Approach Based on Indexing." International Journal of Data Mining & Knowledge Management Process 2, no. 1 (January 31, 2012): 43–62. http://dx.doi.org/10.5121/ijdkp.2012.2104.

50

Rose, Leonard. "Index Maker: Automatic indexing from word‐processor text." Electronic Library 5, no. 3 (March 1987): 140. http://dx.doi.org/10.1108/eb044744.
