Academic literature on the topic 'Text indexing'


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Text indexing.'


Journal articles on the topic "Text indexing"

1

Kaleta, Zbigniew. "Semantic Text Indexing." Computer Science 15, no. 1 (2014): 19. http://dx.doi.org/10.7494/csci.2014.15.1.19.

2

Ferragina, Paolo, and Giovanni Manzini. "Indexing compressed text." Journal of the ACM 52, no. 4 (July 2005): 552–81. http://dx.doi.org/10.1145/1082036.1082039.

3

Navarro, Gonzalo, and Nicola Prezza. "Universal compressed text indexing." Theoretical Computer Science 762 (March 2019): 41–50. http://dx.doi.org/10.1016/j.tcs.2018.09.007.

4

Amir, Amihood, Gad M. Landau, and Esko Ukkonen. "Online timestamped text indexing." Information Processing Letters 82, no. 5 (June 2002): 253–59. http://dx.doi.org/10.1016/s0020-0190(01)00275-7.

5

Jones, Kevin. "Text management and indexing." Learned Publishing 5, no. 3 (January 1, 1992): 168–69. http://dx.doi.org/10.1002/leap/50055.

6

Maaß, Moritz G., and Johannes Nowak. "Text indexing with errors." Journal of Discrete Algorithms 5, no. 4 (December 2007): 662–81. http://dx.doi.org/10.1016/j.jda.2006.11.001.

7

Ferragina, Paolo, and Roberto Grossi. "Improved Dynamic Text Indexing." Journal of Algorithms 31, no. 2 (May 1999): 291–319. http://dx.doi.org/10.1006/jagm.1998.0999.

8

Bernardini, Giulia, Huiping Chen, Gabriele Fici, Grigorios Loukides, and Solon P. Pissis. "Reverse-Safe Text Indexing." ACM Journal of Experimental Algorithmics 26 (July 8, 2021): 1–26. http://dx.doi.org/10.1145/3461698.

Abstract:
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm that constructs a z-reverse-safe data structure (z-RSDS) that has size O(n) and answers decision and counting pattern matching queries of length at most d optimally, where d is maximal for any such z-RSDS. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We also show that plugging our method in data analysis applications gives insignificant or no data utility loss. Furthermore, we show how our technique can be extended to support applications under realistic adversary models. Finally, we show a z-RSDS for decision pattern matching queries, whose size can be sublinear in n. A preliminary version of this article appeared in ALENEX 2020.
9

Golub, Koraljka. "Automatic Subject Indexing of Text." KNOWLEDGE ORGANIZATION 46, no. 2 (2019): 104–21. http://dx.doi.org/10.5771/0943-7444-2019-2-104.

Abstract:
Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: “text categorization,” “document clustering,” and “document classification.” Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
10

Belazzougui, Djamal, and Gonzalo Navarro. "Alphabet-Independent Compressed Text Indexing." ACM Transactions on Algorithms 10, no. 4 (August 2014): 1–19. http://dx.doi.org/10.1145/2635816.


Dissertations / Theses on the topic "Text indexing"

1

He, Meng. "Indexing Compressed Text." Thesis, University of Waterloo, 2003. http://hdl.handle.net/10012/1143.

Abstract:
As a result of the rapid growth of the volume of electronic data, text compression and indexing techniques are receiving more and more attention. These two issues are usually treated as independent problems, but approaches of combining them have recently attracted the attention of researchers. In this thesis, we review and test some of the more effective and some of the more theoretically interesting techniques. Various compression and indexing techniques are presented, and we also present two compressed text indices. Based on these techniques, we implement a compressed full-text index, so that compressed texts can be indexed to support fast queries without decompressing the whole texts. The experiments show that our index is compact and supports fast search.
2

Sani, Sadiq. "Role of semantic indexing for text classification." Thesis, Robert Gordon University, 2014. http://hdl.handle.net/10059/1133.

Abstract:
The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, failure to take into account the semantic relatedness between terms means that document similarity is not properly captured in the VSM. To address this problem, semantic indexing approaches have been proposed for modelling the semantic relatedness between terms in document representations. Accordingly, in this thesis, we empirically review the impact of semantic indexing on text classification. This empirical review allows us to answer one important question: how beneficial is semantic indexing to text classification performance. We also carry out a detailed analysis of the semantic indexing process which allows us to identify reasons why semantic indexing may lead to poor text classification performance. Based on our findings, we propose a semantic indexing framework called Relevance Weighted Semantic Indexing (RWSI) that addresses the limitations identified in our analysis. RWSI uses relevance weights of terms to improve the semantic indexing of documents. A second problem with the VSM is the lack of supervision in the process of creating document representations. This arises from the fact that the VSM was originally designed for unsupervised document retrieval. An important feature of effective document representations is the ability to discriminate between relevant and non-relevant documents. For text classification, relevance information is explicitly available in the form of document class labels. Thus, more effective document vectors can be derived in a supervised manner by taking advantage of available class knowledge. 
Accordingly, we investigate approaches for utilising class knowledge for supervised indexing of documents. Firstly, we demonstrate how the RWSI framework can be utilised for assigning supervised weights to terms for supervised document indexing. Secondly, we present an approach called Supervised Sub-Spacing (S3) for supervised semantic indexing of documents. A further limitation of the standard VSM is that an indexing vocabulary that consists only of terms from the document collection is used for document representation. This is based on the assumption that terms alone are sufficient to model the meaning of text documents. However for certain classification tasks, terms are insufficient to adequately model the semantics needed for accurate document classification. A solution is to index documents using semantically rich concepts. Accordingly, we present an event extraction framework called Rule-Based Event Extractor (RUBEE) for identifying and utilising event information for concept-based indexing of incident reports. We also demonstrate how certain attributes of these events e.g. negation, can be taken into consideration to distinguish between documents that describe the occurrence of an event, and those that mention the non-occurrence of that event.
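The bag-of-words limitation described in this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the helper names `bow_vector` and `cosine` and the sample strings are hypothetical, and whitespace tokenization with raw term counts is an assumed simplification.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag-of-words vector: lowercase whitespace tokens mapped to raw counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two texts with related meaning but no shared terms get similarity 0.0,
# because each term is treated as an independent dimension.
related = cosine(bow_vector("fast car"), bow_vector("quick automobile"))
```

Semantic indexing approaches of the kind the abstract surveys aim to give such pairs a non-zero similarity by modelling the relatedness between terms.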
3

Bowden, Paul Richard. "Automated knowledge extraction from text." Thesis, Nottingham Trent University, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.298900.

4

Mick, Alan A. "Knowledge based text indexing and retrieval utilizing case based reasoning /." Online version of thesis, 1994. http://hdl.handle.net/1850/11715.

5

Lester, Nicholas. "Efficient Index Maintenance for Text Databases." RMIT University. Computer Science and Information Technology, 2006. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20070214.154933.

Abstract:
All practical text search systems use inverted indexes to quickly resolve user queries. Offline index construction algorithms, where queries are not accepted during construction, have been the subject of much prior research. As a result, current techniques can invert virtually unlimited amounts of text in limited main memory, making efficient use of both time and disk space. However, these algorithms assume that the collection does not change during the use of the index. This thesis examines the task of index maintenance, the problem of adapting an inverted index to reflect changes in the collection it describes. Existing approaches to index maintenance are discussed, including proposed optimisations. We present analysis and empirical evidence suggesting that existing maintenance algorithms either scale poorly to large collections, or significantly degrade query resolution speed. In addition, we propose a new strategy for index maintenance that trades a strictly controlled amount of querying efficiency for greatly increased maintenance speed and scalability. Analysis and empirical results are presented that show that this new algorithm is a useful trade-off between indexing and querying efficiency. In scenarios described in Chapter 7, the use of the new maintenance algorithm reduces the time required to construct an index to under one sixth of the time taken by algorithms that maintain contiguous inverted lists. In addition to work on index maintenance, we present a new technique for accumulator pruning during ranked query evaluation, as well as providing evidence that existing approaches are unsatisfactory for collections of large size. Accumulator pruning is a key problem in both querying efficiency and overall text search system efficiency. Existing approaches either fail to bound the memory footprint required for query evaluation, or suffer loss of retrieval accuracy. 
In contrast, the new pruning algorithm can be used to limit the memory footprint of ranked query evaluation, and in our experiments gives retrieval accuracy not worse than previous alternatives. The results presented in this thesis are validated with robust experiments, which utilise collections of significant size, containing real data, and tested using appropriate numbers of real queries. The techniques presented in this thesis allow information retrieval applications to efficiently index and search changing collections, a task that has been historically problematic.
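For readers unfamiliar with the structure this abstract refers to, here is a minimal sketch of an in-memory inverted index with naive document addition and removal. It is not the maintenance strategy proposed in the thesis: the class `InvertedIndex`, its method names, and the whitespace tokenization are hypothetical choices made only for illustration.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index mapping each term to the set of document ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.docs = {}                    # doc id -> set of terms, kept for removal

    def add(self, doc_id, text):
        """Index a new document by adding its id to each term's posting set."""
        terms = set(text.lower().split())
        self.docs[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        """Remove a document and drop posting sets that become empty."""
        for term in self.docs.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Return ids of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = set(ids) if result is None else result & ids
        return result
```

Here every update touches the posting sets directly in memory. The abstract's concern is the on-disk case, where keeping inverted lists contiguous makes such updates expensive for large collections.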
6

Chung, EunKyung. "A Framework of Automatic Subject Term Assignment: An Indexing Conception-Based Approach." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5473/.

Abstract:
The purpose of this dissertation is to examine whether the understandings of subject indexing processes conducted by human indexers have a positive impact on the effectiveness of automatic subject term assignment through text categorization (TC). More specifically, human indexers' subject indexing approaches or conceptions in conjunction with semantic sources were explored in the context of a typical scientific journal article data set. Based on the premise that subject indexing approaches or conceptions with semantic sources are important for automatic subject term assignment through TC, this study proposed an indexing conception-based framework. For the purpose of this study, three hypotheses were tested: 1) the effectiveness of semantic sources, 2) the effectiveness of an indexing conception-based framework, and 3) the effectiveness of each of three indexing conception-based approaches (the content-oriented, the document-oriented, and the domain-oriented approaches). The experiments were conducted using a support vector machine implementation in WEKA (Witten & Frank, 2000). The experiment results pointed out that cited works, source title, and title were as effective as the full text, while keyword was found more effective than the full text. In addition, the findings showed that an indexing conception-based framework was more effective than the full text. Especially, the content-oriented and the document-oriented indexing approaches were found more effective than the full text. Among three indexing conception-based approaches, the content-oriented approach and the document-oriented approach were more effective than the domain-oriented approach. In other words, in the context of a typical scientific journal article data set, the objective contents and authors' intentions were more focused than the possible users' needs. 
The research findings of this study support that incorporation of human indexers' indexing approaches or conception in conjunction with semantic sources has a positive impact on the effectiveness of automatic subject term assignment.
7

Haouam, Kamel Eddine. "RVSM A rhetorical conceptual model for content-based indexing and retrieval of text document." Thesis, London Metropolitan University, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.517132.

8

Zhu, Weizhong Allen Robert B. "Text clustering and active learning using a LSI subspace signature model and query expansion /." Philadelphia, Pa. : Drexel University, 2009. http://hdl.handle.net/1860/3077.

9

Thachuk, Christopher Joseph. "Space and energy efficient molecular programming and space efficient text indexing methods for sequence alignment." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44172.

Abstract:
Nucleic acids play vital roles in the cell by virtue of the information encoded into their nucleotide sequence and the folded structures they form. Given their propensity to alter their shape over time under changing environmental conditions, an RNA molecule will fold through a series of structures called a folding pathway. As this is a thermodynamically-driven probabilistic process, folding pathways tend to avoid high energy structures and those which do are said to have a low energy barrier. In the first part of this thesis, we study the problem of predicting low energy barrier folding pathways of a nucleic acid strand. We show various restrictions of the problem are computationally intractable, unless P=NP. We propose an exact algorithm that has exponential worst-case runtime, but uses only polynomial space and performs well in practice. Motivated by recent applications in molecular programming we also consider a number of related problems that leverage folding pathways to perform computation. We show that verifying the correctness of these systems is PSPACE-hard and in doing so show that predicting low energy barrier folding pathways of multiple interacting strands is PSPACE-complete. We explore the computational limits of this class of molecular programs which are capable, in principle, of logically reversible and thus energy efficient computation. We demonstrate that a space and energy efficient molecular program of this class can be constructed to solve any problem in SPACE, the class of all space-bounded problems. We prove a number of limits to deterministic and also to space efficient computation of molecular programs that leverage folding pathways, and show limits for more general classes. In the second part of this thesis, we continue the study of algorithms and data structures for predicting properties of nucleic acids, but with quite different motivations pertaining to sequence rather than structure. 
We design a number of compressed text indexes that improve pattern matching queries in light of common biological events such as single nucleotide polymorphisms in genomes and alternative splicing in transcriptomes. Our text indexes and associated algorithms have the potential for use in alignment of sequencing data to reference sequences.
10

Hon, Wing-kai. "On the construction and application of compressed text indexes." Thesis, University of Hong Kong, 2004. http://sunzi.lib.hku.hk/hkuto/record/B31059739.


Books on the topic "Text indexing"

1

Rasmussen Neal, Diane, ed. Indexing and Retrieval of Non-Text Information. Berlin, Boston: DE GRUYTER SAUR, 2012. http://dx.doi.org/10.1515/9783110260588.

2

Silvester, June P. Machine aided indexing from natural language text. Washington, D.C.: NASA, Scientific and Technical Information Program, 1993.

3

Gonnet, G. H. Lexicographical indices for text: Inverted files vs. PAT trees. Waterloo: Centre for the New OED and Text Research, University of Waterloo, 1991.

4

Witten, Ian H., Alistair Moffat, and Timothy C. Bell. Managing gigabytes: Compressing and indexing documents and images. New York: Van Nostrand Reinhold, 1994.

5

Qurʼanic text: Toward a retrieval system. Herndon, Va: International Institute of Islamic Thought, 1996.

6

Sabourin, Conrad. Computational linguistics in information science: Information retrieval (full-text or conceptual), automatic indexing, text abstraction, content analysis, information extraction, query languages : bibliography. Montréal: Infolingua, 1994.

7

Weinberg, Bella Hass. Education and training in indexing and abstracting: A directory of courses and workshops offered in the United States and Canada, with a bibliography of text-books used in indexing and abstracting courses. 3rd ed. New York: American Society of Indexers, 1985.

8

Textual information access: Statistical models. London: ISTE, 2011.

9

Ibekwe-SanJuan, Fidelia. Fouille de textes: Méthodes, outils et applications. Paris: Hermès science publications, 2007.

10

Automatic indexing and abstracting of document texts. Boston: Kluwer Academic Publishers, 2000.


Book chapters on the topic "Text indexing"

1

Aluru, Srinivas. "Text Indexing." In Encyclopedia of Algorithms, 2226–31. New York, NY: Springer New York, 2016. http://dx.doi.org/10.1007/978-1-4939-2864-4_422.

2

Aluru, Srinivas. "Text Indexing." In Encyclopedia of Algorithms, 950–54. Boston, MA: Springer US, 2008. http://dx.doi.org/10.1007/978-0-387-30162-4_422.

3

Jo, Taeho. "Text Indexing." In Studies in Big Data, 19–40. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-91815-0_2.

4

De Moura, Edleno Silva. "Text Indexing Techniques." In Encyclopedia of Database Systems, 1–4. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4899-7993-3_1135-2.

5

Ferragina, Paolo, and Rossano Venturini. "Indexing Compressed Text." In Encyclopedia of Database Systems, 1–8. New York, NY: Springer New York, 2017. http://dx.doi.org/10.1007/978-1-4899-7993-3_1144-2.

6

Strate, Jason. "Full-Text Indexing." In Expert Performance Indexing in SQL Server 2019, 213–28. Berkeley, CA: Apress, 2019. http://dx.doi.org/10.1007/978-1-4842-5464-6_7.

7

Silva De Moura, Edleno. "Text Indexing Techniques." In Encyclopedia of Database Systems, 4084–88. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_1135.

8

Ferragina, Paolo, and Rossano Venturini. "Indexing Compressed Text." In Encyclopedia of Database Systems, 1861–68. New York, NY: Springer New York, 2018. http://dx.doi.org/10.1007/978-1-4614-8265-9_1144.

9

Mäkinen, Veli, and Gonzalo Navarro. "Compressed Text Indexing." In Encyclopedia of Algorithms, 176–78. Boston, MA: Springer US, 2008. http://dx.doi.org/10.1007/978-0-387-30162-4_83.

10

De Moura, Edleno Silva. "Text Indexing Techniques." In Encyclopedia of Database Systems, 3058–61. Boston, MA: Springer US, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_1135.


Conference papers on the topic "Text indexing"

1

Gelernter, Judith, and Michael Lesk. "Text mining for indexing." In the 2009 joint international conference. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1555400.1555517.

2

Qi-Rui Zhang, Ling Zhang, Shou-Bin Dong, and Jing-Hua Tan. "Document indexing in text categorization." In Proceedings of 2005 International Conference on Machine Learning and Cybernetics. IEEE, 2005. http://dx.doi.org/10.1109/icmlc.2005.1527600.

3

Lienhart, Rainer. "Automatic text recognition for video indexing." In the fourth ACM international conference. New York, New York, USA: ACM Press, 1996. http://dx.doi.org/10.1145/244130.244137.

4

Ganguly, Arnab, Wing-Kai Hon, Yu-An Huang, Solon P. Pissis, Rahul Shah, and Sharma V. Thankachan. "Parameterized Text Indexing with One Wildcard." In 2019 Data Compression Conference (DCC). IEEE, 2019. http://dx.doi.org/10.1109/dcc.2019.00023.

5

Shams, Rushdi, and Robert E. Mercer. "Investigating keyphrase indexing with text denoising." In the 12th ACM/IEEE-CS joint conference. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2232817.2232866.

6

Salton, Gerald. "Automatic text indexing using complex identifiers." In the ACM conference. New York, New York, USA: ACM Press, 1988. http://dx.doi.org/10.1145/62506.62530.

7

D'Amore, Raymond J., and Clinton P. Mah. "One-time complete indexing of text." In the 8th annual international ACM SIGIR conference. New York, New York, USA: ACM Press, 1985. http://dx.doi.org/10.1145/253495.253521.

8

Hore, Bijit, Hakan Hacigumus, Bala Iyer, and Sharad Mehrotra. "Indexing text data under space constraints." In the Thirteenth ACM conference. New York, New York, USA: ACM Press, 2004. http://dx.doi.org/10.1145/1031171.1031212.

9

Martynov, Maxim, and Boris Novikov. "An Indexing Algorithm for Text Retrieval." In Proceedings of the International Workshop on Advances in Databases and Information Systems (ADBIS '96). BCS Learning & Development, 1996. http://dx.doi.org/10.14236/ewic/adbis1996.16.

10

Lin, Du, Zhang Yibo, Sun Le, Sun Yufang, and Han Jie. "PM-based indexing for Chinese text retrieval." In the fifth international workshop on. New York, New York, USA: ACM Press, 2000. http://dx.doi.org/10.1145/355214.355222.


Reports on the topic "Text indexing"

1

Furey, John, Austin Davis, and Jennifer Seiter-Moser. Natural language indexing for pedoinformatics. Engineer Research and Development Center (U.S.), September 2021. http://dx.doi.org/10.21079/11681/41960.

Abstract:
The multiple schemas for the classification of soils rely on differing criteria but the major soil science systems, including the United States Department of Agriculture (USDA) and the international harmonized World Reference Base for Soil Resources soil classification systems, are primarily based on inferred pedogenesis. Largely these classifications are compiled from individual observations of soil characteristics within soil profiles, and the vast majority of this pedologic information is contained in nonquantitative text descriptions. We present initial text mining analyses of parsed text in the digitally available USDA soil taxonomy documentation and the Soil Survey Geographic database. Previous research has shown that latent information structure can be extracted from scientific literature using Natural Language Processing techniques, and we show that this latent information can be used to expedite query performance by using syntactic elements and part-of-speech tags as indices. Technical vocabulary often poses a text mining challenge due to the rarity of its diction in the broader context. We introduce an extension to the common English vocabulary that allows for nearly-complete indexing of USDA Soil Series Descriptions.