Journal articles on the topic 'Automatic text retrieval'

Consult the top 50 journal articles for your research on the topic 'Automatic text retrieval.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Salton, Gerard. "Developments in Automatic Text Retrieval." Science 253, no. 5023 (August 30, 1991): 974–80. http://dx.doi.org/10.1126/science.253.5023.974.

2

Wai Lam, M. Ruiz, and P. Srinivasan. "Automatic text categorization and its application to text retrieval." IEEE Transactions on Knowledge and Data Engineering 11, no. 6 (1999): 865–79. http://dx.doi.org/10.1109/69.824599.

3

Salton, Gerard. "Another look at automatic text-retrieval systems." Communications of the ACM 29, no. 7 (July 1986): 648–56. http://dx.doi.org/10.1145/6138.6149.

4

Foo, Schubert, Siu Cheung Hui, Hong Koon Lim, and Li Hui. "Automatic thesaurus for enhanced Chinese text retrieval." Library Review 49, no. 5 (July 2000): 230–40. http://dx.doi.org/10.1108/00242530010331754.

5

Salton, Gerard, and Christopher Buckley. "Term-weighting approaches in automatic text retrieval." Information Processing & Management 24, no. 5 (January 1988): 513–23. http://dx.doi.org/10.1016/0306-4573(88)90021-0.

6

Salton, Gerard, James Allan, and Chris Buckley. "Automatic structuring and retrieval of large text files." Communications of the ACM 37, no. 2 (February 1994): 97–108. http://dx.doi.org/10.1145/175235.175243.

7

Wu, Zimin, and Gwyneth Tseng. "ACTS: An automatic Chinese text segmentation system for full text retrieval." Journal of the American Society for Information Science 46, no. 2 (March 1995): 83–96. http://dx.doi.org/10.1002/(sici)1097-4571(199503)46:2<83::aid-asi2>3.0.co;2-0.

8

W Zaki, W. Mimi Diyana, Ling Chei Siong, Aini Hussain, W. Siti Halimatul Munirah W Ahmad, and Hamzaini Abdul Hamid. "12-APR Segmentation and Global Hu-F Descriptor for Human Spine MRI Image Retrieval." Jurnal Kejuruteraan 34, no. 4 (July 30, 2022): 659–70. http://dx.doi.org/10.17576/jkukm-2022-34(4)-14.

Abstract:
Image retrieval systems provide physicians with the correct images they need during diagnosis and treatment. Earlier systems were text-based image retrieval systems (TBIRS) that used keywords describing the image context and required humans to annotate the images manually; such annotation is laborious, especially for huge databases, and is prone to human error. To overcome these issues, content-based image retrieval systems (CBIRS) with automatic indexing using visual features such as colour, shape and texture have become popular. This study therefore proposes a semi-automated shape segmentation method for CBIRS that uses a 12-anatomical-point representation of the human spine vertebrae. The 12 points are annotated manually on the region of interest (ROI) and are followed by automatic ROI extraction. The segmentation method performs excellently, with a highest accuracy of 0.9987, specificity of 0.9989, and sensitivity of 0.9913. Features of the segmented ROI are extracted with a novel global Hu-F descriptor that combines a global shape descriptor, Hu moment invariants, and a Fourier descriptor, selected using an ANOVA approach. The retrieval phase is implemented on 100 MRI scans of the human spine covering the thoracic, lumbar, and sacral bones. The highest precision obtained is 0.9110, using a normalized Manhattan metric for lumbar bones. In conclusion, a retrieval system for lumbar bones in human spine MRI has been successfully developed to help radiologists diagnose spinal diseases.
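For readers who want to experiment with shape features of this kind, the sketch below combines Hu moments with a truncated Fourier boundary descriptor for a binary ROI mask, using OpenCV and NumPy. It is an illustrative approximation under our own assumptions, not the paper's Hu-F implementation; the function name and parameters are placeholders.

```python
# Illustrative sketch only: combine Hu moments with a truncated Fourier
# descriptor for a binary region-of-interest mask (not the paper's exact Hu-F).
import cv2
import numpy as np

def hu_fourier_descriptor(roi_mask, n_fourier=10):
    # Hu moments: 7 translation/scale/rotation-invariant shape statistics.
    hu = cv2.HuMoments(cv2.moments(roi_mask)).flatten()
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)   # common log-scaling

    # Fourier descriptor: FFT magnitudes of the boundary as complex numbers.
    contours, _ = cv2.findContours(roi_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)  # OpenCV 4.x signature
    boundary = max(contours, key=cv2.contourArea).squeeze()
    z = boundary[:, 0] + 1j * boundary[:, 1]
    spectrum = np.abs(np.fft.fft(z))
    fd = spectrum[1:n_fourier + 1] / (spectrum[1] + 1e-30)  # scale-normalised

    return np.concatenate([hu, fd])
```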
9

Ostovar, Ahmad, Suna Bensch, and Thomas Hellström. "Natural language guided object retrieval in images." Acta Informatica 58, no. 4 (July 19, 2021): 243–61. http://dx.doi.org/10.1007/s00236-021-00400-2.

Abstract:
The ability to understand the surrounding environment and to communicate with interacting humans is an important functionality for many automated systems in which visual input (e.g., images, video) and natural language input (speech or text) have to be related to each other. Possible applications are automatic image caption generation, interactive surveillance systems, or human-robot interaction. In this paper, we propose algorithms for automatically responding to natural language queries about an image. Our approach uses a predefined neural network to detect bounding boxes and objects in images, models spatial relations between bounding boxes with a neural network, analyzes queries with a syntactic parser, and introduces algorithms that map natural language to properties in the images. The algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users who assess the quality of our system's generated answers.
10

Hamdy, Abeer, and Mohamed Elsayed. "Automatic Recommendation of Software Design Patterns: Text Retrieval Approach." Journal of Software 13, no. 4 (April 2018): 260–68. http://dx.doi.org/10.17706/jsw.13.4.260-268.

11

Kim, Jin-Suk, Du-Seok Jin, Kwang-Young Kim, and Ho-Seop Choe. "Automatic In-Text Keyword Tagging based on Information Retrieval." Journal of Information Processing Systems 5, no. 3 (September 30, 2009): 159–66. http://dx.doi.org/10.3745/jips.2009.5.3.159.

12

Fkih, Fethi, and Mohamed Nazih Omri. "Information Retrieval from Unstructured Web Text Document Based on Automatic Learning of the Threshold." International Journal of Information Retrieval Research 2, no. 4 (October 2012): 12–30. http://dx.doi.org/10.4018/ijirr.2012100102.

Abstract:
Collocation is defined as a sequence of lexical tokens which habitually co-occur. This type of information is widely used in various applications such as information retrieval, document indexing, machine translation, lexicography, etc. Therefore, many techniques have been developed for the automatic retrieval of collocations from textual documents. These techniques use statistical measures based on joint frequency calculations to quantify the connection strength between the tokens of a candidate collocation. The discrimination between relevant and irrelevant collocations is performed using an a priori fixed threshold. Generally, this discrimination threshold is estimated manually by a domain expert; such supervised estimation is an additional cost that reduces system performance. In this paper, the authors propose a new technique for automatically learning the threshold in order to retrieve information from web text documents. The technique is mainly based on the usual performance evaluation measures (such as ROC and Precision-Recall curves). The results show that a statistical threshold can be estimated automatically, independently of the corpus being processed.
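As a concrete illustration of the statistical association measures such collocation extractors rely on, the sketch below scores candidate bigrams with pointwise mutual information and keeps those above a discrimination threshold. It is a generic example, not the authors' method; the threshold and count cut-offs are arbitrary placeholders.

```python
# Illustrative sketch: score bigram collocations with pointwise mutual
# information (PMI) and keep those above a discrimination threshold.
import math
from collections import Counter

def pmi_collocations(tokens, threshold=3.0, min_count=5):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    n_bi = max(n - 1, 1)
    scored = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_bigram = count / n_bi
        p_w1, p_w2 = unigrams[w1] / n, unigrams[w2] / n
        pmi = math.log2(p_bigram / (p_w1 * p_w2))
        if pmi >= threshold:
            scored[(w1, w2)] = pmi
    return sorted(scored.items(), key=lambda kv: -kv[1])
```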
13

Loukachevitch, Natalia, and Boris Dobrov. "The Sociopolitical Thesaurus as a resource for automatic document processing in Russian." Terminology 21, no. 2 (December 30, 2015): 237–62. http://dx.doi.org/10.1075/term.21.2.05lou.

Abstract:
This paper presents the structure and current state of the Sociopolitical thesaurus, which was developed for automatic document analysis and information-retrieval applications in Russian in a broad domain of public affairs. The scope of the Sociopolitical thesaurus resembles traditional information-retrieval thesauri for broad domains such as the EUROVOC or UNBIS thesauri, but the Sociopolitical thesaurus is intended as a tool for automatic document processing and this difference leads to considerable distinctions in the thesaurus structure and principles of its development. The knowledge representation in the Sociopolitical thesaurus is based on the combination of three existing traditions of developing information-retrieval thesauri, wordnets, and formal ontology research, which facilitates the consistent representation for such a broad scope of concepts and automatic document analysis of unstructured texts. The Sociopolitical thesaurus is used in such applications as conceptual indexing in information-retrieval systems, knowledge-based text categorization, automatic summarization of single and multiple documents, and question-answering. This paper presents an evaluation of the Sociopolitical thesaurus in automatic knowledge-based text categorization.
14

Lancaster, F. W. "Retrieval experiments: Full text versus human indexing versus automatic indexing." Journal of the American Society for Information Science 49, no. 5 (1998): 483–84. http://dx.doi.org/10.1002/(sici)1097-4571(19980415)49:5<483::aid-asi13>3.0.co;2-a.

15

Lancaster, F. W. "Retrieval experiments: Full text versus human indexing versus automatic indexing." Journal of the American Society for Information Science 49, no. 5 (1998): 484. http://dx.doi.org/10.1002/(sici)1097-4571(19980415)49:5<484::aid-asi14>3.0.co;2-6.

16

Lancaster, F. W. "Retrieval experiments: Full text versus human indexing versus automatic indexing." Journal of the American Society for Information Science 49, no. 5 (April 15, 1998): 484. http://dx.doi.org/10.1002/(sici)1097-4571(19980415)49:5<484::aid-asi14>3.3.co;2-y.

17

Ni, Pin, Yuming Li, and Victor Chang. "Research on Text Classification Based on Automatically Extracted Keywords." International Journal of Enterprise Information Systems 16, no. 4 (October 2020): 1–16. http://dx.doi.org/10.4018/ijeis.2020100101.

Abstract:
Automatic keyword extraction and classification are important research directions in NLP (natural language processing), information retrieval, and text mining. As fine-grained abstractions of text data, keywords are also its most important feature, with great practical and potential value in document classification, topic modeling, information retrieval, and other areas. Keywords give a compact representation of a document that still carries much of its significant information, so it can be advantageous to perform text classification in the resulting high-dimensional feature space. For this reason, this study designs a supervised keyword classification method based on TextRank automatic keyword extraction and optimizes the model with a genetic algorithm, so as to model topic keywords for text classification.
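A simplified TextRank-style extractor of the kind the study builds on can be sketched as PageRank over a sliding-window co-occurrence graph (illustrative only; it omits part-of-speech filtering, phrase merging, and the genetic-algorithm optimization described above):

```python
# Illustrative sketch of TextRank-style keyword extraction: rank words by
# PageRank over a sliding-window co-occurrence graph.
import networkx as nx

def textrank_keywords(tokens, window=4, top_k=10):
    graph = nx.Graph()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph.add_edge(word, other)
    scores = nx.pagerank(graph)  # damping factor defaults to 0.85
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```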
18

Yadav, Niharika, and Vinay Kumar. "A novel technique for automatic retrieval of embedded text from books." Optik 127, no. 20 (October 2016): 9538–50. http://dx.doi.org/10.1016/j.ijleo.2016.05.122.

19

Zhu, Hong Mei, Yong Quan Liang, Qi Jia Tian, and Shu Juan Ji. "Agricultural Policy-Oriented Ontology-Based Semantic Information Retrieval." Key Engineering Materials 439-440 (June 2010): 572–76. http://dx.doi.org/10.4028/www.scientific.net/kem.439-440.572.

Abstract:
This paper studies an architecture for ontology-based semantic representation and retrieval of information. As a case study, a prototype agricultural policy-oriented, ontology-based semantic information retrieval system (APOSIRS) is established. The ontology provides a shared terminology and supports the retrieval process. The architecture allows APOSIRS-based applications to perform automatic semantic information retrieval of agricultural policy texts in greater depth: automatic and dynamic semantic annotation of unstructured and semi-structured content; semantically enabled information extraction, indexing, and retrieval; as well as ontology management, such as querying and modifying the underlying ontology and knowledge bases. The main components of this architecture have been implemented and their results are reported.
20

Pavlick, Ellie, and Chris Callison-Burch. "Extracting Structured Information via Automatic + Human Computation." Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 3 (September 23, 2015): 26–27. http://dx.doi.org/10.1609/hcomp.v3i1.13253.

Abstract:
We present a system for extracting structured information from unstructured text using a combination of information retrieval, natural language processing, machine learning, and crowdsourcing. We test our pipeline by building a structured database of gun violence incidents in the United States. The results of our pilot study demonstrate that the proposed methodology is a viable way of collecting large-scale, up-to-date data for public health, public policy, and social science research.
21

Periyasamy, A. R. Pon. "Reversible N-grams Stemming Stripping Algorithm for Classification of Text Data." International Journal of Advanced Research in Computer Science and Software Engineering 7, no. 7 (July 30, 2017): 465. http://dx.doi.org/10.23956/ijarcsse/v7i4/0210.

Abstract:
Stemming methods trace the root or stem of a word, which can be used in information retrieval (IR) tasks to increase the recall rate and return more relevant results. Numerous methods are available for performing stemming, ranging from manual to automatic and from language-dependent to language-independent, and newer algorithms are designed to overcome the challenges of the existing ones. This paper presents a comparative study of the various available stemming algorithms widely used to enhance the effectiveness and efficiency of information retrieval.
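The rule-based suffix stripping such stemmers perform can be illustrated in a few lines (a naive sketch; production stemmers such as Porter's apply ordered rules guarded by vowel and measure conditions):

```python
# Illustrative sketch: naive rule-based suffix stripping for IR recall.
SUFFIXES = ("ization", "ational", "fulness", "ements", "ement",
            "ation", "ness", "ing", "ies", "ed", "es", "s")

def naive_stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

# e.g., naive_stem("indexing") -> "index", naive_stem("queries") -> "quer"
```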
22

Benício, Diego Henrique Pegado, João Carlos Xavier Junior, Kairon Ramon Sabino de Paiva, and Juliana Dantas de Araújo Santos Camargo. "Applying Text Mining and Natural Language Processing to Electronic Medical Records for extracting and transforming texts into structured data." Research, Society and Development 11, no. 6 (April 30, 2022): e37711629184. http://dx.doi.org/10.33448/rsd-v11i6.29184.

Abstract:
The recording of patients' data in electronic patient records (EPRs) by healthcare providers is usually performed in free-text fields, which allows the same type of information to be described in different ways (e.g., abbreviations, terminology, etc.). In such scenarios, retrieving data from this source (text) with SQL (Structured Query Language) queries becomes unfeasible. Based on this fact, we present in this paper a tool that applies Text Mining and Natural Language Processing techniques to extract comprehensible and standardized patient data from unstructured data. Our main goal is to carry out an automatic process of extracting, cleaning and structuring data obtained from the EPRs of pregnant patients at the Januario Cicco maternity hospital in Natal, Brazil. 3,000 EPRs written in Portuguese from 2016 to 2020 were used in our comparison between data retrieved manually by health professionals (e.g., doctors and nurses) and data retrieved by our tool. Moreover, we applied the Kruskal-Wallis test to evaluate statistically the results of the manual and automatic processes. The statistical results showed no significant difference between the two retrieval processes. In this sense, the final results were considerably promising.
23

Takada, Tomoki, Mizuki Arai, and Tomohiro Takagi. "Automatic Keyword Annotation System Using Newspapers." Journal of Advanced Computational Intelligence and Intelligent Informatics 18, no. 3 (May 20, 2014): 340–46. http://dx.doi.org/10.20965/jaciii.2014.p0340.

Abstract:
Nowadays, an increasingly large amount of information exists on the web. Therefore, a method is needed that enables us to find necessary information quickly because this is becoming increasingly difficult for users. To solve this problem, information retrieval systems like Google and recommendation systems like that on Amazon are used. In this paper, we focus on information retrieval systems. These retrieval systems require index terms, which affect the precision of retrieval. Two methods generally decide index terms. One is analyzing a text using natural language processing and deciding index terms using varying amounts of statistics. The other is someone choosing document keywords as index terms. However, the latter method requires too much time and effort and becomes more impractical as information grows. Therefore, we propose the Nikkei annotator system, which is based on the model of the human brain and learns patterns of past keyword annotation and automatically outputs keywords that users prefer. The purposes of the proposed method are automating manual keyword annotation and achieving high speed and high accuracy keyword annotation. Experimental results showed that the proposed method is more accurate than TFIDF and Naive Bayes in P@5 and P@10. Moreover, these results also showed that the proposed method could annotate about 19 times faster than Naive Bayes.
24

Fjeldvig, Tove, and Anne Golden. "Experiments with Language-based Aids in Information Retrieval Systems." Nordic Journal of Linguistics 11, no. 1-2 (June 1988): 33–46. http://dx.doi.org/10.1017/s0332586500001736.

Abstract:
The fact that a lexeme can appear in various forms causes problems in information retrieval. As a solution to this problem, we have developed methods for automatic root lemmatization, automatic truncation and automatic splitting of compound words. All the methods have as their basis a set of rules which contain information regarding inflected and derived forms of words – and not a dictionary. The methods have been tested on several collections of texts, and have produced very good results. By controlled experiments in text retrieval, we have studied the effects on search results. These results show that both the method of automatic root lemmatization and the method of automatic truncation make a considerable improvement on search quality. The experiments with splitting of compound words did not give quite the same improvement, however, but all the same this experiment showed that such a method could contribute to a richer and more complete search request.
25

Hkiri, Emna, Souheyl Mallat, and Mounir Zrigui. "Events Automatic Extraction from Arabic Texts." International Journal of Information Retrieval Research 6, no. 1 (January 2016): 36–51. http://dx.doi.org/10.4018/ijirr.2016010103.

Abstract:
The event extraction task consists in detecting and classifying events within an open-domain text. It is very new for the Arabic language, whereas it has reached maturity for languages such as English and French. Event extraction has also been shown to help Natural Language Processing tasks such as information retrieval, question answering, text mining, and machine translation achieve higher performance. In this article, we present an ongoing effort to build a system for event extraction from Arabic texts using the GATE platform and other tools.
26

Zhou, Ning, and Jianping Fan. "Automatic image–text alignment for large-scale web image indexing and retrieval." Pattern Recognition 48, no. 1 (January 2015): 205–19. http://dx.doi.org/10.1016/j.patcog.2014.07.001.

27

Zhang, Baopeng, Yanyun Qu, Jinye Peng, and Jianping Fan. "An automatic image-text alignment method for large-scale web image retrieval." Multimedia Tools and Applications 76, no. 20 (October 27, 2016): 21401–21. http://dx.doi.org/10.1007/s11042-016-4059-x.

28

Boyce, Bert R. "Concepts of information retrieval and automatic text processing: The transformation analysis, and retrieval of information by computer." Journal of the American Society for Information Science 41, no. 2 (March 1990): 150–51. http://dx.doi.org/10.1002/(sici)1097-4571(199003)41:2<150::aid-asi12>3.0.co;2-8.

29

Gragnaniello, Diego, Andrea Bottino, Sandro Cumani, and Wonjoon Kim. "Special Issue on Advances in Deep Learning." Applied Sciences 10, no. 9 (May 2, 2020): 3172. http://dx.doi.org/10.3390/app10093172.

Abstract:
Nowadays, deep learning is the fastest growing research field in machine learning and has a tremendous impact on a plethora of daily life applications, ranging from security and surveillance to autonomous driving, automatic indexing and retrieval of media content, text analysis, speech recognition, automatic translation, and many others [...]
30

Chen, Rong, Feng Chen, and Yi Sun. "Research on Automatic Text Classification Algorithm Based on ITF-IDF and KNN." Applied Mechanics and Materials 713-715 (January 2015): 1830–34. http://dx.doi.org/10.4028/www.scientific.net/amm.713-715.1830.

Abstract:
We consider how to perform text classification efficiently over all pairs of documents. This information can be used in information retrieval, digital libraries, information filtering, and search engines, among other applications. This paper describes a text classification model based on the KNN algorithm. The standard text feature extraction algorithm, TF-IDF, can lose information about the relations between text features, so an improved ITF-IDF algorithm is presented to overcome this. Our experiments show that our algorithm outperforms the others.
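For orientation, the TF-IDF-plus-KNN baseline that the paper improves upon can be assembled with scikit-learn as follows (an illustrative sketch with toy data, not the proposed ITF-IDF variant):

```python
# Illustrative sketch: TF-IDF features with a k-nearest-neighbour classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

docs = ["automatic text retrieval", "term weighting for indexing",
        "image segmentation with descriptors", "mri image retrieval"]
labels = ["text", "text", "image", "image"]

model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
model.fit(docs, labels)
print(model.predict(["weighting terms in text indexing"]))
```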
31

Ruggiero, Francesco, and Reinier van Kleij. "On-Line Hypermedia Newspapers: An Experiment with "L'Unione Sarda"." International Journal of Modern Physics C 05, no. 05 (October 1994): 899–906. http://dx.doi.org/10.1142/s0129183194001033.

Abstract:
In this brief paper we present a prototype of an on-line hypermedia newspaper, the first example of daily electronic publishing in Italy, based on the results of a collaboration between CRS4 and L'UNIONE SARDA. The on-line newspaper (text and pictures) is created by automatic retrieval, compression, transmission and conversion of newspaper data. The prototype is under development and currently supports automatic hypertextual links, article retrieval facilities and a simple mechanism for creating a personal newspaper. Some HyperText Markup Language (HTML) pages are shown to give an impression of the prototype.
32

Bichi, Abdulkadir Abubakar, Ruhaidah Samsudin, and Rohayanti Hassan. "Automatic construction of generic stop words list for hausa text." Indonesian Journal of Electrical Engineering and Computer Science 25, no. 3 (March 1, 2022): 1501. http://dx.doi.org/10.11591/ijeecs.v25.i3.pp1501-1507.

Abstract:
Stop words are the words with the highest frequencies in a document that carry no significant information. They are characterized by having common relations within a cluster and constitute the noise of a text, being evenly distributed over a document. Removing stop words improves the performance and accuracy of information retrieval algorithms and of machine learning at large; it saves storage space by reducing the dimension of the vector space and helps in indexing documents effectively. This research generates a list of Hausa stop words automatically using an aggregated method that combines frequency and statistical methods. The experiments are conducted on a collected Hausa corpus of 841 Hausa news articles totalling 646,862 words, and a final list of 81 distinct Hausa stop words is generated.
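The frequency component of such an aggregated method is straightforward to sketch (illustrative only; the paper additionally combines statistical filtering, and the cut-off below is an arbitrary placeholder):

```python
# Illustrative sketch: derive a candidate stop-word list from raw frequency,
# one of the signals an aggregated approach would combine.
from collections import Counter

def frequency_stopwords(documents, top_n=80):
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return [word for word, _ in counts.most_common(top_n)]
```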
33

Polpinij, Jantima, and Chumsak Sribunruang. "Automatic Retrieval of Particular Oncology Documents from PubMed by Semantic-based Text Clustering." International Journal of Advancements in Computing Technology 5, no. 11 (July 31, 2013): 65–74. http://dx.doi.org/10.4156/ijact.vol5.issue11.8.

34

Zhang, Hongli. "Voice Keyword Retrieval Method Using Attention Mechanism and Multimodal Information Fusion." Scientific Programming 2021 (January 23, 2021): 1–11. http://dx.doi.org/10.1155/2021/6662841.

Abstract:
A cross-modal speech-text retrieval method using an interactive learning convolutional autoencoder (CAE) is proposed. First, an interactive learning autoencoder structure is proposed, with two inputs (speech and text) and processing stages such as encoding, hidden-layer interaction, and decoding, to model cross-modal speech-text retrieval. Then, the original audio signal is preprocessed and Mel-frequency cepstral coefficient (MFCC) features are extracted. In addition, a bag-of-words model is used to extract text features, and an attention mechanism combines the text and speech features. Through the interactive learning CAE, shared features of the speech and text modalities are obtained and then sent to a modality classifier to identify modal information, realizing cross-modal speech-text retrieval. Finally, experiments show that the proposed algorithm outperforms the comparison algorithms in terms of recall, accuracy, and false recognition rate.
35

Jabbar, Muhammad Shahid, Jitae Shin, and Jun-Dong Cho. "AI Ekphrasis: Multi-Modal Learning with Foundation Models for Fine-Grained Poetry Retrieval." Electronics 11, no. 8 (April 18, 2022): 1275. http://dx.doi.org/10.3390/electronics11081275.

Abstract:
Artificial intelligence research in natural language processing in the context of poetry struggles with the recognition of holistic content such as poetic symbolism, metaphor, and other fine-grained attributes. Given these challenges, multi-modal image–poetry reasoning and retrieval remain largely unexplored. Our recent accessibility study indicates that poetry is an effective medium to convey visual artwork attributes for improved artwork appreciation by people with visual impairments. We therefore introduce a deep learning approach for the automatic retrieval of poetry suited to input images. The recent state-of-the-art CLIP model matches multi-modal visual and text features using cosine similarity; however, it lacks shared cross-modality attention features to model fine-grained relationships. The approach proposed in this work takes advantage of CLIP's strong pre-training and overcomes this limitation by introducing shared attention parameters to better model the fine-grained relationship between both modalities. We test and compare our approach using the expertly annotated MultiM-Poem dataset, considered the largest public image–poetry pair dataset for English poetry. The proposed approach aims to solve the problems of image-based attribute recognition and automatic retrieval of fine-grained poetic verses. The test results show that the shared attention parameters improve fine-grained attribute recognition, and the proposed approach is a significant step towards automatic multi-modal retrieval for improved artwork appreciation by people with visual impairments.
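Once image and text embeddings are available, the cosine-similarity matching that CLIP-style retrieval relies on reduces to a few lines; the sketch below uses random placeholder vectors in place of real CLIP encoders and is not the paper's shared-attention model:

```python
# Illustrative sketch: retrieve the best-matching poems for an image by cosine
# similarity over pre-computed embeddings (placeholder vectors stand in for
# CLIP-style image and text encoders).
import numpy as np

def retrieve(image_vec, poem_vecs, top_k=3):
    img = image_vec / np.linalg.norm(image_vec)
    poems = poem_vecs / np.linalg.norm(poem_vecs, axis=1, keepdims=True)
    sims = poems @ img                  # cosine similarity per poem
    return np.argsort(-sims)[:top_k]    # indices of the top matches

rng = np.random.default_rng(0)
print(retrieve(rng.normal(size=512), rng.normal(size=(100, 512))))
```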
36

Klochko, Andriy. "INTRODUCTION OF INTELLECTUAL ANALYSIS TECHNOLOGIES OF TEXT DOCUMENTS INTO FIELD OF TECHNICAL REGULATION IN CONSTRUCTION." Management of Development of Complex Systems, no. 47 (September 27, 2021): 63–70. http://dx.doi.org/10.32347/2412-9933.2021.47.63-70.

Abstract:
The article is devoted to introducing intelligent text-analysis technologies into the field of technical regulation in Ukrainian construction. The main attention is directed at the automatic collection and intelligent analysis of the construction branch's normative documents, issues that are becoming extremely important with the digitalization of all sectors of the economy. Urgent problems of the technical regulation system in construction are highlighted, and it is shown that these problems bring to the fore the task of increasing the speed and reliability of processing text documents in electronic information systems. Solving this problem involves developing automatic systems capable of intelligent document search under the uncertainty caused by redundant textual information. An overview of the information retrieval systems used to process text documents in electronic information resources is conducted. The preconditions for introducing intelligent text-analysis technologies into the sphere of technical regulation in Ukrainian construction are investigated, and the timeliness of their implementation is substantiated. The basic concepts used in models and methods for the automatic extraction of meaningful information from texts are given. The data mining process is studied, and the text mining models used in different information retrieval systems are analyzed. A scheme for introducing intelligent text-document analysis technology into the Unified state electronic system in the field of construction is offered. The clustering of text documents is to be solved using artificial neural networks; the possibility of using models such as the Deep Structured Semantic Model and the Self-Organizing Map is considered, chosen for their ability to determine the degree of proximity of the information retrieval images of text documents. The practical significance of the work lies in improving search engines in the field of technical regulation in construction and in the ability to significantly accelerate the restructuring of technical regulation in Ukrainian construction.
37

Ahn, Hyeokju, and Harksoo Kim. "Enhanced Spoken Sentence Retrieval Using a Conventional Automatic Speech Recognizer in Smart Home." International Journal on Artificial Intelligence Tools 25, no. 03 (June 2016): 1650017. http://dx.doi.org/10.1142/s0218213016500172.

Abstract:
With the rapid evolution of smart home environment, the demand for spoken information retrieval (e.g., voice-activated FAQ retrieval) on information appliances is increasing. In spoken information retrieval, users’ spoken queries are converted into text queries using automatic speech recognition (ASR) engines. If top-1 results of the ASR engines are incorrect, the errors are propagated to information retrieval systems. If a document collection is a small set of sentences such as frequently asked questions (FAQs), the errors have additional effect on the performance of information retrieval systems. To improve the performance of such a sentence retrieval system, we propose a post-processing model of an ASR engine. The post-processing model consists of a re-ranking and a query term generation model. The re-ranking model rearranges top-n outputs of the ASR engines using the ranking support vector machine (Ranking SVM). The query term generation model extracts meaningful content words from the re-ranked queries based on term frequencies and query rankings. In the experiments, the re-ranking model improved the top-1 performance results of an underlying ASR engine with 4.4% higher precision and 6.4% higher recall rate. The query term generation model improved the performance results of an underlying information retrieval system with an accuracy 2.4% to 2.6% higher. Based on the experimental result, the proposed model revealed that it could improve the performance of a spoken sentence retrieval system in a restricted domain.
38

Kim, Pan Koo. "An automatic indexing of compound words based on mutual information for Korean text retrieval." Library and Information Science 34 (March 31, 1997): 29–38. http://dx.doi.org/10.46895/lis.34.29.

39

Brandt, Cynthia, and Prakash Nadkarni. "Web-based UMLS concept retrieval by automatic text scanning: a comparison of two methods." Computer Methods and Programs in Biomedicine 64, no. 1 (January 2001): 37–43. http://dx.doi.org/10.1016/s0169-2607(00)00092-4.

40

Küçük, Dilek, and Adnan Yazıcı. "A semi-automatic text-based semantic video annotation system for Turkish facilitating multilingual retrieval." Expert Systems with Applications 40, no. 9 (July 2013): 3398–411. http://dx.doi.org/10.1016/j.eswa.2012.12.048.

41

Mao, Wenlei, and Wesley W. Chu. "The phrase-based vector space model for automatic retrieval of free-text medical documents." Data & Knowledge Engineering 61, no. 1 (April 2007): 76–92. http://dx.doi.org/10.1016/j.datak.2006.02.008.

42

Demner-Fushman, Dina, Marc D. Kohli, Marc B. Rosenman, Sonya E. Shooshan, Laritza Rodriguez, Sameer Antani, George R. Thoma, and Clement J. McDonald. "Preparing a collection of radiology examinations for distribution and retrieval." Journal of the American Medical Informatics Association 23, no. 2 (July 1, 2015): 304–10. http://dx.doi.org/10.1093/jamia/ocv080.

Abstract:
Objective: Clinical documents made available for secondary use play an increasingly important role in discovery of clinical knowledge, development of research methods, and education. An important step in facilitating secondary use of clinical document collections is easy access to descriptions and samples that represent the content of the collections. This paper presents an approach to developing a collection of radiology examinations, including both the images and radiologist narrative reports, and making them publicly available in a searchable database. Materials and Methods: The authors collected 3996 radiology reports from the Indiana Network for Patient Care and 8121 associated images from the hospitals' picture archiving systems. The images and reports were de-identified automatically and then the automatic de-identification was manually verified. The authors coded the key findings of the reports and empirically assessed the benefits of manual coding on retrieval. Results: The automatic de-identification of the narrative was aggressive and achieved 100% precision at the cost of rendering a few findings uninterpretable. Automatic de-identification of images was not quite as perfect. Images for two of 3996 patients (0.05%) showed protected health information. Manual encoding of findings improved retrieval precision. Conclusion: Stringent de-identification methods can remove all identifiers from text radiology reports. DICOM de-identification of images does not remove all identifying information and needs special attention to images scanned from film. Adding manual coding to the radiologist narrative reports significantly improved relevancy of the retrieved clinical documents. The de-identified Indiana chest X-ray collection is available for searching and downloading from the National Library of Medicine (http://openi.nlm.nih.gov/).
43

Zheng, Xin, and Ai Ping Cai. "The Method of Web Image Annotation Classification Automatic." Advanced Materials Research 889-890 (February 2014): 1323–26. http://dx.doi.org/10.4028/www.scientific.net/amr.889-890.1323.

Abstract:
Finding related pictures on the Internet without annotations is laborious, so automatic image annotation is extremely important in image retrieval. Traditional methods simply translate image visual features into keywords, ignoring the similarity gap between low-level visual features and high-level semantics (the image "gap" problem), so annotation quality is very low. This paper puts forward a hybrid technique for automatic, classification-based annotation of web images: it first maps the visual features of an image to one or more rough categories, then preprocesses the text information of the web page, and finally selects semantically similar keywords as the image annotation using a semantic processing module. In this way it combines image and text for automatic annotation and achieves high annotation precision.
44

Zheng, Min, Bo Liu, and Le Sun. "LawRec: Automatic Recommendation of Legal Provisions Based on Legal Text Analysis." Computational Intelligence and Neuroscience 2022 (September 14, 2022): 1–7. http://dx.doi.org/10.1155/2022/6313161.

Abstract:
Smart court technologies make full use of modern science, for example artificial intelligence, the Internet of Things, and cloud computing, to promote the modernization of the trial system and trial capabilities; they can improve the efficiency of case handling and make courts more convenient for the public. Article recommendation is an important part of intelligent trials. For ordinary people without a legal background, traditional information retrieval systems that search laws and regulations by keyword are not suitable, because such users cannot extract professional legal vocabulary from complex case descriptions. This paper proposes a law recommendation framework, called LawRec, based on the Bidirectional Encoder Representations from Transformers (BERT) and Skip-Recurrent Neural Network (Skip-RNN) models. It integrates knowledge of legal provisions with the case description, using the BERT model to learn the case description text and the legal knowledge, respectively; laws and regulations can then be recommended for a case. Experimental results show that the proposed LawRec achieves better performance than state-of-the-art methods.
45

Wang, Jiapeng, and Yihong Dong. "Measurement of Text Similarity: A Survey." Information 11, no. 9 (August 31, 2020): 421. http://dx.doi.org/10.3390/info11090421.

Abstract:
Text similarity measurement is the basis of natural language processing tasks, which play an important role in information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. This paper systematically combs the research status of similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification description system of text similarity measurement algorithms, and summarizes the future development direction. With the aim of providing reference for related research and application, the text similarity measurement method is described by two aspects: text distance and text representation. The text distance can be divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic text, and graph-structure-based representation. Finally, the development of text similarity is also summarized in the discussion section.
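Two of the simplest measures in the survey's taxonomy, a set-based (Jaccard) and a vector-based (cosine) similarity, can be written directly as an illustrative sketch:

```python
# Illustrative sketch: Jaccard (set-based) and cosine (bag-of-words) text
# similarity, two elementary points in the survey's classification.
import math
from collections import Counter

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(jaccard("automatic text retrieval", "text retrieval systems"))  # 0.5
print(cosine("automatic text retrieval", "text retrieval systems"))   # ~0.67
```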
46

Wang, Qicai, Peiyu Liu, Zhenfang Zhu, Hongxia Yin, Qiuyue Zhang, and Lindong Zhang. "A Text Abstraction Summary Model Based on BERT Word Embedding and Reinforcement Learning." Applied Sciences 9, no. 21 (November 4, 2019): 4701. http://dx.doi.org/10.3390/app9214701.

Abstract:
As a core task of natural language processing and information retrieval, automatic text summarization is widely applied in many fields. There are two existing methods for text summarization task at present: abstractive and extractive. On this basis we propose a novel hybrid model of extractive-abstractive to combine BERT (Bidirectional Encoder Representations from Transformers) word embedding with reinforcement learning. Firstly, we convert the human-written abstractive summaries to the ground truth labels. Secondly, we use BERT word embedding as text representation and pre-train two sub-models respectively. Finally, the extraction network and the abstraction network are bridged by reinforcement learning. To verify the performance of the model, we compare it with the current popular automatic text summary model on the CNN/Daily Mail dataset, and use the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metrics as the evaluation method. Extensive experimental results show that the accuracy of the model is improved obviously.
47

Fragkou, Pavlina. "Information Extraction versus Text Segmentation for Web Content Mining." International Journal of Software Engineering and Knowledge Engineering 23, no. 08 (October 2013): 1109–37. http://dx.doi.org/10.1142/s0218194013500332.

Abstract:
The information explosion of the Web aggravates the problem of effective information retrieval. Even though various approaches in the literature aim to enhance retrieval, they prove to be insufficient because the actual content of a page is poorly exploited with regard to a specific semantic content. This paper extends an existing method for performing automatic semantic segmentation. The existing method initially partitions a web page into blocks based on its visual layout and the application of a set of heuristics. The subsequent step performs partitioning based on the appearance of specific types of named entities with the help of a machine learning algorithm. Our work extends the initial method in multiple directions. First of all, it examines alternative named entities as features in the learning step. Secondly, it extends the initial corpus. Thirdly, it evaluates and compares the initial method with metrics used in text segmentation. Furthermore, the result of text segmentation is incorporated as feature in the learning process. Finally, two text segmentation algorithms are applied to evaluate the effectiveness of manual annotation. Reported results show that the synergy of semantic-based and text segmentation algorithms strongly depends on the predefined semantic model used for text segmentation.
48

Lee, Hang-Mao, Karl-Josef Dietz, and Ralf Hofestädt. "Computational Construction of Specialized Biological Networks." Journal of Bioinformatics and Computational Biology 11, no. 01 (February 2013): 1340003. http://dx.doi.org/10.1142/s0219720013400039.

Abstract:
Today we have access to more than 1500 molecular database systems on the internet. Based on these databases and information systems, computer scientists have developed and implemented different methods for the automatic integration and prediction of biological networks. The idea is to use such methods for the automatic prediction and expansion of rudimentary molecular knowledge. However, the inherent data deficiency concerning the properties of specialized networks hampers database- and text-mining-based network construction. This paper presents a concept for computational network expansion for a specific biological network, the thiol-disulfide redox regulatory network. In addition, a network-contexted document retrieval system (ncDocReSy) is introduced to assist network reduction by providing indirectly relevant literature for the user's manual curation. NcDocReSy combines literature search with the biological network and ranks the retrieved literature according to the network topology. NcDocReSy is implemented as a Cytoscape plugin.
49

Ting, Yu. "Multiple Features Image Fusion Based on Visual Dictionary." Advanced Materials Research 889-890 (February 2014): 1111–14. http://dx.doi.org/10.4028/www.scientific.net/amr.889-890.1111.

50

Mosbah, Mawloud. "Improving the Results of Google Scholar Engine through Automatic Query Expansion Mechanism and Pseudo Re-ranking using MVRA." Journal of information and organizational sciences 42, no. 2 (December 10, 2018): 219–29. http://dx.doi.org/10.31341/jios.42.2.5.

Abstract:
In this paper, we address enhancing the Google Scholar engine, in the context of text retrieval, through two mechanisms related to the interrogation protocol: query expansion and query reformulation. Both schemes are applied together with re-ranking of the results using a pseudo relevance feedback algorithm that we proposed previously in the context of Content-Based Image Retrieval (CBIR), namely the Majority Voting Re-ranking Algorithm (MVRA). The experiments conducted using ten queries reveal very promising results in terms of effectiveness.
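The pseudo relevance feedback idea behind such expansion and re-ranking schemes can be illustrated generically as follows (this is not the MVRA algorithm itself; the expansion size and toy results are arbitrary placeholders):

```python
# Illustrative sketch of pseudo relevance feedback: expand a query with the
# most frequent terms of its top-ranked results, then search again.
from collections import Counter

def expand_query(query, top_results, n_terms=3):
    counts = Counter()
    for text in top_results:             # assume the top-k hits are relevant
        counts.update(text.lower().split())
    for term in query.lower().split():   # do not re-add original query terms
        counts.pop(term, None)
    expansion = [t for t, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)

print(expand_query("text retrieval",
                   ["automatic text retrieval with term weighting",
                    "term weighting approaches for indexing documents"]))
```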